Mini Project Report Sample



Comments



Description

Mini Project Report Design and implementation of a classification system based on soft computing and statistical approachesSubmitted By: Ashish Kumar Agrawal(2001114) Abstract This project, being developed as a part of MHRD research project (Designing an Intelligent Robot for Explosive Detection and Decontamination funded by MHRD, Govt. of India), explores the design and development of classifier based on statistical methods and soft computing based approaches which is capable of identifying the mines and non mines using various clustering, classification and rules establishment algorithms as to compare the algorithm on the basis of complexity and accuracy. Designing such a classifier is a big challenge because data is not linearly separable and since it has overlapping features, it is not possible to design a classifier with 100% accuracy .This project deals with PVC tubes, wood piece and copper cylinders as non mine data in addition to data of various mines. The basic idea of the classification is based on a fact that it is safe if the non-mines data is predicted as mine but it is not the case when we predict mines data as non-mines. So the unsupervised learning based ART algorithm divides the data into several clusters which are merged on the basis of above fact. Genetic algorithm is enhancing the results to establish the results having negation in the antecedent part. In addition to these approaches, fuzzy approaches also give the membership values corresponding to each class to visual the class of data in better way. I Candidate’s Declaration I hereby declare that the work presented in this project titled “Design and implementation of a classification system based on soft computing and statistical approach ” submitted towards completion of mini-project in sixth Semester of B. I have not submitted the matter embodied in this project for the award of any other degree. Allahabad. (Dr G. Nandi. Allahabad. G. 2004 IIIT (Deemed University) Deoghat. Associate Professor. Jhalwa. Nandi) Associate Professor Place: Date: Allahabad May. Allahabad.C. IIIT. II . (Ashish Kumar Agrawal) Place: Allahabad Date: 17-5-2004 ------------------------------------------------------------------------------------------------------------ Certificate This is to certify that the above declaration made by the candidate is correct to the best of my knowledge and belief.Tech (IT) at the Indian Institute of Information Technology (IIIT). It is an authentic record of my original work pursued under the guidance of Dr. C. Last but not least. for extending a helping hand at every juncture of need. This fueled my enthusiasm even further and encouraged us to boldly step into what was a totally dark and unexplored expanse before us.C. in general. whether it was an off-hand comment to encourage us or a constructive piece of criticism and a special thank to JRC database provider who arranged a good database for mines. Ashish Kumar Agrawal (2001114) III .Acknowledgement First and foremost. Nandi. Dr G. I was privileged to experience a sustained enthusiastic and involved interest from his side. I would also like to thank my seniors who were ready with a positive comment all the time. I would like to thank the IIIT-A staff members and the institute. I would like to express my sincere gratitude to my project guide. Table Of Contents Abstract……………………………………………………………….1.….…3 CHAPTER III Approaches in This Direction….4 3.3 Fuzzy C-mean……………………………………………….2..3 2.2 Problem Statement………………………………………………….2 k-nearest neighbour……...1 Genetic algorithm ……………………………………………5 3.…………………….2 Selection of an algorithm………...1 1.7 3...…………………………….II Certificate…………………………………………………………………II Acknowledgements………………………………………………………...3 2.…………………………………....8 3.2.1 Introduction…………………………………………………………1 1..1 Features extraction ….4 3..2 Adaptive resonance theory (ART)…...4 3.III List Of Figures………………………………………………………… CHAPTER I Introduction and Statement Of Problem……………………………….5 Gath–geva algorithm…………………………………….1 CHAPTER II Challenges In This Field ..………………………………………...1 s tatistical Approaches..4. Gustavson kessel algorithm……………………………….……………………………………………..1 Clustering algorithm Kmean .6 Kohonen SOM…………………………………………….….2..I Declaration……………………………………………………………….………………………………………………..2.…..……………………………….9 VII IV ..8 3.2..6 3...4 3.1....2 Softcomputing………………………………………………………5 3. …………………………………….2. Distributed computing environment……………………….12 5..CHAPTER IV System Architecture…….3.………………………………………………10 4.16 References ……………………………………………………………….3.3.2 conclusion …………………………………………………………15 5.………………………………….10 CHAPTER V Results And Conclusions…………………………………………………12 5.16 5. 17 -Books ……………………………………………………………………17 ...…....3 Future Extensions…….2.2 Algorithm and table selection…………………………………….17 V .………16 5....15 5...Dealing with various platform and format……………….1 Results…………………………………………………………….1 Data Source Name and login….1 Improvement in the genetic algorithm………………….………………………………………….Research Papers……………………………………………………….3.10 4.. ………..…11 Fig 2: Main frame of algorithm…………………………………….…………………………12 Fig 5: Result of Fuzzy c-mean and Gustavson kessel algorithm …………………………………………………………………………….……...14 Fig 7: Result of k-nearest neighbour algorithm….13 Fig 6: Result of k-mean algorithm...……………….List of Figures Fig 1: Flow of information………………………………………..…11 Fig 3: Result of genetic algorithm……….………………………12 Fig 4: Result of ART………………………..14 Fig 8: Result of kohonen SOM…..15 VI ... …..…….….……....………………..………………………. VII . ” Here. 1 Int r oducti on “If we already know about the upcoming hazards. to predict the possible class of incoming data. so to design an obstacles free path for robot is another aspect beyond the domain of this module. Robot is designed to move toward these predicted areas to decontaminate the mines. To tackle this problem a classification toolkit has been designed using some statistical and soft computing based approaches to cluster the data. My objective is to predict whether at a particular point of working area is occupied by mines or not. with some confidence parameter. These mines occupied area can be known before initiation of robot movements or can be predicted dynamically. On the basis of this prediction path designers develop the obstacle free path to decontaminate these mines. it is very easy to find the way to abolish it. 1 . PVC tubes and actual mine data. 2 S t a t e m e n t O F P r ob l e m 1 .Chapter 1 Introduction & Statement of Problem BRIEF OVERVIEW 1. The data may be given in image form or some tabular form having all numeric or categorized attributes. this sentence is being described in the context of Landmine Detection and Decontamination. It is impossible to design a classifier having 100% right classification because it is not easy to differentiate between the data of metallic debris. to generate some rules in the term of confidence parameter. This data is generated by some highly accurate sensors. so we need a classification system that can differentiate a mine from metallic debris on the basis of given data.Anti-Personal landmines are a significant barrier to economic and social development in a number of countries. 2 . 1 Features Extraction The initial problem is the problem of features extraction.2 sel e cti on of an Al gorithm The second problem is to choose an algorithm that can interpret the problem in best way. 3 . 2. Here blobsize. so the best way is to design various algorithms and then check their efficiency and accuracy. The algorithms can be categorized in two parts: (1)Statistical approaches (2)Softcomputing based approaches The three types of algorithms can be applied here: Classification. so it is very difficult to extract the exact boundary of object. blobaspectratio and blobintensity have been chosen. metallic debris and Mines. There may be various features those can be used as the raw material of system. Clustering and Rules establishment with some certainty factor. The data in some tabular format having numerical or categorized values of attributes can also be given. which is more suitable for the algorithms 2. the basic problems are the features extraction (building blocks of algorithms) and selection of good algorithms those can generate results with high certainty value. Generally the image data is given having a blurred image of an object. The given data may contain the images of PVC tube.Chapter 2 Challenges in This Field In the field of classification and rules establishment. If the number of data is less than the number of cluster then we assign each data as the centroid of the cluster.1. Then we assign all the data to this new centroid. Mathematically this loop can be proved to be convergent Since there are only two classes mine and non-mine so number of classes is given 2 as input with the dataset [B1]. we calculate the distance to all centroid and get the minimum distance.1 Clustering Algorithm: Kmean Clustering is a nonlinear activity that generates ideas. images and feelings around a stimulus word.n e a r e s t N e i g h b ou r 4 . 3. This process is repeated until no data is moving to another cluster anymore. If the number of data is bigger than the number of cluster. 2 K . 3.1 Stati stical A pproa ch es Two algorithms have been used one for clustering (K-mean algorithm) and another for prediction of class of incoming data (K-nearest neighbour). 3 . This data is said belong to the cluster that has minimum distance from this data. Since we are not sure about the location of the centroid. Each centroid will have a cluster number. 1 . for each data.Chapter 3 APPROACHES IN THIS DIRECTION In this section the various algorithms will be discussed being used to achieve the objective. we need to adjust the centroid location based on the current updated data. Clustering may be a class or an individual activity. the rules which have negation in the attributes cannot be discovered.) G en e ti c a l gor i thm 3 .2 Soft computi n g app r oa ch es S o f t c o m pu t i n g a p p r o a c h e s c a n b e c l a s s i f i e d i n to severa l ca tegor ies like: 1. 2 . Genetic Algorithms (GAs) has been used.) Kohonen SOM 5. 3. [B2] The problem of choosing k still remains. but a general rule of thumb is to use K= sqrt(N). i. A disadvantage of this method is that it is computationally intensive for large data sets.) Ad ap tive r eso nance th eor y 4. Density estimator: qc(x) = (number of neighbors of class c)/K The neighbors are the k closest point to the given sample .) Fuzz y cluster ing 3.) Neural approaches 2.e. 1 G e n e t i c a l g o r i t h m t o e s t a b l i s h r u l es T o e sta blish the rules between t h e a t t r i b u t e s o f d a t a a sso c ia t i on ru le but a ssociation Rule mining cannot predict the complete set of rules. Finally the class is predicted having the highest estimator. To overcome that disadvantage.Their mutual distances are calculated by city block distance. F i r st o f a l l a ss o c i a t i o n r u l e i s a p p l i e d w i t h s o m e s u p p o r t a n d c o n f id e n c e v a l u e s e n t e r e d b y u s e r t o ge n e r a t e s o m e b a s e r u l e s 5 . Where N is the number of learning samples.K-nearest neighbour technique is used to predict the class of incoming data on the basis of given training data and density estimator (k-nn) to estimate the confidence of the incoming sample for a particular class. But once a back propagation is trained.and these r u les a re sent to ge n e t i c a l g o r it h m a s i n p u t w h i c h h e l p s t o e v o l v e s o m e n e w r u l e h a v in g n e ga t io n i n a t t r ib ute s .if this value is higher than threshold values then the weight of this connection is updated otherwise a new output neuron is added. (c)M u ta t i on: m ut a t i o n p o i n t i s g e n e r a t e d r a n d o m l y a n d t h e b i t va lue a t this po i n t i s t o g g l e d . [R2] So ART is a new neural network technique to solve this problem. A ft er so m e i tera t io n w e f i nd s om e r u les fo l l ow in g t h e a bo ve p r o p e r t i e s a n d h a v in g h i gh f i t n e ss va lue that ca n be ca lculated e i t h e r u s i n g t h e c o n fid en c e va lu e o r b y c o n fu s i o n m a t r i x . T h e t h r e e b a s i c p a r t o f g e n et i c a l g or i thm a r e a s f ol l ow : (a)S el ec t i on: R ou l e t t e w h e e l t e c h n i q u e i s u s e d t o s e l e c t t h e t w o parents [R1]. 2 A d a p t i v e r e s on a n c e t h e o r y (AR T ) As w e k no w backpropagation network is very powerful in the sense that it can simulate any continuous function given a certain number of hidden neurons and a certain forms of activation functions. The another fact is that if a non-mine data is predicted as mine it is acceptable but vice-versa is not true because it may be 6 . the number of hidden neurons and the weights are fixed. so there is no plasticity. The network cannot learn from new patterns unless the network is re-trained from scratch. (b)C r os s ov e r : A r a n d o m p o i n t ( c r os s o v e r po int ) is gen erat ed a nd t h e s e gm e n t t o t h e le ft o f t h i s p o i n t o f f i r s t p a r e n t a n d t h a t o f second parent are interchanged. 3 . 2 . Our ultimate objective is to cluster the data in several chunks. Each time one by one samples from the data as input neurons is sent as input and the activation value is calculated corresponding to each of the existing output neurons. and the highest value is chosen . After certain iteration it’s found that the proper clusters of the data in our application don’t have classes more than two (mine and non-mine). m in e .2. so among all the clusters. Here activation function is calculated as the city block distance of the incoming normalized data and weights of connection. b u t w h i l e cla ss i fy i ng t he m in e d a ta it is n o t v e r y e a s y t o d i f f e r e n t i a t e b e t w e e n m i n e a n d n o n . the cluster having the cluster-center farthest from the mine data center is classified as non-mine. rest of the clusters are classified as mine.3 Fuzzy c-mean: I n t h e c la s s ic a l c l u s t e r in g a l gor it h m we h a v e t h e c r is p m e m b e r s h i p o f a c l a s s ( e i t h e r o n e o r z er o ) . 3. I f th is m e m ber sh i p is a vera g e the n w e d ea l th is d a ta a s spe c ia l d a ta a n d c la s s if y t h i s i n t h e c l a s s o f m i n e ( a s m in e a r e d a n g e r o u s ! ! ) . S o w e n e e d a m et h o d t h a t c a n t e l l t h e m em be r s h i p o f t h e d a t a i n e ac h cla ss . [ R 3] where |X| is the feature vector and p is the number of classes (p=2 in our case) Membership values Euclidean distance: Mean center prototype: 7 .dangerous. 5 Gath-Geva Algorithm : This algorithm assumes that data is normally distributed.Mean center prototype(Ci)= I f t h e d i f f e r e n c e o f t h e m e m be r s h i p v a l u e w i t h p r e v i o u s memb ers hip va lu e is le ss tha n t h r e s h o l d t h a n a l g o r i t h m ter m ina t e w it h h a v in g th e m em be r s h ip v a l u e f o r e a c h c la s s . 4Gustavson-Kessel Algorithm It is an improvement of fuzzy c-mean clustering algorithm . 2 . 3 . [R3] Distance : 8 . 2 .the correlation between the data is not considered in c mean. In this algorithm we redefine our distance formula as: [R3] Mahalobis distance : where Ai is the mean center prototype and xj and cj are the sample attribute and cluster center. And covariance matrix is calculated as : Fuzzy covariance matrix Mean center prototype 3 . Now we have some brief knowledge of algorithms those have been implemented . Self-organizing maps also learn the topology of their input vectors.[B3] A self-organizing map learns to categorize input vectors. Neurons next to each other in the network learn to respond to similar vectors.where and is the a-priori probability of data belonging to cluster i. Competitive networks also learn the distribution of inputs by dedicating more neurons to classifying parts of the input space with higher densities of input. Feature maps allocate more neurons to recognize parts of the input space where many input vectors occur and allocate fewer neurons to parts of the input space where few input vectors occur. 6 Kohonen SOM: A competitive network learns to categorize the input vectors presented to it. where only one neuron has an output at a time. It also learns the distribution of input vectors. Self-organizing maps allow neurons that are neighbors to the winning neuron to output values. then a competitive network will do. The layer of neurons can be imagined to be a rubber net that is stretched over the regions in the input space where input vectors occur. Before applying this algorithm it is suggested to analyze data whether it is normally distributed or not. 3. 9 . 2 . Thus the transition of output vectors is much smoother than that obtained with competitive layers.Now I will discuss the architectural design of classification system followed by the results . Mean center prototype The symbols have same explanation as above. If a neural network just needs to learn to categorize its input vectors. > d at a s o u r c e s ( o d b c ) .> c o n f i g u r e ) .1 Data source name and login: Data is being maintained in MS Access.> a d m i n i s t r a t i v e t o o l .2 Algorithm and table selection A n y t a b le e x ist i n g in t h e i n p u t d a t a b a s e c a n b e s e l e c t e d w i t h t h e algor it h m(Se le ct ca te gor i zed tab le if u ap p ly ge net i c 10 .Chapter 4 System Architecture As I have already discussed that input can have image form or tabular form . Medium and High with class value simply mine or non-mine. a n e w p a g e o p e n s h a v i n g all the algorithms and table selection facility.this DSN name is asked when u initialize the application with the user name and password that can be o b t a in e d f r o m h e l p ( B e c a u s e t h i s i s d e s i g n e d f o r d e m o n s t r a t i o n s o u s e r n a m e a n d p a s s w o r d h a v e b e e n g i v e n in h e lp ). A f t e r t h e c o n f i g u r a t i o n h e w i l l b e a s s ig n e d a DSN name . 4. 4.Matlab has been used to extract the features from the input images. We have the numerical attributes based table with the entry whether the data belongs to mine or non-mine. User is free to enter any d a t a b u t h e n e e d s t o c o n f i g u r e t he d a t a b a s e f ir s t u s i n g ( c o n t r o l p a n e l . Whe n c o n n e c t b u t t o n i s p r e s s e d i f th e u s er na m e a nd p a ss w or d a r e c o r r e c t a n d t h e e n t e r e d D S N e x ist s .> s y s t e m D S N . but for the genetic algorithm categorized table is required so data is categorized in three categories :Low. The interface is self explanatory with proper help. Different algorithm can ask for some input parameter like clustering algorithm can ask for number of cluster etc. Fig 1: Flow of information Fig 2: Main frame of algorithms 11 . Java language has been used at front hand and Microsoft Access XP for Database in back hand and JDBC Bridge to communicate between algorithms and databases.algorithm).Now the algorithm specific results will be displaced. The ultimate objective of this module is to compare between various algorithms and differentiate between them on the basis of their accuracy and results. 1 R es u l t s : This module has successfully been implemented. Genetic algorithm: Fig 3 : Result of genetic algorithm This snapshot is displaying the result of both association rule and genetic rules. ART Fig 4 : Result of ART algorithm 12 . It is very much clear that genetic algorithm has generated the rule having negative attribute value in antecedent part so this algorithm is very useful to establish rules.Chapter 5 Result And Conclusion 5 . So it is very easy to check some data that can not be classified as mine and non-mine. so the all the cluster having more distance from the non-mine center has been assigned mine class. so this type of data can be put into mine class to avoid danger. 13 This algorithm is abolishing the drawback of fuzzy c-mean algorithm because it considers the correlation between data.the ART gives the multiple class distribution of given data. Fuzzy C-mean Gustavson kessel The fuzzy c-mean algorithm is giving rules with membership value in each class. Because to predict a non-mine as a mine is not as much dangerous as to predict a mine as non-mine.the ART algorithm is also giving good rules . . On the basis of class of nearest neighbours. First of all. the training data must be given with the inputs and the number of nearest neighbours. The algorithm gives good result when number of K is more which also makes 14 . Fig 6 : Result of Kmean algorithm K nearest neighbour algorithm This algorithm is useful if someone wants to know the class of a given data.Fig 5 : Result of Fuzzy c-mean and Gustavson kessel algorithm Kmean Algorithm Kmean algorithm is non-adaptive and time consuming and giving the accuracy of 65%. this algorithm predicts the possible class of the input data. 2 C o n clusi on A l l t h e e i ght d if f e r e n t a lg or it h m s h a v e b e e n i m p l e m e n t e d t o c o m pa r e t h e r e s u l t s . Fig 7 : Result of Knearest neighbour algorithm Kohonen SOM This algorithm is also used for clustering and it’s quite a fast algorithm based on ‘winner take all’ strategy. j r c .d a t a b a se . c o p p e r c y l in d e r ( N o n m in e d a t a) a n d t h e m i n e d a t a o b t a in e d f r o m j r c I s r a e l ( h t t p : / / a p l .m ea n a nd G us t a vs on k e sse l is a l so g ood b e c a u s e o f m e m b e r s h i p v a lu e s f or ea ch cla ss. 15 . It differentiates the mine and non-mine up to 80% accuracy Fig 8 : Result of Kohonen SOM algorithm 5. F u z z y C .this algorithm very time consuming. i t ) . Th is modu le c a n d if f e r e n t ia t e b e t w e e n t h e P V C t ub e . b r a s s t ub e . T h e b e s t re su lt is b e in g given b y AR T an d Ge ne tic a l g o r i t h m . T h i s c l a s s i f ie r is givin g re su lt with 80% a c c u r a c y . wo o d p ie c e . MySql etc). so the other type of mutation can also be practiced like deletion .3. 16 . 5.3.3.In practical and real life application we have several GB of data .insertion and segment mutation etc. 5.5. 3 F u t u r e E x t e n s i on We contemplate following future features which can be incorporated into this project:5.3 Dealing with various platforms and formats: The data may be various format and databases system so system should be flexible enough to handle the various formats and DBMSs like (Oracle . To operate this much of data we need the distributed databases and computing.2 Distributed computing environment: Generally we have to deal with large databases because on the basis of 100 tuples databases it is very hard to predict the exact class of data . and the crossover and mutation probabilities can be modified to get better results.1 Im prov em ent in th e gen eti c a l gorit h m :the implemented genetic algorithm in this module incorporates only point mutation. 17 . Vasconcelos.1 B. R. Sucharita Gopal. A. Stork (2001) Pattern classification (2nd edition). ART Neural Networks for Remote Sensing: Vegetation second R. IEEE Press. 37. Bezdek. ISBN 0471056693 Valluru B. Ramírez. Richard O. SEPTEMBER 2001. 5.References Books B. IEEE TRANSACTIONS ON MAGNETICS. David G.. NO.K. A. and Curtis E.3. Marin N. Duda. R. Gjaja. New York. 1992: Fuzzy Models for Pattern Recognition. New York. J.2 B.3 Earl Gose Steve Jost Richard Johnsonbaugh Pattern Recognition and Image Analysis June.C. 1996 0132364158 Prentice Hall. R. Rao C++ Neural Networks and Fuzzy Logic edition.1 Improvements in Genetic AlgorithmsJ. H. Peter E.2 Classification from Landsat TM and Terrain Data Gail A.. Hart. Taka hashi. Pal. Wiley. and R. J. VOL. Saldanha . S. Carpenter. C. Woodcock. Research Papers R.
Copyright © 2024 DOKUMEN.SITE Inc.