Concept of Image Classification
Image classification: assigning pixels in the image to categories or classes of interest.
Examples: built-up areas, water bodies, green vegetation, bare soil, rocky areas, cloud, and shadow.
GNR401 Dr. A. Bhattacharya

Concept of Image Classification
Image classification is a process of mapping numbers to symbols:
$f(x): x \to D;\quad x \in \mathbb{R}^n,\quad D = \{c_1, c_2, \ldots, c_L\}$
where n is the number of bands and L is the number of classes.
f(.) is a function assigning a pixel vector x to a single class in the set of classes D.

Concept of Image Classification
- To classify a set of data into different classes or categories, the relationship between the data and the classes into which they are classified must be well understood.
- For a computer to achieve this, the computer must be trained.
- Training is key to the success of classification.
- Classification techniques were originally developed out of research in the field of pattern recognition.

Concept of Image Classification
Computer classification of remotely sensed images involves the computer program learning the relationship between the data and the information classes.
Important aspects of accurate classification:
- Learning techniques
- Feature sets

Types of Learning
Supervised learning
- A learning process designed to form a mapping from one set of variables (data) to another set of variables (information classes).
- A teacher is involved in the learning process.
Unsupervised learning
- Learning happens without a teacher.
- Exploration of the data space to discover the scientific laws underlying the data distribution.

Features
- Features are attributes of the data elements, based on which the elements are assigned to various classes.
- E.g., in satellite remote sensing, the features are measurements made by sensors in different wavelengths of the electromagnetic spectrum (visible, infrared, microwave), as well as texture features.

Features
- In medical diagnosis, the features may be the temperature, blood pressure, lipid profile, blood sugar, and a variety of other data collected through pathological investigations.
- The features may be qualitative (high, moderate, low) or quantitative.
- The classification may be presence of heart disease (positive) or absence of heart disease (negative).

Supervised Classification
- The classifier has the advantage of an analyst or of domain knowledge, which can guide it to learn the relationship between the data and the classes.
- The number of classes and the prototype pixels for each class can be identified using this prior knowledge.

Partially Supervised Classification
When prior knowledge is available
- for some classes and not for others, or
- for some dates and not for others in a multitemporal dataset,
a combination of supervised and unsupervised methods can be employed for partially supervised classification of images.

Unsupervised Classification
- When access to domain knowledge or the experience of an analyst is missing, the data can still be analyzed by numerical exploration, whereby the data are grouped into subsets or clusters based on statistical similarity.

Supervised vs. Unsupervised Classifiers
- Supervised classification generally performs better than unsupervised classification IF good-quality training data are available.
- Unsupervised classifiers are used to carry out preliminary analysis of data prior to supervised classification.

Role of Image Classifier
The image classifier performs the role of a discriminant: it discriminates one class against the others.
- Multiclass case: the discriminant value is highest for one class and lower for the other classes.
- Two-class case: the discriminant value is positive for one class and negative for the other.

Discriminant Function
$g(c_k, x)$ is the discriminant function relating feature vector x and class $c_k$, $k = 1, \ldots, L$.
Denote $g(c_k, x)$ as $g_k(x)$ for simplicity.
Multiclass case:
$g_k(x) > g_l(x)\ \ \forall\, l = 1, \ldots, L,\ l \neq k \implies x \in c_k$
Two-class case:
$g(x) > 0 \implies x \in c_1;\quad g(x) < 0 \implies x \in c_2$
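The two decision rules above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the discriminant values are assumed to have been computed already, and the handling of ties (first maximum wins, `None` on the two-class boundary) is an assumption that the slide does not specify.

```python
def classify_by_discriminant(scores):
    """Multiclass rule: return the index k whose discriminant g_k(x) is largest.

    scores[k] holds g_k(x). If several values tie for the maximum,
    the lowest index is returned (an arbitrary choice made for this sketch).
    """
    return max(range(len(scores)), key=lambda k: scores[k])


def classify_two_class(g_value):
    """Two-class rule: class 1 if g(x) > 0, class 2 if g(x) < 0.

    Returns None when g(x) == 0, i.e. exactly on the decision boundary.
    """
    if g_value > 0:
        return 1
    if g_value < 0:
        return 2
    return None
```

For example, `classify_by_discriminant([0.1, 0.7, 0.2])` selects index 1, the class with the highest discriminant value.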

Example of Image Classification
- Multiple-class case: recognition of characters or digits from bitmaps of scanned text.
- Two-class case: distinguishing between text and graphics in a scanned document.

Prototype / Training Data
- Using domain knowledge (maps of the study area, an experienced interpreter), small sets of sample pixels are selected for each class.
- The size and spatial distribution of the samples are important for proper representation of the total pixel population in terms of the samples.

Statistical Characterization of Classes
- Each class has a conditional probability density function (pdf), denoted by $p(x \mid c_k)$.
- The distribution of feature vectors in each class $c_k$ is indicated by $p(x \mid c_k)$.
- We estimate $P(c_k \mid x)$, the conditional probability of class $c_k$ given that the pixel's feature vector is x.

Supervised Classification Algorithms
There are many techniques for assigning pixels to informational classes, e.g.:
- Minimum Distance from Mean (MDM)
- Parallelepiped
- Maximum Likelihood (ML)
- Support Vector Machines (SVM)
- Artificial Neural Networks (ANN)

Supervised Classification Principles
- The classifier learns the characteristics of different thematic classes: forest, marshy vegetation, agricultural land, turbid water, clear water, open soils, manmade objects, desert, etc.
- This happens by analyzing the statistics of small sets of pixels in each class that are reliably selected by a human analyst through experience or with the help of a map of the area.

Supervised Classification Principles
Typical characteristics of classes:
- Mean vector
- Covariance matrix
- Minimum and maximum gray levels within each band
- Conditional probability density function $p(x \mid C_i)$, where $C_i$ is the ith class and x is the feature vector
The number of classes L into which the image is to be classified should be specified by the user.

Prototype Pixels for Different Classes
- The prototype pixels are samples of the population of pixels belonging to each class.
- The size and distribution of samples are formally governed by the mathematical theory of sampling.
- There are several criteria for choosing the samples belonging to different classes.

Parallelepiped Classifier - Example of a Supervised Classifier
- Assign ranges of values for each class in each band.
- Really a "feature space" classifier.
- Training data provide bounds for each feature for each class.
- Results in bounding boxes for each class.
- A pixel is assigned to a class only if its feature vector falls within the corresponding box.
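A minimal sketch of the box rule in plain Python follows. The function names and the dictionary layout of the training data are assumptions made for illustration. The bounds are taken as the per-band minimum and maximum of the training samples, which is one common choice, and a pixel that falls inside several overlapping boxes is simply given to the first matching class.

```python
def fit_boxes(training):
    """Derive one bounding box per class from its training pixels.

    training: dict mapping class label -> list of feature vectors
    (one number per band). Returns dict label -> (mins, maxs).
    """
    boxes = {}
    for label, samples in training.items():
        bands = list(zip(*samples))  # transpose: one tuple of values per band
        boxes[label] = ([min(b) for b in bands], [max(b) for b in bands])
    return boxes


def classify_parallelepiped(x, boxes):
    """Return the first class whose box contains x, or None if no box does.

    'First match wins' is an assumed rule for overlapping boxes.
    """
    for label, (lows, highs) in boxes.items():
        if all(lo <= v <= hi for v, lo, hi in zip(x, lows, highs)):
            return label
    return None  # pixel stays unclassified
```

Pixels for which `classify_parallelepiped` returns `None` are the unclassified pixels of this method.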

Parallelepiped Classifier [figure]

Advantages/Disadvantages of Parallelepiped Classifier
- Does NOT assign every pixel to a class; only the pixels that fall within the class ranges are classified.
- Fastest method computationally.
- Good for helping decide whether additional classes are needed (if there are many unclassified pixels).
- Problems arise when class ranges overlap; rules must be developed to deal with overlap areas.

Minimum Distance Classifier
Simplest kind of supervised classification.
The method:
- Calculate the mean vector for each class.
- Calculate the statistical (Euclidean) distance from each pixel to each class mean vector.
- Assign each pixel to the class it is closest to.

Minimum Distance Classifier [figure]

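Minimum Distance Classifier
Algorithm:
- Estimate the class mean vector and covariance matrix from the training samples:
  $m_i = \frac{1}{N_i} \sum_{X_j \in C_i} X_j, \qquad S_i = E\{(X - m_i)(X - m_i)^T \mid X \in C_i\}$
  where $N_i$ is the number of training samples in class $C_i$.
- Compute the distance between X and each $m_i$.
- $X \in C_i$ if $d(X, m_i) \le d(X, m_j)\ \ \forall j$.
- Compute $P(C_k \mid X)$; leave X unclassified if $\max_k P(C_k \mid X) < T_{min}$.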

Minimum Distance Classifier
- Normally classifies every pixel no matter how far it is from a class mean (it still picks the closest class), unless the $T_{min}$ condition is applied.
- The distance between X and $m_i$ can be computed in different ways, such as Euclidean, Mahalanobis, or city block.
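The minimum distance rule can be sketched as below, in plain Python with hypothetical function names. One deliberate simplification: instead of the posterior-probability threshold $T_{min}$, this sketch rejects a pixel when its distance to the nearest class mean exceeds an optional `max_distance`. That distance threshold is an assumed stand-in, not the rule stated on the slide.

```python
import math


def class_means(training):
    """training: dict label -> list of feature vectors. Returns dict label -> mean vector."""
    means = {}
    for label, samples in training.items():
        n = len(samples)
        means[label] = [sum(band) / n for band in zip(*samples)]
    return means


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def city_block(a, b):
    """City block (Manhattan) distance between two feature vectors."""
    return sum(abs(p - q) for p, q in zip(a, b))


def classify_min_distance(x, means, distance=euclidean, max_distance=None):
    """Assign x to the class with the nearest mean.

    If max_distance is given and even the nearest mean is farther away,
    return None (pixel left unclassified). This threshold is an assumption.
    """
    label = min(means, key=lambda c: distance(x, means[c]))
    if max_distance is not None and distance(x, means[label]) > max_distance:
        return None
    return label
```

Passing `distance=city_block` swaps the metric without changing the decision rule.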

Maximum Likelihood Classifier
- Calculates the likelihood of a pixel being in different classes, conditional on the available features, and assigns the pixel to the class with the highest likelihood.

Likelihood Calculation
- The likelihood of a feature vector x being in class $C_i$ is taken as the conditional probability $P(C_i \mid x)$.
- We need to compute $P(C_i \mid x)$, that is, the conditional probability of class $C_i$ given the pixel vector x.
- It is not possible to directly estimate the conditional probability of a class given the feature vector. Instead, it is computed indirectly in terms of the conditional probability of feature vector x given that it belongs to class $C_i$.

Likelihood Calculation
$P(C_i \mid x)$ is computed using Bayes' theorem in terms of $P(x \mid C_i)$:
$P(C_i \mid x) = P(x \mid C_i)\, P(C_i) / P(x)$
x is assigned to class $C_j$ such that
$P(C_j \mid x) = \max_i P(C_i \mid x), \quad i = 1, \ldots, K$, where K is the number of classes.
$P(C_i)$ is the prior probability of occurrence of class i in the image.
$P(x)$ is the multivariate probability density function of feature x.
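As a small numeric sketch of the Bayes rule above (plain Python, hypothetical function names), the posteriors can be obtained by normalising the products $P(x \mid C_i) P(C_i)$. Here $P(x)$ is computed as the sum of those products over all classes, which follows from the law of total probability; the class-conditional values are assumed to be given.

```python
def posteriors(likelihoods, priors):
    """Return the list of P(C_i | x) from P(x | C_i) and P(C_i).

    likelihoods[i] = P(x | C_i), priors[i] = P(C_i).
    """
    joint = [lik * pri for lik, pri in zip(likelihoods, priors)]
    evidence = sum(joint)  # P(x) = sum_i P(x | C_i) P(C_i)
    return [j / evidence for j in joint]


def map_class(likelihoods, priors):
    """Index of the class with the largest posterior.

    Dividing by P(x) rescales every class equally, so comparing the
    unnormalised products P(x | C_i) P(C_i) gives the same winner.
    """
    joint = [lik * pri for lik, pri in zip(likelihoods, priors)]
    return max(range(len(joint)), key=lambda i: joint[i])
```

With likelihoods `[0.2, 0.05]` and priors `[0.3, 0.7]`, the products are 0.06 and 0.035, so class 0 wins even though its prior is smaller.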

Likelihood Calculation
- $P(x)$ can be ignored in the computation of $\max_i \{P(C_i \mid x)\}$.
- If $P(x \mid C_j)$ is not assumed to have a known distribution, then its estimation is said to be non-parametric.
- If $P(x \mid C_j)$ is assumed to have a known distribution, then its estimation is said to be parametric.
- The training data x, with the class already given, can be used to estimate the conditional density function $P(x \mid C_i)$.

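Likelihood Calculation
- $P(x \mid C_i)$ is assumed to be multivariate Gaussian distributed in practical parametric classifiers.
- The Gaussian distribution is mathematically simple to handle.
- Each class-conditional density function $P(x \mid C_i)$ is represented by its mean vector $\mu_i$ (also written $m_i$) and covariance matrix $S_i$:
$p(x \mid C_i) = \frac{1}{(2\pi)^{L/2} |S_i|^{1/2}} \exp\left(-\frac{1}{2} (x - \mu_i)^T S_i^{-1} (x - \mu_i)\right)$
Here L in the exponent $L/2$ denotes the dimensionality of the feature vector x.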

Assumption of Gaussian Distribution
- Each class is assumed to be multivariate normally distributed.
- That implies each class has a mean $\mu_i$ that has the highest likelihood of occurrence.
- The likelihood function decreases exponentially as the feature vector x deviates from the mean vector $\mu_i$.
- The rate of decrease is governed by the class variance: the smaller the variance, the steeper the decrease; the larger the variance, the slower the decrease.

Likelihood Calculation
Taking the logarithm of the Gaussian distribution, we get
$g_i(x) = -\frac{1}{2} (x - \mu_i)^t S_i^{-1} (x - \mu_i) - \frac{L}{2} \ln 2\pi - \frac{1}{2} \ln |S_i| + \ln P(C_i)$
We assume that the covariance matrices for each class are different.
The term $(x - \mu_i)^t S_i^{-1} (x - \mu_i)$ is known as the (squared) Mahalanobis distance between x and $\mu_i$ (after Prof. P.C. Mahalanobis, famous Indian statistician and founder of the Indian Statistical Institute).
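The log discriminant can be turned into a small Gaussian maximum likelihood classifier. The sketch below assumes NumPy is available; the function names are hypothetical, and the covariance is estimated with `np.cov`, which uses the unbiased (N - 1) normalisation. Each class needs enough training pixels for its covariance matrix to be non-singular.

```python
import numpy as np


def fit_gaussian(samples):
    """Estimate (mean vector, covariance matrix) from an (N, d) array of training pixels."""
    samples = np.asarray(samples, dtype=float)
    return samples.mean(axis=0), np.cov(samples, rowvar=False)


def log_discriminant(x, mean, cov, prior):
    """g_i(x) = -1/2 d_M^2 - (d/2) ln(2 pi) - 1/2 ln|S_i| + ln P(C_i)."""
    d = len(mean)  # dimensionality of the feature vector
    diff = np.asarray(x, dtype=float) - mean
    maha_sq = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
    _, logdet = np.linalg.slogdet(cov)           # ln|S_i|, computed stably
    return -0.5 * maha_sq - 0.5 * d * np.log(2 * np.pi) - 0.5 * logdet + np.log(prior)


def classify_ml(x, params, priors):
    """Return the index of the class with the largest log discriminant.

    params: list of (mean, cov) pairs, one per class; priors: list of P(C_i).
    """
    scores = [log_discriminant(x, m, S, p) for (m, S), p in zip(params, priors)]
    return int(np.argmax(scores))
```

Using `np.linalg.solve` avoids forming the explicit inverse of the covariance matrix, which is usually the numerically safer choice.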

Interpretation of Mahalanobis Distance
- The Mahalanobis distance $d_M$ between two multivariate quantities x and y is given by
  $d_M^2(x, y) = (x - y)^t S^{-1} (x - y)$
- If the covariance matrix is $kI$ (I is the unit matrix), then the Mahalanobis distance reduces to a scaled version of the Euclidean distance.
- The Mahalanobis distance scales down the Euclidean distance according to the extent of variation within the data, given by the covariance matrix S.
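A short NumPy sketch (hypothetical function name) makes the scaling behaviour concrete. With covariance $kI$ the squared Mahalanobis distance is the squared Euclidean distance divided by k, and with unequal variances a displacement along a high-variance band counts for less than the same displacement along a low-variance band.

```python
import numpy as np


def mahalanobis_sq(x, y, cov):
    """Squared Mahalanobis distance (x - y)^t S^{-1} (x - y)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # Solve S z = diff rather than inverting S explicitly.
    return float(diff @ np.linalg.solve(np.asarray(cov, dtype=float), diff))


# Covariance k*I with k = 4: squared Euclidean distance 25 becomes 25 / 4.
d_iso = mahalanobis_sq([3, 4], [0, 0], 4 * np.eye(2))

# Variance 4 in band 1, variance 1 in band 2: the same step of 2 units
# gives a smaller distance along band 1 than along band 2.
d_band1 = mahalanobis_sq([2, 0], [0, 0], [[4, 0], [0, 1]])
d_band2 = mahalanobis_sq([0, 2], [0, 0], [[4, 0], [0, 1]])
```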

Advantages/Disadvantages of Maximum Likelihood Classifier
- Normally classifies every pixel no matter how far it is from a class mean.
- Slowest method; more computationally intensive.
- The assumption of normally distributed data is not always true, in which case the results are not likely to be very accurate.
- A thresholding condition can be introduced into the classification rule to separately handle ambiguous feature vectors.

Nearest-Neighbor Classifier
Non-parametric in nature.
The algorithm is:
- Find the distance of the given feature vector x from ALL the training samples.
- x is assigned to the class of the nearest training sample (in the feature space).
This method does not depend on class statistics like the mean and covariance.
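The two steps above can be sketched in plain Python as follows. The function name and the `(feature_vector, label)` layout of the training samples are assumptions for illustration, and Euclidean distance (`math.dist`, Python 3.8+) is used as the feature-space distance.

```python
import math


def nearest_neighbour(x, training):
    """Return the label of the training sample closest to x.

    training: list of (feature_vector, label) pairs.
    Every training sample is visited, so the cost grows with the
    size of the training set.
    """
    _, label = min(training, key=lambda sample: math.dist(x, sample[0]))
    return label
```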

K-NN Classifier
K-nearest neighbour classifier: simple in concept, time-consuming to implement.
- For a pixel to be classified, find the K closest training samples (in terms of feature vector similarity, or smallest feature vector distance).
- Among the K samples, find the most frequently occurring class $C_m$.
- Assign the pixel to class $C_m$.

K-NN Classifier
Let $k_i$ be the number of samples for class $C_i$ (out of the K closest samples), $i = 1, 2, \ldots, L$ (number of classes).
Note that $\sum_i k_i = K$.
The discriminant for the K-NN classifier is
$g_i(x) = k_i$
The classifier rule is:
Assign x to class $C_m$ if $g_m(x) > g_i(x)$ for all $i$, $i \neq m$.
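A minimal sketch of this counting rule in plain Python is given below. The function name and data layout are hypothetical, Euclidean distance is assumed, and a tie between classes is resolved in favour of the class encountered first among the sorted neighbours, which is a choice made here and not part of the slide.

```python
import math
from collections import Counter


def knn_classify(x, training, k):
    """Majority vote among the k training samples nearest to x.

    training: list of (feature_vector, label) pairs.
    """
    neighbours = sorted(training, key=lambda s: math.dist(x, s[0]))[:k]
    counts = Counter(label for _, label in neighbours)  # counts[c] plays the role of k_i
    return counts.most_common(1)[0][0]
```

The choice of K can change the outcome: a pixel whose single nearest sample belongs to one class may still be outvoted by the next-nearest samples of another class.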

K-NN Classifier
It is possible to find more than one class whose training samples are closest to the feature vector of pixel x. Therefore the discriminant function is refined further as
$g_i(x) = \frac{\sum_{j=1}^{k_i} 1 / d(x, x_i^j)}{\sum_{l=1}^{L} \sum_{j=1}^{k_l} 1 / d(x, x_l^j)}$
where $x_i^j$ is the jth of the $k_i$ nearest neighbours belonging to class $C_i$.
The distances of the nearest neighbours to the feature vector of the pixel to be classified are thus taken into account.
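The distance-weighted discriminant can be sketched as follows, in plain Python with hypothetical names. The small constant `eps` added to each distance is an assumption introduced here only to avoid division by zero when x coincides with a training sample; it is not part of the formula above.

```python
import math
from collections import defaultdict


def weighted_knn_scores(x, training, k, eps=1e-12):
    """Return dict label -> g_i(x) using inverse-distance weights.

    training: list of (feature_vector, label) pairs.
    Each of the k nearest neighbours contributes 1 / d(x, neighbour)
    to its class; the scores are normalised to sum to 1.
    """
    neighbours = sorted(training, key=lambda s: math.dist(x, s[0]))[:k]
    weights = defaultdict(float)
    for vec, label in neighbours:
        weights[label] += 1.0 / (math.dist(x, vec) + eps)
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}
```

The pixel is then assigned to the class with the largest score, for example with `max(scores, key=scores.get)`. One very close neighbour can outweigh two more distant neighbours of another class, which a plain majority vote would not allow.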

K-NN Classifier
If the classes are in different proportions in the image, then the prior probabilities can be taken into account:
$g_i(x) = \frac{k_i\, p(C_i)}{\sum_{l=1}^{L} k_l\, p(C_l)}$
For each pixel to be classified, the feature space distances to all training pixels are to be computed before the decision is made, due to which this procedure is extremely computation intensive, and is not used when the dimensionality (number