What is linear discriminant analysis in machine learning?
In machine learning, discriminant analysis is a technique used for classification, dimensionality reduction, and data visualization. It reduces the number of dimensions (or variables) in a dataset while retaining as much of the information that separates the classes as possible.
Linear discriminant analysis (LDA), also known as normal discriminant analysis (NDA) or discriminant function analysis, is a generalization of Fisher's linear discriminant. Fisher's linear discriminant is a method used in statistics and other fields to find a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting linear combination can be used either as a linear classifier or for dimensionality reduction before classification.
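Fisher's idea can be illustrated with a minimal two-class sketch in Python with NumPy. It makes two assumptions: the two classes share a covariance matrix, and the data is synthetic. The helper name `fisher_direction` is hypothetical. The direction w that maximizes the ratio of between-class separation to within-class scatter is proportional to S_W⁻¹(m₁ − m₀), where S_W is the within-class scatter matrix. Projecting every sample onto w turns the 2-D problem into a 1-D one.

```python
import numpy as np

def fisher_direction(X0, X1):
    """Unit vector w maximizing Fisher's criterion
    J(w) = (w . (m1 - m0))^2 / (w^T S_W w) for two classes."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: each class's scatter about its own mean, summed.
    S_W = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    # The optimum is w proportional to S_W^{-1} (m1 - m0); solve instead of inverting.
    w = np.linalg.solve(S_W, m1 - m0)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])          # shared (synthetic) covariance
X0 = rng.multivariate_normal([0.0, 0.0], cov, size=200)
X1 = rng.multivariate_normal([2.0, 0.0], cov, size=200)

w = fisher_direction(X0, X1)
# Project every sample onto w: two features become one.
z0, z1 = X0 @ w, X1 @ w
# Simple threshold halfway between the projected class means.
threshold = (z0.mean() + z1.mean()) / 2
accuracy = (np.sum(z0 < threshold) + np.sum(z1 >= threshold)) / (len(z0) + len(z1))
print("direction:", w.round(3), "accuracy:", round(accuracy, 3))
```

Because the features are strongly correlated here, w does not point along the raw difference in means. S_W⁻¹ "undoes" the correlation before the classes are compared.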
LDA is most commonly used for supervised classification problems. It is closely related to analysis of variance (ANOVA) and to regression analysis. Both of these also try to express one dependent variable as a linear combination of other features or measurements.
The key difference is that LDA has continuous independent variables and a categorical dependent variable (the class label), whereas ANOVA uses categorical independent variables and a continuous dependent variable.
Logistic regression and probit regression also explain a categorical variable using the values of continuous independent variables, so they are more similar to linear discriminant analysis than ANOVA is.
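The similarity between LDA and logistic regression can be checked empirically with a small scikit-learn sketch. Both models take continuous predictors and a categorical target, and both produce linear decision boundaries. LDA gets there by modelling each class as a Gaussian with a shared covariance matrix. Logistic regression models the class probability directly. The dataset below is synthetic (`make_classification`), so the scores only illustrate the comparison; they are not a benchmark.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data: 5 continuous predictors, binary (categorical) target.
X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           n_redundant=0, n_clusters_per_class=1, random_state=0)

models = {
    "LDA": LinearDiscriminantAnalysis(),
    "logistic regression": LogisticRegression(),
}
# 5-fold cross-validated accuracy for each model on the same data.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in models.items()}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

On data like this the two accuracies are usually close. LDA tends to be more efficient when its Gaussian assumptions hold. Logistic regression makes fewer distributional assumptions.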
What are the assumptions of linear discriminant analysis (LDA)?
LDA is very sensitive to outliers, and the smallest group must be larger than the number of predictor variables. The assumptions of linear discriminant analysis are the same as those of multivariate analysis of variance (MANOVA). They are explained here:
Multivariate normality
The independent variables must be normally distributed within every level of the grouping variable.
Homogeneity of variance or covariance (homoscedasticity)
The variances and covariances of the predictor variables need to be the same across all levels of the grouping variable. You can test for homoscedasticity using Box's M statistic.
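Box's M test can be sketched in a few lines using the usual chi-square approximation. With k groups of sizes nᵢ, p predictors, group covariance matrices Sᵢ and pooled covariance Sₚ, the statistic is M = (N − k) ln|Sₚ| − Σ(nᵢ − 1) ln|Sᵢ|. It is scaled by (1 − c) and compared with a chi-square distribution with p(p + 1)(k − 1)/2 degrees of freedom. The function name `box_m` and the synthetic groups are hypothetical. Note that Box's M is itself known to be sensitive to departures from normality.

```python
import numpy as np
from scipy.stats import chi2

def box_m(groups):
    """Box's M test for equal covariance matrices (chi-square approximation).
    groups: list of (n_i, p) arrays, one per class. Returns (statistic, df, p-value)."""
    k = len(groups)
    p = groups[0].shape[1]
    n = np.array([g.shape[0] for g in groups])
    N = n.sum()
    covs = [np.cov(g, rowvar=False) for g in groups]      # unbiased, divisor n_i - 1
    pooled = sum((ni - 1) * S for ni, S in zip(n, covs)) / (N - k)
    # slogdet returns (sign, log|det|); it is more stable than log(det(...)).
    M = (N - k) * np.linalg.slogdet(pooled)[1] - sum(
        (ni - 1) * np.linalg.slogdet(S)[1] for ni, S in zip(n, covs))
    c = (np.sum(1.0 / (n - 1)) - 1.0 / (N - k)) * \
        (2 * p**2 + 3 * p - 1) / (6 * (p + 1) * (k - 1))
    stat = M * (1 - c)
    df = p * (p + 1) * (k - 1) / 2
    return stat, df, chi2.sf(stat, df)

rng = np.random.default_rng(1)
# Two groups drawn with the same covariance, and two with very different spreads.
same = [rng.normal(size=(100, 3)), rng.normal(size=(120, 3))]
diff = [rng.normal(size=(100, 3)), rng.normal(scale=2.0, size=(120, 3))]

stat_same, df_same, p_same = box_m(same)
stat_diff, df_diff, p_diff = box_m(diff)
print(f"same covariance: M={stat_same:.1f}, p={p_same:.3f}")
print(f"different covariance: M={stat_diff:.1f}, p={p_diff:.2e}")
```

A small p-value (as in the second case) is evidence against equal covariance matrices.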
It is preferable to use linear discriminant analysis when the covariances are equal and quadratic discriminant analysis (QDA) when they are not.
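The LDA-versus-QDA rule of thumb can be seen in a small scikit-learn sketch. It uses an extreme, synthetic case: both classes have the same mean and differ only in their covariance. LDA assumes one shared covariance matrix and can only draw a straight line, so it has little to work with. QDA estimates one covariance per class and can draw the curved (here roughly circular) boundary the data needs.

```python
import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
# Synthetic classes with equal means: class 0 is tight, class 1 is spread out.
X0 = rng.normal(scale=0.5, size=(300, 2))
X1 = rng.normal(scale=2.0, size=(300, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 300 + [1] * 300)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0, stratify=y)

# Held-out accuracy of each model.
lda_acc = LinearDiscriminantAnalysis().fit(X_tr, y_tr).score(X_te, y_te)
qda_acc = QuadraticDiscriminantAnalysis().fit(X_tr, y_tr).score(X_te, y_te)
print(f"LDA accuracy: {lda_acc:.2f}   QDA accuracy: {qda_acc:.2f}")
```

In this setup LDA stays close to chance, while QDA does far better. When the covariances really are equal, LDA estimates fewer parameters and is usually the safer choice.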
Multicollinearity
Predictive power can fall if the correlation between predictor variables rises.
Independence
Participants are assumed to be randomly sampled. LDA assumes that a participant's score on one variable is independent of all the other participants' scores on that variable.
Linear discriminant analysis works fairly well even when these assumptions are slightly violated, and it can still be reliable when dichotomous variables are used (even though the assumption of multivariate normality tends to be violated in this situation).
How does linear discriminant analysis work?
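Linear discriminant analysis helps with classification, dimensionality reduction, and data visualization. It models each class as a Gaussian distribution with its own mean and a covariance matrix shared by all classes, then assigns an observation to the class with the highest posterior probability. Because the covariance is shared, the resulting decision boundaries are linear. Although it is relatively simple, it produces robust and interpretable classification results. In fact, it is often the first method tried on real-world classification problems, before more complex and flexible methods are used.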
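The mechanism can be sketched from scratch with NumPy. This follows the textbook Gaussian, shared-covariance formulation; it is not scikit-learn's internal implementation. For each class k, the linear discriminant score is δₖ(x) = xᵀΣ⁻¹μₖ − ½μₖᵀΣ⁻¹μₖ + ln πₖ. Here μₖ is the class mean, Σ the pooled within-class covariance and πₖ the class prior. An observation goes to the class with the largest score. The helper names `fit_lda` and `predict_lda` are hypothetical, and the Iris dataset is used only as a convenient example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def fit_lda(X, y):
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    priors = np.array([np.mean(y == c) for c in classes])
    # Pooled within-class covariance: one matrix shared by every class.
    centered = X - means[np.searchsorted(classes, y)]
    cov = centered.T @ centered / (len(X) - len(classes))
    cov_inv = np.linalg.inv(cov)
    # delta_k(x) = x^T S^-1 m_k - 0.5 m_k^T S^-1 m_k + log(pi_k)
    coef = means @ cov_inv                              # row k = m_k^T S^-1
    intercept = -0.5 * np.sum(coef * means, axis=1) + np.log(priors)
    return classes, coef, intercept

def predict_lda(model, X):
    classes, coef, intercept = model
    scores = X @ coef.T + intercept                     # one column per class
    return classes[np.argmax(scores, axis=1)]

X, y = load_iris(return_X_y=True)
ours = predict_lda(fit_lda(X, y), X)
sk = LinearDiscriminantAnalysis().fit(X, y).predict(X)
print("training accuracy:", np.mean(ours == y))
print("agreement with scikit-learn:", np.mean(ours == sk))
```

Each δₖ is linear in x, which is why the boundaries between classes are straight lines (hyperplanes).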
Here are some noteworthy applications of LDA:
Bankruptcy prediction
LDA is used to predict the probability that a firm will go bankrupt, based on accounting ratios and other financial variables. Edward Altman's 1968 model is still widely used in practical applications, with an accuracy of 80% to 90%, despite limitations such as accounting ratios not conforming to the normality assumption of linear discriminant analysis.
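A bankruptcy-style use of LDA can be sketched with scikit-learn's `predict_proba`. This is an illustration only. The three ratio columns, their values and the two "new firms" are synthetic and hypothetical. Altman's actual model and coefficients are not reproduced here.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(3)
# Hypothetical, synthetic ratios per firm (NOT real data). Illustrative columns:
# working capital / assets, retained earnings / assets, EBIT / assets.
healthy = rng.normal(loc=[0.25, 0.30, 0.10], scale=0.08, size=(200, 3))
bankrupt = rng.normal(loc=[0.00, -0.05, -0.05], scale=0.08, size=(60, 3))
X = np.vstack([healthy, bankrupt])
y = np.array([0] * 200 + [1] * 60)          # 1 = firm went bankrupt

lda = LinearDiscriminantAnalysis().fit(X, y)

new_firms = np.array([[0.20, 0.25, 0.08],    # ratios resembling healthy firms
                      [0.02, -0.02, -0.03]])  # ratios resembling failed firms
# classes_ is [0, 1], so column 1 holds the estimated probability of bankruptcy.
proba = lda.predict_proba(new_firms)[:, 1]
print(proba.round(3))
```

The class priors are estimated from the training data (here 60 of 260 firms failed), and they shift these probabilities. In practice they should reflect realistic failure rates.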
Facial recognition
When a computer performs facial recognition, each face is represented by a large number of pixel values. Here, the main purpose of linear discriminant analysis is to reduce the number of features to a more manageable number. Each new dimension is a linear combination of pixel values, which forms a template.
The features learned using principal component analysis (PCA) are known as Eigenfaces, and the features learned using linear discriminant analysis are known as Fisherfaces, named after Sir Ronald Fisher.
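The Fisherfaces-style pipeline (PCA first, then LDA) can be sketched with scikit-learn. Face datasets usually need a download, so this example substitutes the bundled handwritten-digits images (8×8 = 64 pixel values each). That dataset and the choice of 40 PCA components are assumptions for illustration. Running PCA first matters because pixel data often has a singular within-class scatter matrix, which LDA cannot handle directly. LDA can then produce at most (number of classes − 1) discriminant dimensions, here 9.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)      # 1797 images, 64 pixel values each
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0, stratify=y)

# PCA step ("Eigenfaces"-like), then LDA step ("Fisherfaces"-like).
# 10 digit classes -> at most 10 - 1 = 9 discriminant dimensions.
model = make_pipeline(PCA(n_components=40),
                      LinearDiscriminantAnalysis(n_components=9))
model.fit(X_tr, y_tr)

Z = model.transform(X_te)                # each image is now 9 numbers, not 64
print(Z.shape, f"test accuracy: {model.score(X_te, y_te):.3f}")
```

Each column of `Z` is a linear combination of the original pixel values, which is the "template" idea described above.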
Marketing
LDA used to be widely employed in marketing to identify the factors that distinguish different types of consumers and/or products, based on surveys or other forms of collected data. Today, other techniques such as logistic regression are used more often. Here are the steps involved in using LDA for marketing:
- You start by formulating the problem and collecting the data. You determine the primary attributes that your customers use to evaluate that type of product. You could use a marketing research team for data collection. They'll ask respondents to rate the product on each attribute you choose, using a scale such as 1 to 5, 1 to 7, or 1 to 10. Your team asks the same questions about every product in the study, and the data for all of these products is coded and fed into statistical programs such as R, SPSS, or SAS.
- After that, you estimate the discriminant function coefficients and determine their statistical significance and validity. You need to choose the discriminant analysis method that works best for you. In the direct method, the discriminant function is estimated so that all the predictors are assessed at the same time. In the stepwise method, the predictors are entered sequentially. If the dependent variable has two categories, you should use the two-group method; if it has three or more categories, the multiple discriminant method should be used. Use Wilks' lambda to check for significance in SPSS, or the F statistic in SAS. The most common validation approach is to split the sample into an estimation (analysis) sample and a validation (holdout) sample. You use the estimation sample to construct the discriminant function, then use the validation sample to build a classification matrix containing the numbers of correctly and incorrectly classified cases. The percentage of correctly classified cases is known as the hit ratio.
- Next, you plot the results on a two-dimensional map, define the dimensions, and interpret the results. You'll use the statistical program or a related tool to map the results. Every product is plotted (usually in two-dimensional space), and the distance between products shows how different they are from each other. The researcher then needs to label the dimensions, which is a rather challenging task.
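The estimation/holdout workflow from these steps can be sketched in Python. The survey data is hypothetical: three product types, five attributes rated on a 1 to 7 scale, generated synthetically. The helper `wilks_lambda` is also hypothetical; it computes Wilks' lambda as det(W)/det(T), the ratio of the within-group to the total sums-of-squares-and-cross-products matrix, where smaller values mean better group separation. scikit-learn's `confusion_matrix` serves as the classification matrix, and `transform` gives the 2-D map coordinates. This is a sketch, not the SPSS or SAS procedure.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
# Hypothetical survey: 3 product types, 5 attributes rated on a 1-7 scale.
centers = np.array([[5, 3, 4, 6, 2], [3, 5, 4, 2, 6], [4, 4, 6, 4, 4]])
X = np.vstack([np.clip(np.rint(rng.normal(c, 1.0, size=(80, 5))), 1, 7)
               for c in centers])
y = np.repeat([0, 1, 2], 80)

# Estimation (analysis) sample and validation (holdout) sample.
X_est, X_val, y_est, y_val = train_test_split(X, y, test_size=0.3,
                                              random_state=0, stratify=y)

def wilks_lambda(X, y):
    """det(W) / det(T): within-group SSCP over total SSCP (smaller = better separation)."""
    Xc = X - X.mean(axis=0)
    T = Xc.T @ Xc
    W = sum((X[y == g] - X[y == g].mean(axis=0)).T @ (X[y == g] - X[y == g].mean(axis=0))
            for g in np.unique(y))
    return np.linalg.det(W) / np.linalg.det(T)

lda = LinearDiscriminantAnalysis(n_components=2).fit(X_est, y_est)
matrix = confusion_matrix(y_val, lda.predict(X_val))   # classification matrix
hit_ratio = np.trace(matrix) / matrix.sum()             # share classified correctly
coords = lda.transform(X_val)                           # 2-D map coordinates

print("Wilks' lambda:", round(wilks_lambda(X_est, y_est), 3))
print(matrix)
print("hit ratio:", round(hit_ratio, 3))
```

The rows of `coords` can be passed to any plotting library to draw the two-dimensional map. Naming what each axis means is still left to the researcher.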
Earth science
Linear discriminant analysis is used to separate alteration zones. If data from the various zones is available, LDA can identify patterns within the data and classify it effectively. It is also used in biomedical studies, product management, and positioning.