Dimensionality Reduction
Curse of Dimensionality¶
- As dimensionality increases, data becomes sparse and metrics such as distance and density become less meaningful
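For intuition, here is a small numpy sketch (not from the notes) showing how distances concentrate as the number of dimensions grows, so the "nearest" and "farthest" points become nearly indistinguishable:

```python
# Distance concentration: as dimensionality grows, the gap between the
# nearest and farthest point from a query shrinks relative to the distances.
import numpy as np

rng = np.random.default_rng(0)
for d in [2, 10, 100, 1000]:
    X = rng.random((1000, d))          # 1000 uniform random points in [0,1]^d
    q = rng.random(d)                  # a random query point
    dists = np.linalg.norm(X - q, axis=1)
    ratio = (dists.max() - dists.min()) / dists.min()
    print(f"d={d:>4}  relative spread of distances = {ratio:.3f}")
```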
Why Dimensionality Reduction¶
- Avoid Curse of Dimensionality
- Reduce time & memory used by ML algos
- Easier visualisation of data
- Remove irrelevant features
- Feature extraction/selection
Feature Selection¶
Filter out features that are obviously irrelevant to the prediction
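As one concrete example of a simple filter-style selector (my choice, not prescribed by the notes), scikit-learn's `VarianceThreshold` drops near-constant features; the data below is made up:

```python
# Drop features whose variance is below a cutoff: a near-constant feature
# carries almost no information about the prediction target.
import numpy as np
from sklearn.feature_selection import VarianceThreshold

X = np.array([[0.0, 1.2, 5.0],
              [0.0, 3.4, 5.1],
              [0.0, 2.2, 4.9]])        # first column is constant -> irrelevant

selector = VarianceThreshold(threshold=1e-3)
X_reduced = selector.fit_transform(X)  # keeps only the informative columns
print(X_reduced.shape)                 # (3, 2)
```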
Feature Extraction¶
Combine multiple features into one (e.g. height, weight -> BMI)
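A tiny sketch of the height/weight -> BMI example (the values are made up):

```python
# Feature extraction: two raw features (height, weight) are combined into a
# single derived feature, BMI = weight / height^2.
import numpy as np

height_m = np.array([1.60, 1.75, 1.82])   # metres
weight_kg = np.array([55.0, 70.0, 90.0])  # kilograms

bmi = weight_kg / height_m ** 2           # one feature replaces two
print(np.round(bmi, 1))                   # [21.5 22.9 27.2]
```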
PCA¶
Unsupervised technique for extracting variance structure from high-dimensional datasets.
An orthogonal projection or transformation of the data into a (possibly lower-dimensional) space such that the variance of the projected data is maximised.
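A quick sketch using scikit-learn's `PCA`; the synthetic data and the choice of two components are assumptions made purely for illustration:

```python
# PCA as a variance-maximising orthogonal projection.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))              # 200 samples, 5 features
X[:, 1] = 0.8 * X[:, 0] + 0.2 * X[:, 1]    # inject some correlation

pca = PCA(n_components=2)                  # project onto the top 2 directions
X_proj = pca.fit_transform(X)              # data expressed in PC1, PC2 coordinates
print(pca.explained_variance_ratio_)       # fraction of variance kept by each PC
```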
PCA by SVD¶
- Calculate the mean of each feature, \(\bar{x}\)
- Shift the origin to \(\bar{x}\) (i.e. centre the data)
- Find the line of best fit through the new origin (either minimise the perpendicular distances to the line, or maximise the projected distances from the origin)
- PCA by SVD does the latter for computational reasons
- Maximise the sum of squared distances of the projected points from the origin
- The line found is called PC1
- The slope of PC1 tells you the linear combination of the two features that makes up PC1. This may tell us that one feature is more important than the other!
- We normalise the vector defined by that slope to get a unit vector. This is known as the singular vector or the eigenvector for PC1
- The sum of squared distances for the line of best fit is called the eigenvalue
- The square root of the eigenvalue is called the singular value
- PC2 is the line through the origin perpendicular to PC1 (PC3 is perpendicular to both PC1 and PC2, and so on)
- Find the variance for each PC; \(\mathrm{var}_1 / (\mathrm{var}_1 + \mathrm{var}_2)\) then tells us the "weight" of PC1 among all the PCs, i.e. the proportion of the total variance it explains
- To find PC\(x\) of a data point (the transformation), we dot its centred feature vector with the corresponding eigenvector
- Once we do this, we can keep the PCs with a high % of the variation and drop the rest (see the numpy sketch after this list)
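A from-scratch sketch of the steps above using numpy's SVD; the variable names and synthetic data are my own, and the eigenvalue follows the notes' convention (sum of squared projected distances, i.e. the squared singular value):

```python
# PCA by SVD on centred data.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))              # 100 samples, 3 features

X_centred = X - X.mean(axis=0)             # shift the origin to the feature means
U, S, Vt = np.linalg.svd(X_centred, full_matrices=False)

eigvecs = Vt                               # rows are the unit singular/eigenvectors (PC directions)
eigvals = S ** 2                           # sums of squared distances of projected points
explained = eigvals / eigvals.sum()        # "weight" of each PC among all PCs

scores = X_centred @ Vt.T                  # dot the data with the eigenvectors -> PC coordinates
print(explained)                           # keep the PCs with high % variation, drop the rest
```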