Dimensionality Reduction
Curse of Dimensionality¶
- As dimensionality increases, data becomes sparse and metrics such as distance and density become less meaningful
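For intuition, here is a small numpy sketch (not from the notes) showing how distances concentrate as the number of dimensions grows, so the "nearest" and "farthest" points become nearly indistinguishable:

```python
# Distance concentration: as dimensionality grows, the gap between the
# nearest and farthest point from a query shrinks relative to the distances.
import numpy as np

rng = np.random.default_rng(0)
for d in [2, 10, 100, 1000]:
    X = rng.random((1000, d))          # 1000 uniform random points in [0,1]^d
    q = rng.random(d)                  # a random query point
    dists = np.linalg.norm(X - q, axis=1)
    ratio = (dists.max() - dists.min()) / dists.min()
    print(f"d={d:>4}  relative spread of distances = {ratio:.3f}")
```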
Why Dimensionality Reduction¶
- Avoid Curse of Dimensionality
- Reduce time & memory used by ML algos
- Easier visualisation of data
- Remove irrelevant features
- Feature extraction/selection
Feature Selection¶
Filter out features that are obviously irrelevant to the prediction
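As one concrete example of a simple filter-style selector (my choice, not prescribed by the notes), scikit-learn's `VarianceThreshold` drops near-constant features; the data below is made up:

```python
# Drop features whose variance is below a cutoff: a near-constant feature
# carries almost no information about the prediction target.
import numpy as np
from sklearn.feature_selection import VarianceThreshold

X = np.array([[0.0, 1.2, 5.0],
              [0.0, 3.4, 5.1],
              [0.0, 2.2, 4.9]])        # first column is constant -> irrelevant

selector = VarianceThreshold(threshold=1e-3)
X_reduced = selector.fit_transform(X)  # keeps only the informative columns
print(X_reduced.shape)                 # (3, 2)
```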
Feature Extraction¶
Combine multiple features into one (e.g. height, weight -> BMI)
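A tiny sketch of the height/weight -> BMI example (the values are made up):

```python
# Feature extraction: two raw features (height, weight) are combined into a
# single derived feature, BMI = weight / height^2.
import numpy as np

height_m = np.array([1.60, 1.75, 1.82])   # metres
weight_kg = np.array([55.0, 70.0, 90.0])  # kilograms

bmi = weight_kg / height_m ** 2           # one feature replaces two
print(np.round(bmi, 1))                   # [21.5 22.9 27.2]
```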
PCA¶
Unsupervised technique for extracting variance structure from high-dimensional datasets.
An orthogonal projection or transformation of the data into a (possibly lower-dimensional) space such that the variance of the projected data is maximised.
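A quick sketch using scikit-learn's `PCA`; the synthetic data and the choice of two components are assumptions made purely for illustration:

```python
# PCA as a variance-maximising orthogonal projection.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))              # 200 samples, 5 features
X[:, 1] = 0.8 * X[:, 0] + 0.2 * X[:, 1]    # inject some correlation

pca = PCA(n_components=2)                  # project onto the top 2 directions
X_proj = pca.fit_transform(X)              # data expressed in PC1, PC2 coordinates
print(pca.explained_variance_ratio_)       # fraction of variance kept by each PC
```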
PCA by SVD¶
- Calculate the mean of each feature, \(\bar{x}\)
- Shift the origin to \(\bar{x}\) (i.e. centre the data)
- Find the line of best fit through the new origin (either minimise the perpendicular distances to the line, or maximise the projected distances from the origin)
- PCA by SVD does the latter for computational reasons
- Maximise the sum of squared distances of the projected points from the origin
- The line found is called PC1
- The slope of PC1 tells you the linear combination of the two features that makes up PC1. This may tell us that one feature is more important than the other!
- We normalise the vector defined by that slope to get a unit vector. This is known as the singular vector or the eigenvector for PC1
- The sum of squared distances for the line of best fit is called the eigenvalue
- The square root of the eigenvalue is called the singular value
- PC2 is the line through the origin perpendicular to PC1 (PC3 is perpendicular to both PC1 and PC2, and so on)
- Find the variance for each PC; \(\mathrm{var}_1 / (\mathrm{var}_1 + \mathrm{var}_2)\) then tells us the "weight" of PC1 among all the PCs, i.e. the proportion of the total variance it explains
- To find PC\(x\) of a data point (the transformation), we dot its centred feature vector with the corresponding eigenvector
- Once we do this, we can keep the PCs with a high % of the variation and drop the rest (see the numpy sketch after this list)
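A from-scratch sketch of the steps above using numpy's SVD; the variable names and synthetic data are my own, and the eigenvalue follows the notes' convention (sum of squared projected distances, i.e. the squared singular value):

```python
# PCA by SVD on centred data.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))              # 100 samples, 3 features

X_centred = X - X.mean(axis=0)             # shift the origin to the feature means
U, S, Vt = np.linalg.svd(X_centred, full_matrices=False)

eigvecs = Vt                               # rows are the unit singular/eigenvectors (PC directions)
eigvals = S ** 2                           # sums of squared distances of projected points
explained = eigvals / eigvals.sum()        # "weight" of each PC among all PCs

scores = X_centred @ Vt.T                  # dot the data with the eigenvectors -> PC coordinates
print(explained)                           # keep the PCs with high % variation, drop the rest
```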