Chapter 8: Dimensionality Reduction
Most machine learning algorithms suffer from the curse of dimensionality: in a high-dimensional space, a new instance is likely to be far away from any training instance, which makes predictions much less reliable.
By projecting onto a lower-dimensional subspace, we can reduce dimensionality. However, the subspace may twist and turn, like the Swiss roll, in which case simply projecting onto a plane will not work.
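As a quick illustration (the dataset parameters below are assumptions chosen for illustration), Scikit-Learn can generate a Swiss roll, and dropping one coordinate shows why a plain planar projection fails:

from sklearn.datasets import make_swiss_roll

# Generate a 3D Swiss-roll dataset (parameters are illustrative)
X, t = make_swiss_roll(n_samples=1000, noise=0.2, random_state=42)

# Naive projection onto a plane: simply drop the third coordinate.
# Points from different layers of the roll end up on top of each other,
# so the projection squashes the manifold instead of unrolling it.
X_projected = X[:, :2]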
PCA identifies the hyperplane that lies closest to the data and projects the data onto it. The axes are chosen to preserve the maximum amount of variance (to avoid losing important information), which is equivalent to minimizing the mean squared distance between the original data and its projection.
To obtain the principal components, we can use SVD:
import numpy as np

X_centered = X - X.mean(axis=0)  # PCA assumes that the dataset is centered around the origin
U, s, Vt = np.linalg.svd(X_centered)  # np.linalg.svd returns V already transposed
c1 = Vt.T[:, 0]  # first principal component
c2 = Vt.T[:, 1]  # second principal component
# Projecting the data onto the first two principal components
W2 = Vt.T[:, :2]
X2D = X_centered.dot(W2)
or in Scikit-Learn:
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
X2D = pca.fit_transform(X)
We can choose the right number of dimensions by finding the elbow in the explained-variance curve; it is generally preferable to choose the number of dimensions that adds up to a sufficiently large portion of the variance (e.g., 95%).
pca = PCA()
pca.fit(X)
cumsum = np.cumsum(pca.explained_variance_ratio_)
d = np.argmax(cumsum >= 0.95) + 1  # smallest number of dimensions reaching 95% variance
# or, equivalently, pass the target variance ratio directly:
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)
After compressing MNIST down to 154 dimensions (preserving about 95% of the variance), we can decompress it back to the original 784 dimensions with inverse_transform():

pca = PCA(n_components=154)
X_mnist_reduced = pca.fit_transform(X_mnist)
X_mnist_recovered = pca.inverse_transform(X_mnist_reduced)
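As a rough sanity check on how much information the compression lost, we can compare the recovered images with the originals; the mean squared error computation below is a sketch, not code from the chapter:

# Average squared pixel difference between original and reconstructed digits
reconstruction_mse = np.mean(np.square(X_mnist - X_mnist_recovered))
print(reconstruction_mse)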
A detailed explanation and visualization can be found here. The point is that Kernel PCA gives us a nonlinear version of PCA by applying the kernel trick.
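A minimal sketch with Scikit-Learn's KernelPCA (the RBF kernel and gamma value here are assumptions chosen for illustration):

from sklearn.decomposition import KernelPCA

# Kernel PCA with an RBF kernel performs a nonlinear projection down to 2D
rbf_pca = KernelPCA(n_components=2, kernel="rbf", gamma=0.04)
X_reduced = rbf_pca.fit_transform(X)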
Locally Linear Embedding (LLE)
It is another nonlinear dimensionality reduction technique. It first finds the k nearest neighbors of each training instance and learns the linear relationship (weights) that best reconstructs the instance from those neighbors; then, based on this weight matrix, it maps the training instances into a d-dimensional space while preserving the local relationships.
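A minimal sketch using Scikit-Learn's LocallyLinearEmbedding (the number of neighbors is an illustrative choice):

from sklearn.manifold import LocallyLinearEmbedding

# Map the data to 2D while preserving each instance's relationship
# to its 10 nearest neighbors
lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10)
X_reduced = lle.fit_transform(X)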