Hands-On Machine Learning with Scikit-Learn and TensorFlow (V)

Posted by Kaiyuan Chen on September 9, 2017

Chapter 8: Dimensionality Reduction

Most machine learning algorithms suffer from the curse of dimensionality: in high-dimensional spaces, a new instance is likely to be far away from any training instance, which makes predictions much less reliable.

Approaches

Projection

By projection, we can reduce dimensionality: project every training instance onto a lower-dimensional subspace. However, the subspace may twist and turn, like the Swiss roll dataset, so simply projecting onto a plane will not work (see the sketch below).
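
As a minimal sketch (not from the book's code), Scikit-Learn's make_swiss_roll generates such a dataset, and dropping an axis shows why a plain projection fails:

from sklearn.datasets import make_swiss_roll

# 3D Swiss roll; t is the position along the roll
X, t = make_swiss_roll(n_samples=1000, noise=0.1, random_state=42)

# Naive projection: keep only the first two axes
X_proj = X[:, :2]
# Different layers of the roll overlap after this projection, so instances that are
# far apart along the roll end up close together in the projected plane.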

PCA

This algorithm identifies the hyperplane that lies closest to the data and projects the data onto it. The hyperplane is chosen to preserve the maximum amount of variance (to avoid losing important information), which is equivalent to minimizing the mean squared distance between the original data and its projection.

To obtain the principal components, we can use SVD:

import numpy as np

X_centered = X - X.mean(axis=0)  # PCA assumes the dataset is centered around the origin
U, s, Vt = np.linalg.svd(X_centered)  # np.linalg.svd returns V already transposed
c1 = Vt.T[:, 0]  # first principal component (columns of V)
c2 = Vt.T[:, 1]  # second principal component

# Projecting the data onto the plane defined by the first two principal components
W2 = Vt.T[:, :2]
X2D = X_centered.dot(W2)

or in Scikit-Learn:

from sklearn.decomposition import PCA
pca = PCA(n_components = 2)
X2D = pca.fit_transform(X)
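
After fitting, the principal components are available as the rows of pca.components_, so the first component can be recovered with:

c1 = pca.components_.T[:, 0]  # first principal component (sign may differ from the SVD result above)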

We can choose the right dimension by finding the elbow in the explained-variance curve; it is generally preferable to choose the number of dimensions that add up to a sufficiently large portion of the variance (e.g., 95%).

pca = PCA()
pca.fit(X)
cumsum = np.cumsum(pca.explained_variance_ratio_)
d = np.argmax(cumsum >= 0.95) + 1  # smallest number of dimensions that preserves 95% of the variance

# or let PCA choose d directly: a float n_components is treated as the target variance ratio
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

For compression (e.g., reducing MNIST to 154 dimensions, then decompressing back to the original space with inverse_transform):

pca = PCA(n_components = 154)
X_mnist_reduced = pca.fit_transform(X_mnist)
X_mnist_recovered = pca.inverse_transform(X_mnist_reduced)
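
As a quick sanity check (a sketch, assuming X_mnist holds the MNIST feature matrix as a NumPy array), the quality of the compression can be measured by the mean squared reconstruction error:

import numpy as np

# Mean squared distance between the original digits and their reconstructions
reconstruction_error = np.mean(np.square(X_mnist - X_mnist_recovered))
print(reconstruction_error)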

Manifold Learning

A detailed explanation and visualization can be found here. The point is that the training instances often lie close to a much lower-dimensional manifold, and we have a nonlinear version of PCA (sketched below).
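
One such nonlinear variant is Kernel PCA; a minimal sketch with Scikit-Learn's KernelPCA and an RBF kernel (the gamma value here is purely illustrative):

from sklearn.decomposition import KernelPCA

# The RBF kernel lets PCA perform nonlinear projections; gamma controls the kernel width
rbf_pca = KernelPCA(n_components=2, kernel="rbf", gamma=0.04)
X_reduced = rbf_pca.fit_transform(X)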

Locally Linear Embedding (LLE)

It is another nonlinear dimensionality reduction technique. The idea is to find each instance's k nearest neighbors and express the instance as a linear combination of them; then, based on the resulting weight matrix, we map the training instances into a d-dimensional space while preserving these local relationships (see the sketch below).
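
A minimal sketch with Scikit-Learn's LocallyLinearEmbedding, unrolling a Swiss roll (n_neighbors is the k mentioned above; the values are illustrative):

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, t = make_swiss_roll(n_samples=1000, noise=0.2, random_state=42)

# Find each instance's 10 nearest neighbors and map to 2D while preserving local relationships
lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10)
X_unrolled = lle.fit_transform(X)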