Chapter 8: Dimensionality Reduction
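Most machine learning algorithms suffer from the curse of dimensionality. In a high-dimensional space the training instances are sparse, so a new instance is likely to be far away from any training instance, which makes predictions less reliable.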
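To get a feel for how sparse high-dimensional space is, here is a minimal sketch that estimates the average distance between two random points in a unit hypercube. The helper name mean_pairwise_distance, the number of sampled pairs, and the seed are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def mean_pairwise_distance(n_dims, n_pairs=2000, seed=0):
    """Estimate the mean distance between two random points in the unit hypercube."""
    rng = np.random.default_rng(seed)
    a = rng.random((n_pairs, n_dims))  # first point of each pair
    b = rng.random((n_pairs, n_dims))  # second point of each pair
    return np.linalg.norm(a - b, axis=1).mean()

# The average distance keeps growing as dimensions are added
for d in (2, 10, 100, 1000):
    print(d, round(mean_pairwise_distance(d), 2))
```

The growth in distance is what makes a new instance likely to sit far from every training instance.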
Approaches
Projection
We can reduce dimensionality by projecting the data onto a lower-dimensional subspace. However, the subspace may twist and turn, as in the Swiss roll dataset, in which case simply projecting onto a plane will not work.
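The failure on the Swiss roll can be checked numerically. The sketch below assumes Scikit-Learn's make_swiss_roll generator and a naive projection that simply drops the third axis. It then looks for points that become neighbors in the projected plane although they lie far apart along the rolled-up sheet (measured by t).

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import NearestNeighbors

# t is the position of each point along the rolled-up sheet
X, t = make_swiss_roll(n_samples=1000, noise=0.0, random_state=42)

# Naive projection: keep the first two axes and drop the third
X_proj = X[:, :2]

# For every point, find its nearest neighbor in the projected plane
nn = NearestNeighbors(n_neighbors=2).fit(X_proj)
_, idx = nn.kneighbors(X_proj)
neighbor = idx[:, 1]  # column 0 is the point itself

# Gap along the manifold between each point and its projected neighbor
gap = np.abs(t - t[neighbor])
max_gap = gap.max()
print("largest gap in t between projected neighbors:", round(max_gap, 2))
```

A large gap means the projection has squashed different layers of the roll on top of each other.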
PCA
This algorithm identifies the hyperplane that lies closest to the data and projects the data onto it. The hyperplane is chosen to preserve the maximum amount of variance (to avoid losing important information), which is equivalent to minimizing the mean squared distance between the original data and its projection.
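The two criteria pick the same axis because, for centered data and a unit direction, the variance kept plus the mean squared distance lost always adds up to the same total. The following sketch checks this on a made-up 2D dataset. The data and the grid of candidate angles are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2D data, centered on the origin
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])
X = X - X.mean(axis=0)

total = np.mean(np.sum(X**2, axis=1))  # mean squared norm of the data

results = []
for angle in np.linspace(0, np.pi, 180, endpoint=False):
    w = np.array([np.cos(angle), np.sin(angle)])  # unit direction
    z = X @ w                                     # coordinates along w
    variance = np.mean(z**2)                      # variance kept by the projection
    # mean squared distance between each point and its projection on the axis
    residual = np.mean(np.sum((X - np.outer(z, w))**2, axis=1))
    results.append((variance, residual))

variances, residuals = map(np.array, zip(*results))
best_by_variance = int(np.argmax(variances))
best_by_distance = int(np.argmin(residuals))
print(best_by_variance, best_by_distance)
```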
To obtain the principal components, we use Singular Value Decomposition (SVD):
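import numpy as np
X_centered = X - X.mean(axis=0)  # PCA assumes that the dataset is centered around the origin
U, s, Vt = np.linalg.svd(X_centered)  # np.linalg.svd returns V^T, not V
c1 = Vt.T[:, 0]  # first principal component (a column of V)
c2 = Vt.T[:, 1]  # second principal component
# Project the data onto the plane spanned by the first two components
W2 = Vt.T[:, :2]
X2D = X_centered.dot(W2)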
or, in Scikit-Learn (which centers the data for us):
from sklearn.decomposition import PCA
pca = PCA(n_components = 2)
X2D = pca.fit_transform(X)
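As a sanity check, the manual SVD route and Scikit-Learn's PCA should produce the same projection, up to the sign of each component. The sketch below uses a hypothetical random 3D dataset to compare them.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# Hypothetical 3D dataset whose variance differs along each axis
X = rng.normal(size=(200, 3)) * np.array([5.0, 2.0, 0.5])

# Manual route: center, decompose, project onto the first two components
X_centered = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
X2D_svd = X_centered @ Vt[:2].T

# Scikit-Learn route (centers the data internally)
pca = PCA(n_components=2)
X2D_sklearn = pca.fit_transform(X)

# Components are only defined up to sign, so compare absolute values
same_projection = np.allclose(np.abs(X2D_svd), np.abs(X2D_sklearn), atol=1e-6)
same_axes = np.allclose(np.abs(Vt[:2]), np.abs(pca.components_), atol=1e-6)
print(same_projection, same_axes)
```

Comparing absolute values is a shortcut that works here because a sign flip affects a whole component at once.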
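To choose the right number of dimensions, we can plot the explained variance against the number of dimensions and look for an elbow. Rather than picking the number arbitrarily, though, it is generally preferable to choose the number of dimensions that add up to a sufficiently large portion of the variance (e.g., 95%):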
pca = PCA()
pca.fit(X)
cumsum = np.cumsum(pca.explained_variance_ratio_)
d = np.argmax(cumsum >= 0.95) + 1
# or
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)
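Both routes should arrive at the same number of dimensions. Here is a runnable sketch of that comparison, assuming Scikit-Learn's bundled digits dataset (8x8 images) as a small stand-in, since the notes do not say which X is used here.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)  # 64 features per image

# Route 1: fit all components, then read d off the cumulative variance
pca_full = PCA().fit(X)
cumsum = np.cumsum(pca_full.explained_variance_ratio_)
d = int(np.argmax(cumsum >= 0.95)) + 1

# Route 2: let PCA pick the number of components for a 95% target
pca_95 = PCA(n_components=0.95).fit(X)

print("d from cumulative sum:", d)
print("d chosen by PCA:", pca_95.n_components_)
```

After fitting with a float n_components, the number of components actually kept is available as n_components_.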
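For compression, reduce the dataset and then map it back to the original number of dimensions with inverse_transform (the reconstruction is lossy):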
pca = PCA(n_components = 154)
X_mnist_reduced = pca.fit_transform(X_mnist)
X_mnist_recovered = pca.inverse_transform(X_mnist_reduced)
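How much information the compression loses can be measured as the mean squared distance between the original data and the recovered data. The sketch below is an illustration under assumptions: it swaps in the small bundled digits dataset for MNIST, and the helper reconstruction_mse is a hypothetical name.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)  # stand-in for MNIST: 64 features per image

def reconstruction_mse(X, n_components):
    """Compress X with PCA, decompress it, and return the mean squared error."""
    pca = PCA(n_components=n_components)
    X_reduced = pca.fit_transform(X)
    X_recovered = pca.inverse_transform(X_reduced)
    return float(np.mean((X - X_recovered) ** 2))

# Fewer components means stronger compression and a larger error
errors = {k: reconstruction_mse(X, k) for k in (2, 8, 32, 64)}
for k, mse in errors.items():
    print(f"{k:>2} components -> reconstruction MSE {mse:.4f}")
```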
Manifold Learning
A detailed explanation and visualization can be found here. The point is that we have a nonlinear version of PCA.
Locally Linear Embedding (LLE)
LLE is another nonlinear dimensionality reduction technique. It first finds the k nearest neighbors of each training instance and expresses the instance as a linear combination of them. Then, based on the resulting weight matrix, it maps the training instances into a d-dimensional space while preserving these local relationships as much as possible.
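In Scikit-Learn, LLE is available as LocallyLinearEmbedding. The following minimal sketch unrolls a Swiss roll. The dataset and the values k = 10 and d = 2 are illustrative assumptions, not settings taken from these notes.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, t = make_swiss_roll(n_samples=1000, noise=0.2, random_state=42)

# n_neighbors is k, n_components is the target dimensionality d
lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10, random_state=42)
X_unrolled = lle.fit_transform(X)

print(X_unrolled.shape)
print("reconstruction error:", lle.reconstruction_error_)
```

Plotting X_unrolled colored by t is a quick way to judge whether the roll was unrolled cleanly.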
