# Hands On Machine Learning with Scikit and Tensorflow(V)

Posted by Kaiyuan Chen on September 9, 2017

## Chapter 8: Dimensionality Reduction

Most of machine learning algorithm suffer from curse of dimensionality. When new instance far way from any training instance

### Approaches

#### Projection

By projection, we can reduce dimensionality. However, subspace may twist and turn, like Swiss roll, which simply projecting onto a plane will not work

### PCA

This algorithm identifies the hyper plane that lies closet to the data and projects data onto it. The selection that justifies is to choose maximum amount of variance (avoid losing important information) and minimum mean square distance.

To obtain principal components, we use SVD by

``````X_centered = X - X.mean(axis=0) #PCA assumes that the dataset is centered around the origin.
U, s, V = np.linalg.svd(X_centered)
c1 = V.T[:, 0] #we need V^T
c2 = V.T[:, 1]

#Projecting data
W2 = V.T[:, :2]
X2D = X_centered.dot(W2)
``````

or in SciKit Learn

``````from sklearn.decomposition import PCA
pca = PCA(n_components = 2)
X2D = pca.fit_transform(X)
``````

We can choose right dimension by find elbow it is generally preferable to choose the number of dimensions that add up to a sufficiently large portion of the variance (e.g., 95%).

``````pca = PCA()
pca.fit(X)
cumsum = np.cumsum(pca.explained_variance_ratio_)
d = np.argmax(cumsum >= 0.95) + 1

# or
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

``````

#### For compression

``````pca = PCA(n_components = 154)
X_mnist_reduced = pca.fit_transform(X_mnist)
X_mnist_recovered = pca.inverse_transform(X_mnist_reduced)
``````

#### Maniford Learning

A detailed explanation and visualization can be found here The point is that: we have a non linear version of PCA.

#### Locally Linear Embedding (LLE)

it is also another nonlinear dimensionality reduction. It is to find k nearest neighbors and find linear relationships to them Then based on weight matrix, we map the training instances into d-dimensional space while preserving local relationships. 