Hands-On Machine Learning with Scikit-Learn and TensorFlow (VI)

Posted by Kaiyuan Chen on September 13, 2017

CH10 Introduction to ANN

Perceptron: computes a weighted sum of its inputs, then applies a step function to that sum

Hebb’s rule

The connection between two neurons grows stronger when one biological neuron triggers the other.

During training, the weight of a connection is increased when it helps produce the correct output.
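The weight update described above can be sketched in plain Python. This is a minimal illustration of the Perceptron learning rule, not scikit-learn's implementation; the names `train_perceptron` and `predict`, the learning rate `eta`, and the toy AND dataset are assumptions made for the example.

```python
def step(z):
    """Hard threshold: 1 if the weighted sum is positive, else 0."""
    return 1 if z > 0 else 0


def predict(weights, bias, x):
    """Weighted sum of the inputs followed by the step function."""
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_perceptron(samples, labels, eta=1.0, n_epochs=20):
    """Perceptron learning rule: w_i += eta * (y - y_hat) * x_i."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(n_epochs):
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)  # 0 when the prediction is right
            # Only connections from active inputs are changed, and only on mistakes.
            weights = [w + eta * error * xi for w, xi in zip(weights, x)]
            bias += eta * error
    return weights, bias


# Hypothetical, linearly separable toy data: logical AND.
X_and = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_and = [0, 0, 0, 1]
w, b = train_perceptron(X_and, y_and)
```

Calling `predict(w, b, x)` for each row of `X_and` should then reproduce `y_and`, since a Perceptron converges on linearly separable data.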

from sklearn.linear_model import Perceptron
per_clf = Perceptron(random_state=42)
per_clf.fit(X, y)
y_pred = per_clf.predict([[1,2]])

Note that, contrary to Logistic Regression classifiers, Perceptrons do not output a class probability; rather, they just make predictions based on a hard threshold.
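To make the difference concrete, here is a small sketch that feeds the same decision score to a hard threshold and to the logistic function. The weights, bias, and sample points are hypothetical values chosen for illustration, not the output of any trained model.

```python
import math


def decision_score(weights, bias, x):
    """Weighted sum of the inputs plus the bias term."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def perceptron_predict(weights, bias, x):
    """Hard threshold: only the sign of the score matters."""
    return 1 if decision_score(weights, bias, x) > 0 else 0


def logistic_proba(weights, bias, x):
    """Logistic Regression squashes the same score into a probability."""
    return 1.0 / (1.0 + math.exp(-decision_score(weights, bias, x)))


weights, bias = [2.0, -1.0], 0.5  # hypothetical, already-trained parameters
near = [0.1, 0.6]  # score of about 0.1, just past the decision boundary
far = [3.0, 0.5]   # score of 6.0, far from the decision boundary

hard_near, hard_far = perceptron_predict(weights, bias, near), perceptron_predict(weights, bias, far)
p_near, p_far = logistic_proba(weights, bias, near), logistic_proba(weights, bias, far)
```

Both points get the same hard prediction, while the probabilities show that the model is far less sure about `near` than about `far`.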

Multi-layered API

When an ANN has two or more hidden layers, it is called a deep neural network (DNN).

The book's description of training with backpropagation:

For each training instance the backpropagation algorithm first makes a prediction (forward pass), measures the error, then goes through each layer in reverse to measure the error contribution from each connection (reverse pass), and finally slightly tweaks the connection weights to reduce the error (Gradient Descent step).
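The three stages can be traced on the smallest possible network. The sketch below assumes one input, one logistic hidden neuron, one linear output neuron, and a squared-error loss; the function name `backprop_step`, the starting parameters, and the learning rate are all choices made for the example, not taken from the book.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def backprop_step(params, x, y, lr=0.1):
    """One forward pass, reverse pass, and Gradient Descent step."""
    w1, b1, w2, b2 = params

    # Forward pass: make a prediction and measure the error.
    h = sigmoid(w1 * x + b1)
    y_hat = w2 * h + b2
    loss = 0.5 * (y_hat - y) ** 2

    # Reverse pass: apply the chain rule from the output back to the input.
    d_y_hat = y_hat - y
    d_w2, d_b2 = d_y_hat * h, d_y_hat
    d_h = d_y_hat * w2
    d_z1 = d_h * h * (1.0 - h)  # derivative of the logistic function
    d_w1, d_b1 = d_z1 * x, d_z1

    # Gradient Descent step: nudge every weight against its gradient.
    new_params = (w1 - lr * d_w1, b1 - lr * d_b1, w2 - lr * d_w2, b2 - lr * d_b2)
    return new_params, loss


params = (0.5, 0.0, -0.5, 0.0)  # hypothetical starting weights
losses = []
for _ in range(50):
    params, loss = backprop_step(params, x=1.0, y=1.0)
    losses.append(loss)
```

The recorded `losses` should shrink over the 50 steps, which is the "slightly tweaks the connection weights to reduce the error" part in action.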

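For Gradient Descent to make progress, the activation function has to change: the original step function is flat everywhere, so its gradient is zero, and it is replaced by the logistic function, whose derivative is nonzero everywhere.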
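A quick numerical check of the slopes shows why the step function gives Gradient Descent nothing to work with. The helper `numerical_derivative` and the evaluation point `z = 2.0` are arbitrary choices for this sketch.

```python
import math


def step(z):
    return 1.0 if z >= 0 else 0.0


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def numerical_derivative(f, z, eps=1e-6):
    """Central-difference estimate of the slope of f at z."""
    return (f(z + eps) - f(z - eps)) / (2 * eps)


step_grad = numerical_derivative(step, 2.0)          # flat segment: no slope
logistic_grad = numerical_derivative(logistic, 2.0)  # small but nonzero slope
```

The logistic slope matches the closed form `logistic(z) * (1 - logistic(z))`, which is the factor backpropagation multiplies by at each logistic neuron.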

# High-level API
import tensorflow as tf
feature_columns = tf.contrib.learn.infer_real_valued_columns_from_input(X_train)
dnn_clf = tf.contrib.learn.DNNClassifier(hidden_units=[300, 100], n_classes=10,
                                         feature_columns=feature_columns)
dnn_clf.fit(x=X_train, y=y_train, batch_size=50, steps=40000)
dnn_clf.evaluate(X_test, y_test)

Construction phase

For the low-level, plain TensorFlow version, I wrote everything in the notebook; here are a few things worth highlighting:

  • Shape
    n_inputs = 28*28 # MNIST
    X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")
    y = tf.placeholder(tf.int64, shape=(None), name="y")

    Each 28*28 image is flattened into 784 features, one per pixel, and each feature is treated as a number. The None dimension stands for the batch size, which is not known until execution.

    import numpy as np

    def neuron_layer(X, n_neurons, name, activation=None):
        with tf.name_scope(name):  # group the layer's nodes so TensorBoard looks cleaner
            n_inputs = int(X.get_shape()[1])  # number of input features
            stddev = 2 / np.sqrt(n_inputs)  # scale of the truncated Gaussian used for the initial weights; helps training converge faster
            init = tf.truncated_normal((n_inputs, n_neurons), stddev=stddev)
            W = tf.Variable(init, name="kernel")  # weight matrix
            b = tf.Variable(tf.zeros([n_neurons]), name="bias")  # one bias per neuron
            Z = tf.matmul(X, W) + b  # weighted sums for the whole batch
            if activation is not None:
                return activation(Z)  # e.g. tf.nn.relu(Z)
            else:
                return Z  # no activation: return the raw weighted sums
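The shapes involved are easier to follow outside the graph. The NumPy sketch below mirrors the placeholder shape and the layer's forward computation; it is not TensorFlow code, the random pixels merely stand in for MNIST images, and it uses a plain (not truncated) normal initializer for brevity. `dense_forward` and `relu` are names invented for the example.

```python
import numpy as np


def relu(Z):
    return np.maximum(Z, 0)


def dense_forward(X, W, b, activation=None):
    """NumPy counterpart of Z = tf.matmul(X, W) + b with an optional activation."""
    Z = X @ W + b  # b is broadcast across the batch dimension
    return activation(Z) if activation is not None else Z


rng = np.random.RandomState(42)
n_inputs, n_neurons = 28 * 28, 300

# Hypothetical mini-batch of 5 images; random pixels stand in for MNIST.
images = rng.rand(5, 28, 28).astype(np.float32)
X_batch = images.reshape(-1, n_inputs)  # (5, 784): one row per image

stddev = 2 / np.sqrt(n_inputs)
W = (rng.randn(n_inputs, n_neurons) * stddev).astype(np.float32)  # plain normal, not truncated
b = np.zeros(n_neurons, dtype=np.float32)

hidden = dense_forward(X_batch, W, b, activation=relu)  # (5, 300)
```

The batch dimension (5 here) is what `None` leaves open in the placeholder shape, and it passes through the layer unchanged while the feature dimension goes from 784 to 300.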

With this helper, building the DNN is straightforward:

hidden1 = neuron_layer(X, n_hidden1, "hidden1", activation=tf.nn.relu)
hidden2 = neuron_layer(hidden1, n_hidden2, "hidden2", activation=tf.nn.relu)
logits = neuron_layer(hidden2, n_outputs, "outputs")

TensorFlow also provides its own function for this, fully_connected, which applies ReLU by default:

from tensorflow.contrib.layers import fully_connected

hidden1 = fully_connected(X, n_hidden1, scope="hidden1")
hidden2 = fully_connected(hidden1, n_hidden2, scope="hidden2")
logits = fully_connected(hidden2, n_outputs, scope="outputs", activation_fn=None)

The loss function is the mean cross-entropy over the batch:

xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
loss = tf.reduce_mean(xentropy, name="loss")
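What this loss computes can be written out in a few lines of NumPy. This is a sketch of the math under the usual definition (softmax over the logits, then the negative log-probability of the true class), not TensorFlow's actual implementation; the function name and the sample logits are made up for the example.

```python
import numpy as np


def sparse_softmax_cross_entropy(labels, logits):
    """Per-instance cross-entropy from integer class labels and raw logits."""
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # "Sparse" means labels are class indices, so pick one log-probability per row.
    return -log_probs[np.arange(len(labels)), labels]


logits = np.array([[2.0, 1.0, 0.1],   # fairly confident in class 0
                   [0.0, 0.0, 0.0]])  # no preference between the 3 classes
labels = np.array([0, 2])

xentropy = sparse_softmax_cross_entropy(labels, logits)
loss = xentropy.mean()
```

For the second row every class has probability 1/3, so its cross-entropy is log(3); the first row is penalized less because most of the probability mass already sits on the correct class.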

and we minimize the loss with Gradient Descent:

optimizer = tf.train.GradientDescentOptimizer(learning_rate)
training_op = optimizer.minimize(loss)

Execution phase
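init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)  # initialize all variables before training
    for iteration in range(mnist.train.num_examples // batch_size):  # one pass over the training set
        X_batch, y_batch = mnist.train.next_batch(batch_size)
        sess.run(training_op, feed_dict={X: X_batch, y: y_batch})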
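The batching pattern in that loop can be sketched without TensorFlow. The generator below is a hypothetical stand-in for what a `next_batch`-style helper does over one epoch (shuffle, then slice); it is not the MNIST helper's actual code, and the tiny dataset is invented for the example.

```python
import random


def iterate_minibatches(features, targets, batch_size, seed=42):
    """Yield shuffled (X_batch, y_batch) pairs covering one epoch."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    n_batches = len(features) // batch_size  # leftover instances are dropped
    for i in range(n_batches):
        batch = indices[i * batch_size:(i + 1) * batch_size]
        yield [features[j] for j in batch], [targets[j] for j in batch]


# Hypothetical dataset of 10 instances where each target equals its feature.
features = [[float(i)] for i in range(10)]
targets = list(range(10))

batches = list(iterate_minibatches(features, targets, batch_size=3))
```

With 10 instances and a batch size of 3 this yields 3 batches, matching the `num_examples // batch_size` iteration count, and features stay aligned with their targets after shuffling.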