Implementing a simple Artificial Neural Network using TensorFlow

Ever since TensorFlow was released by Google back in November 2015, I have wanted to get my hands dirty with it. Being very new to machine learning back then (I still am a noob), the documentation didn't make much sense to me, and having come from a Caffe background, the way in which a network was described and trained didn't make any sense either. I recently came across Stephen Welch's video series Neural Networks Demystified, and that inspired me to implement a simple artificial neural network. I chose to implement it in TensorFlow. (What better way to overcome your fear than to actually face it..eh?)

So here is the problem statement, and I'll walk through the steps of implementing it.

Problem: Model an OR gate using an ANN

Now, let’s see the truth table for an OR gate

Input Output
000
011
101
111

There are two inputs and one output, so our neural network should have two neurons in the input layer and one neuron in the output layer. For simplicity, let's consider a single hidden layer with 3 neurons in it. Our neural network will look something like this:

[Figure: a 2-3-1 fully connected network, with two input neurons feeding three hidden neurons, which in turn feed a single output neuron]

Let the input be represented by x and the output by y. The training data is given below:

x_train = [[0,0], [0,1], [1,0], [1,1]]

y_train = [[0], [1], [1], [1]]

The first layer's weights are represented as a matrix of size 2x3:

W1 = [[w11,w12,w13], [w21,w22,w23]]

The second layer's weights can similarly be represented as a matrix of dimension 3x1:

W2 = [[w1], [w2], [w3]]

The output of the first layer, z1, is the matrix multiplication of the input x and the weights W1. We will also apply an activation function to this output. Activation functions make our model non-linear so that we can fit complex data patterns. This Quora thread has a nice explanation of activation functions. We will choose tanh, the hyperbolic tangent function, as our activation function. It is defined as follows:

tanh(z) = (e^z - e^-z) / (e^z + e^-z)
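
To get a feel for it, here is a quick sanity check using NumPy's np.tanh, which computes the same function as tf.tanh:

import numpy as np

# tanh squashes any real-valued input into the open interval (-1, 1)
for z in [-3.0, -1.0, 0.0, 1.0, 3.0]:
    print(z, np.tanh(z))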

So, the first layer output can be written as A1 = f(z1), where z1 = matmul(x, W1).

Similarly, the output of the final layer is y' = f(z2), where z2 = matmul(A1, W2).

We have represented our output as y' since it is the predicted output from the system and not the actual output.
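
Here is a quick shape check of that forward pass in NumPy (the random weights are just placeholders for illustration):

import numpy as np

x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
W1 = np.random.randn(2, 3)   # first layer weights, 2x3
W2 = np.random.randn(3, 1)   # second layer weights, 3x1

A1 = np.tanh(x @ W1)         # z1 = matmul(x, W1), shape (4, 3)
y_hat = np.tanh(A1 @ W2)     # z2 = matmul(A1, W2), shape (4, 1)

print(A1.shape, y_hat.shape)  # (4, 3) (4, 1)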

We should then define our cost function, which is a measure of how good or bad our system is; our aim is to minimize the cost. We will choose as our cost half the sum of the squared errors over the training data, cost = 0.5 * sum((y' - y)^2), and we will minimize it using the gradient descent method.
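
To make the cost and the gradient-descent update concrete, here is a minimal NumPy sketch of the same training loop with the gradients written out by hand, using the standard derivative tanh'(z) = 1 - tanh(z)^2. (TensorFlow will compute these gradients for us automatically below.)

import numpy as np

x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [1]], dtype=float)

rng = np.random.default_rng(0)
W1 = rng.standard_normal((2, 3))
W2 = rng.standard_normal((3, 1))
lr = 0.1  # learning rate

for _ in range(10000):
    # Forward pass
    A1 = np.tanh(x @ W1)
    y_hat = np.tanh(A1 @ W2)

    # Backward pass for cost = 0.5 * sum((y_hat - y)^2)
    dZ2 = (y_hat - y) * (1 - y_hat ** 2)   # shape (4, 1)
    dW2 = A1.T @ dZ2                       # shape (3, 1)
    dZ1 = (dZ2 @ W2.T) * (1 - A1 ** 2)     # shape (4, 3)
    dW1 = x.T @ dZ1                        # shape (2, 3)

    # Gradient-descent update
    W1 -= lr * dW1
    W2 -= lr * dW2

print(y_hat)  # should be close to [[0], [1], [1], [1]]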

Once we have outlined our network and other considerations, it's time for us to implement it in TensorFlow:

import tensorflow as tf

# Placeholders for the inputs and the target outputs
x = tf.placeholder(tf.float32, [None, 2])
y = tf.placeholder(tf.float32, [None, 1])

# Randomly initialized weights: 2x3 for the hidden layer, 3x1 for the output layer
W1 = tf.Variable(tf.random_normal([2, 3]))
W2 = tf.Variable(tf.random_normal([3, 1]))

# Hidden layer: A1 = tanh(matmul(x, W1))
Z1 = tf.matmul(x, W1)
A1 = tf.tanh(Z1)

# Output layer: y' = tanh(matmul(A1, W2))
Z2 = tf.matmul(A1, W2)
y_ = tf.tanh(Z2)

# Half the sum of squared errors over all training examples (a scalar)
cost = 0.5 * tf.reduce_sum(tf.square(y_ - y))

# Minimize the cost with gradient descent, learning rate 0.1
optimizer = tf.train.GradientDescentOptimizer(0.1)
train = optimizer.minimize(cost)

x_train = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_train = [[0], [1], [1], [1]]

sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

# Predictions before training, from the random initial weights
print(sess.run(y_, {x: x_train}))

# Run 10,000 gradient-descent steps on the full training set
for i in range(10000):
    sess.run(train, {x: x_train, y: y_train})

# Predictions after training: should be close to [0, 1, 1, 1]
print(sess.run(y_, {x: x_train}))
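
One note: the script above uses the TensorFlow 1.x graph API (tf.placeholder, tf.Session, and so on), which was current when this post was written. If you are running a TensorFlow 2.x install, where that API is no longer the default, one way to keep the script working unchanged is the v1 compatibility shim:

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()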

Written on July 31, 2017