Manual Back Prop with TensorFlow

Decoupled Recurrent Neural Network, modified NN from Google Brain, Implementation with Interactive Code.

Photo from Pixabay

I love Tensorflow and its ability to perform auto differentiation, but that does not mean we cannot perform manual back propagation in Tensorflow as well. Also, for me, building a Neural Network is a form of art, and I want to master every single part of it. So today, I will implement a Decoupled RNN for a simple classification task in TensorFlow using manual back propagation, and compare its performance with an auto-differentiated model.

Decoupled NN was originally introduced in the paper “Decoupled Neural Interfaces using Synthetic Gradients,” which I encourage you to read. I also did a Numpy implementation of the Decoupled RNN here.

Finally, please note that this post focuses on how the Tensorflow implementation differs from the Numpy one, and on the reasons why I like Tensorflow’s implementation.


Network Architecture / Forward Feed / Back Propagation

As seen above, the network architecture is exactly the same as in the Numpy version of the RNN, and so are the forward feed / back propagation operations, so I won’t go into depth here.


Data Preparation / Hyper Parameter Declaration

Again, we are going to perform a simple classification task on images of 0s and 1s only. Nothing special, but please note the last line of the screenshot (green box region): we create our default Graph so that we can first build the network graph and then train it.
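Since the screenshot itself is not reproduced here, below is a minimal sketch of what that section might look like. I am assuming the 0/1 images come from MNIST; the number of epochs is my own guess, and the last line creates the Graph from the green box region.

import numpy as np
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=False)

# keep only the 0 and 1 images for the simple classification task
mask = (mnist.train.labels == 0) | (mnist.train.labels == 1)
train_images = mnist.train.images[mask]
train_labels = mnist.train.labels[mask].reshape(-1, 1).astype(np.float32)

num_epoch = 100  # assumed value

# last line of the screenshot: create the default Graph that the network
# and the training ops will be built in
graph = tf.Graph()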


Reason 1: Mathematical Equation Implementation

One of the key differences between the Numpy version and the TF version is how each equation gets implemented. For example, below is how I would implement the derivative of tanh() in Numpy.

def d_tanh(x):
    return 1 - np.tanh(x) ** 2

However, in Tensorflow, I would implement it like below.

def d_tf_tanh(x):
    return tf.subtract(tf.constant(1.0),tf.square(tf.tanh(x)))

At first it might seem a bit weird, but personally I like Tensorflow’s math notation better; the main reason is that it makes you REALLY think before implementing each equation.


Reason 2: Forces you to THINK about how Separate Components come Together.

Hyper Parameter Declaration Part

Above is just the section for declaring the weights and the learning rate, and NOTHING more. Tensorflow forces you to concentrate on each part.
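As a rough sketch (continuing the one above), this section might contain nothing more than something like the following; the layer sizes and variable names are my assumptions, and the learning rate of 0.1 is the value mentioned later in the post.

with graph.as_default():
    # weights only: input -> hidden, hidden -> hidden (recurrent), hidden -> output
    w_x = tf.Variable(tf.random_normal([392, 128], stddev=0.05))
    w_rec = tf.Variable(tf.random_normal([128, 128], stddev=0.05))
    w_out = tf.Variable(tf.random_normal([128, 1], stddev=0.05))

learning_rate = 0.1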

Network Architecture

The above section forces me to think about the network architecture and lets me concentrate on each layer. It makes me think about the effect of each layer that I am adding to the network: how would this layer affect the network overall? How can I perform back propagation through it? Is this the best choice for this layer?
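Here is a hedged sketch of how that forward feed might be declared, continuing from the weights above. The two-time-step structure, with each step seeing half of the image, mirrors the Numpy version; the placeholder shapes and the cost function are my assumptions, not the exact screenshot code.

with graph.as_default():
    x_1 = tf.placeholder(tf.float32, [None, 392])  # first half of each image
    x_2 = tf.placeholder(tf.float32, [None, 392])  # second half of each image
    y = tf.placeholder(tf.float32, [None, 1])

    # time step 1: no previous hidden state
    layer_1_in = tf.matmul(x_1, w_x)
    layer_1 = tf.tanh(layer_1_in)

    # time step 2: current input plus recurrent connection from step 1
    layer_2_in = tf.matmul(x_2, w_x) + tf.matmul(layer_1, w_rec)
    layer_2 = tf.tanh(layer_2_in)

    output = tf.sigmoid(tf.matmul(layer_2, w_out))
    cost = tf.reduce_sum(tf.square(output - y)) * 0.5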

Finally, the above is the training section, where I can focus on training my network. Here I am forced to think only about the batch size of my training, as well as the many other things that relate only to training.
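A sketch of what that training loop might look like, reusing the names from the sketches above; the batch size is an assumption, and update stands for the weight-update op built in the next section (either the manual assign ops or the optimizer’s minimize op).

with graph.as_default():
    init = tf.global_variables_initializer()

batch_size = 10  # assumed value

with tf.Session(graph=graph) as sess:
    sess.run(init)
    for epoch in range(num_epoch):
        for i in range(0, len(train_images), batch_size):
            batch_x = train_images[i:i + batch_size]
            batch_y = train_labels[i:i + batch_size]
            # feed each half of the image as one recurrent time step
            _, current_cost = sess.run([update, cost],
                                       feed_dict={x_1: batch_x[:, :392],
                                                  x_2: batch_x[:, 392:],
                                                  y: batch_y})
        print("Epoch", epoch, "cost", current_cost)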


Reason 3: Switch and Compare Between Automatic Differentiation

As seen above, while declaring the cost function I can also declare the optimization method. So now I have the freedom to use either auto differentiation or manual back propagation.

I can just comment out one section while uncommenting the other.
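Below is a sketch of how the two options can sit side by side in the graph, continuing the sketches above. The manual gradients shown here are simplified, back propagating only through the last time step, so they are illustrative rather than the exact decoupled update from the post.

with graph.as_default():

    # Option 1: automatic differentiation (comment out when doing manual back prop)
    # update = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

    # Option 2: manual back propagation, declared as ordinary graph ops
    grad_out = (output - y) * output * (1.0 - output)        # L2 cost through sigmoid
    grad_w_out = tf.matmul(tf.transpose(layer_2), grad_out)
    grad_2 = tf.matmul(grad_out, tf.transpose(w_out)) * d_tf_tanh(layer_2_in)
    grad_w_rec = tf.matmul(tf.transpose(layer_1), grad_2)
    grad_w_x = tf.matmul(tf.transpose(x_2), grad_2)

    update = [
        tf.assign(w_out, w_out - learning_rate * grad_w_out),
        tf.assign(w_rec, w_rec - learning_rate * grad_w_rec),
        tf.assign(w_x, w_x - learning_rate * grad_w_x),
    ]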


Training and Results: Manual Back Propagation.

The left image is the cost per training loop and the right image is the result on 20 test set images. Now let’s see the result of auto differentiation.


Training and Results: Automatic Differentiation.

I used stochastic gradient descent with a learning rate of 0.1. Speed-wise it was amazingly fast; I think the code gets compiled first, which may be why. However, I was very surprised that it did not do so well on the test images.


Interactive Code

I moved to Google Colab for interactive code! You will need a Google account to view the code, and since you can’t run read-only scripts in Google Colab, please make a copy in your own playground. Finally, I will never ask for permission to access your files on Google Drive, just FYI. Happy coding!

To access the interactive code, please click on this link.


Final Words

As much as I love Tensorflow’s automatic differentiation, I don’t think it’s a good idea to keep relying only on frameworks to train a model. For me, building a Neural Network is a form of art, and I want to master every single part of it.

If any errors are found, please email me at jae.duk.seo@gmail.com.

Meanwhile, follow me on my Twitter here, and visit my website or my YouTube channel for more content. I also did a comparison of Decoupled Neural Networks here if you are interested.


References

  1. Jaderberg, M., Czarnecki, W. M., Osindero, S., Vinyals, O., Graves, A., & Kavukcuoglu, K. (2016). Decoupled neural interfaces using synthetic gradients. arXiv preprint arXiv:1608.05343.
  2. Seo, J. D. (2018, February 03). Only Numpy: Decoupled Recurrent Neural Network, modified NN from Google Brain, Implementation with… Retrieved February 05, 2018, from https://medium.com/swlh/only-numpy-decoupled-recurrent-neural-network-modified-nn-from-google-brain-implementation-with-7f328e7899e6
  3. Aloni, D. (n.d.). Dan Aloni’s blog. Retrieved February 05, 2018, from http://blog.aloni.org/posts/backprop-with-tensorflow/