Neural Networks Tricks-Of-The-Trade

Joaquín Padilla Montani

Deep Learning with TensorFlow WS18/19 @ IST Austria

January 7th, 2019

Difficulties in Training DNNs

Current DNN architectures can be very deep
(e.g., for object detection in high-resolution images).

This presents several challenges:

  1. Lower layers are hard to train
  2. Training speed (many parameters; massive datasets)
  3. High risk of overfitting

1. Vanishing/Exploding Gradients Problem

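Often, as backpropagation proceeds toward the lower layers, the gradient of the loss w.r.t. their weights becomes very small or vanishes.

This can slow down training or stop it altogether.

The opposite can also happen (e.g., in RNNs): the gradients grow larger and larger and explode.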
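A minimal sketch of both effects, in plain Python rather than TensorFlow: a toy chain with one unit per layer and no biases, where the input value 0.5 and the weights 1.0 and 1.5 are arbitrary assumptions. By the chain rule the gradient is a product of one factor per layer, so it shrinks geometrically when every factor is below 1 (the sigmoid derivative never exceeds 0.25) and grows geometrically when every factor is above 1.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic sigmoid; its derivative s * (1 - s) never exceeds 0.25."""
    return 1.0 / (1.0 + math.exp(-z))

def input_gradient(depth: int, weight: float, activation: str) -> float:
    """d(output)/d(input) for a toy chain with one unit per layer and no biases.

    Layer l computes a_l = f(weight * a_{l-1}); by the chain rule the gradient
    is the product over all layers of f'(weight * a_{l-1}) * weight.
    """
    a = 0.5          # arbitrary input value (assumption for illustration)
    grad = 1.0
    for _ in range(depth):
        z = weight * a
        if activation == "sigmoid":
            a = sigmoid(z)
            local = a * (1.0 - a)   # sigmoid'(z) = s * (1 - s) <= 0.25
        elif activation == "identity":
            a = z
            local = 1.0             # derivative of f(z) = z
        else:
            raise ValueError(f"unknown activation: {activation}")
        grad *= local * weight      # multiply in this layer's chain-rule factor
    return grad

for depth in (1, 5, 10, 20):
    vanishing = input_gradient(depth, weight=1.0, activation="sigmoid")
    exploding = input_gradient(depth, weight=1.5, activation="identity")
    print(f"depth={depth:2d}  sigmoid: {vanishing:.2e}  identity, w=1.5: {exploding:.2e}")
```

With sigmoid activations, 20 layers already push the gradient below 0.25**20 (about 1e-12); with identity activations and weight 1.5 it grows to 1.5**20 (about 3325).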

What to do about it:

  • Better (nonsaturating) activation functions
  • Smarter initializations
  • Batch normalization
  • Gradient clipping
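Gradient clipping can be sketched as clipping by global norm: if the combined L2 norm of all gradients exceeds a threshold, every gradient is rescaled by the same factor, which keeps the update direction and only caps its length. This plain-Python version uses made-up gradient values and a threshold of 5.0 for illustration; in TensorFlow the corresponding op is `tf.clip_by_global_norm`.

```python
import math
from typing import List

def clip_by_global_norm(grads: List[List[float]], clip_norm: float) -> List[List[float]]:
    """Rescale all gradients jointly so that their global L2 norm is at most clip_norm.

    Each inner list stands for one (flattened) gradient tensor. All tensors are
    multiplied by the same factor, so the direction of the overall update is kept.
    """
    global_norm = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    if global_norm <= clip_norm:
        return [list(tensor) for tensor in grads]   # small enough: leave unchanged
    return [[g * clip_norm / global_norm for g in tensor] for tensor in grads]

# Made-up "exploding" gradients for two parameter tensors; global norm = 50.
grads = [[30.0, 40.0], [0.0]]
clipped = clip_by_global_norm(grads, clip_norm=5.0)
print(clipped)  # [[3.0, 4.0], [0.0]]
```

Clipping each component independently (clipping by value) is a simpler alternative, but it can change the direction of the gradient vector.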

(Nonsaturating) Activation Functions

from IPython.display import Image
# Image from: "Hands-On Machine Learning [...]" by A. Géron (course book)
Image(filename="activations.png", width=800)
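A rough sketch of what "saturating" means, comparing derivatives numerically in plain Python. The alpha values (0.01 for leaky ReLU, 1.0 for ELU) are illustrative assumptions; TensorFlow ships these functions as `tf.nn.relu`, `tf.nn.leaky_relu` and `tf.nn.elu`, whose defaults may differ from the values used here.

```python
import math

# Saturating activations: the derivative goes to 0 as |z| grows.
def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def d_sigmoid(z: float) -> float:
    s = sigmoid(z)
    return s * (1.0 - s)

def d_tanh(z: float) -> float:
    return 1.0 - math.tanh(z) ** 2

# Nonsaturating activations: the derivative stays at 1 for every z > 0.
def relu(z: float) -> float:
    return max(0.0, z)

def d_relu(z: float) -> float:
    return 1.0 if z > 0 else 0.0          # zero for z <= 0 ("dying ReLU")

def leaky_relu(z: float, alpha: float = 0.01) -> float:
    return z if z > 0 else alpha * z

def d_leaky_relu(z: float, alpha: float = 0.01) -> float:
    return 1.0 if z > 0 else alpha        # small but nonzero slope for z <= 0

def elu(z: float, alpha: float = 1.0) -> float:
    return z if z > 0 else alpha * (math.exp(z) - 1.0)

def d_elu(z: float, alpha: float = 1.0) -> float:
    return 1.0 if z > 0 else alpha * math.exp(z)   # continuous at z = 0 when alpha = 1

print(f"{'z':>6} {'sigmoid':>10} {'tanh':>10} {'relu':>6} {'leaky':>6} {'elu':>8}")
for z in (-10.0, -1.0, 1.0, 10.0):
    print(f"{z:>6} {d_sigmoid(z):>10.2e} {d_tanh(z):>10.2e} "
          f"{d_relu(z):>6} {d_leaky_relu(z):>6} {d_elu(z):>8.2e}")
```

For |z| = 10 the sigmoid and tanh derivatives are already tiny, while all three nonsaturating functions keep a derivative of 1 for positive inputs; leaky ReLU and ELU additionally avoid a zero gradient for negative inputs.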