Current DNN architectures can be very deep
(e.g., for object detection in high-resolution images).
This presents several challenges:
Often the gradient of the loss w.r.t. the weights in the lower layers becomes very small or vanishes entirely.
This can slow down, or even stop, training (the vanishing gradients problem).
The opposite can also happen (e.g., in RNNs): gradients grow during backpropagation and explode (the exploding gradients problem).
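A minimal sketch of the vanishing-gradient effect (not from the coursebook): we push a vector through a stack of sigmoid layers with random weights and backpropagate, watching the gradient norm shrink because the sigmoid's derivative is at most 0.25 at every unit. The layer count, width, and weight scaling are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy deep network: a stack of sigmoid layers with random weights.
n_layers, width = 30, 50
weights = [rng.normal(0.0, 1.0, (width, width)) / np.sqrt(width)
           for _ in range(n_layers)]

# Forward pass, keeping activations for the backward pass.
x = rng.normal(size=width)
activations = [x]
for W in weights:
    x = sigmoid(W @ x)
    activations.append(x)

# Backward pass: chain rule through one layer at a time.
# sigmoid'(z) = a * (1 - a) <= 0.25, so the gradient norm decays.
grad = np.ones(width)
norms = []
for W, a in zip(reversed(weights), reversed(activations[1:])):
    grad = W.T @ (grad * a * (1 - a))
    norms.append(np.linalg.norm(grad))

print(f"gradient norm after 1 layer:   {norms[0]:.3e}")
print(f"gradient norm after {n_layers} layers: {norms[-1]:.3e}")
```

Running this shows the gradient norm dropping by many orders of magnitude between the top and the bottom of the stack, which is exactly why the lower layers barely move during training.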
What to do about it:
# Image from: "Hands-On Machine Learning [...]", A. Géron (coursebook)
from IPython.display import Image
Image(filename="activations.png", width=800)