CS Notes 📚
Gradient Descent
Gradient-Based Learning Applied to Document Recognition (LeCun, Bottou, Bengio & Haffner, 1998)
Adam: A Method for Stochastic Optimization (Kingma & Ba, 2014)
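To make the topic concrete, here is a minimal sketch contrasting the plain gradient-descent update with the Adam update. It is not taken from either paper. It assumes a one-dimensional toy objective f(x) = (x - 3)^2, whose minimizer is x = 3. The function names `grad`, `gradient_descent` and `adam` are hypothetical. The values beta1 = 0.9, beta2 = 0.999 and eps = 1e-8 follow the defaults suggested in the Adam paper. The learning rate of 0.1 and the step counts are arbitrary choices for this toy problem; the paper's suggested default learning rate is 0.001.

```python
import math


def grad(x):
    # Gradient of the (assumed) toy objective f(x) = (x - 3)^2.
    return 2.0 * (x - 3.0)


def gradient_descent(x0, lr=0.1, steps=100):
    # Plain gradient descent: repeatedly step against the gradient,
    # x <- x - lr * f'(x).
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def adam(x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    # Adam: keep exponential moving averages of the gradient (m) and of the
    # squared gradient (v), correct their bias toward zero (both start at 0),
    # and scale each step by m_hat / sqrt(v_hat).
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Both runs start at x = 0 and should end near the minimizer x = 3.
print(gradient_descent(0.0))
print(adam(0.0))
```

On this quadratic, plain gradient descent shrinks the error by a constant factor (1 - 2 * lr) per step. Adam's first steps have a size of roughly lr regardless of the gradient's scale, because m_hat / sqrt(v_hat) is normalized. Its momentum term then makes it overshoot and oscillate around x = 3 before it settles.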