Gradient Descent
Gradient-Based Learning Applied to Document Recognition (1998)
Adam: A Method for Stochastic Optimization (2014)