Authors: Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner
Area: Computer Vision, Deep Learning, OCR
Link: https://ieeexplore.ieee.org/document/726791
This paper presented Convolutional Neural Networks (CNNs) in their mature form and showed that neural networks trained with gradient-based learning can outperform traditional handcrafted pattern recognition systems.
The central idea is to learn features directly from raw images instead of manually designing feature extractors.
Traditional pattern recognition systems were built with two separate components:
- a hand-designed feature extractor that produced a compact representation of the input
- a trainable classifier that operated on that representation

Source: Figure adapted from LeCun et al., “Gradient-Based Learning Applied to Document Recognition”, Proceedings of the IEEE, 1998.
The feature extractor converted images into engineered representations before classification.
Limitations of this approach:
- designing good features required substantial domain expertise
- overall accuracy was bounded by the quality of the hand-crafted features, not the classifier
- the feature extractor could not adapt to new data or new tasks
The paper proposes replacing this pipeline with end-to-end learning using neural networks.
Neural networks trained using gradient descent and backpropagation can learn both:
- the feature extractor
- the classifier
directly from raw pixel data.
This allows systems to automatically discover useful patterns without manual feature design.
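As a concrete illustration of this gradient-based learning loop, here is a minimal sketch that trains a logistic-regression classifier on raw "pixel" vectors by repeated gradient steps. The data, dimensions, and learning rate are all made up for illustration; this is not the paper's setup, only the general training recipe it relies on:

```python
import numpy as np

# Synthetic "images": 100 samples of 64 raw pixels each, with labels
# generated from a hidden linear rule (so the task is learnable).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 64))
true_w = rng.normal(size=64)
y = (X @ true_w > 0).astype(float)

w = np.zeros(64)                         # weights learned from raw pixels
lr = 0.1
for _ in range(500):
    pred = 1 / (1 + np.exp(-(X @ w)))    # sigmoid output
    grad = X.T @ (pred - y) / len(y)     # gradient of the cross-entropy loss
    w -= lr * grad                       # one gradient-descent step

acc = ((1 / (1 + np.exp(-(X @ w))) > 0.5) == y).mean()
```

The same recipe (forward pass, loss gradient via backpropagation, weight update) scales up to multi-layer networks; only the gradient computation becomes more involved.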
Fully connected neural networks are inefficient for image data because:
- every pixel connects to every hidden unit, producing a very large number of parameters
- they ignore the 2D topology of images, treating nearby and distant pixels identically
- they have no built-in invariance to translations or small distortions of the input
CNNs solve these issues using three architectural principles: local receptive fields, shared weights, and spatial subsampling.
Neurons connect only to a small region of the input.
This allows the network to detect local features such as:
- oriented edges
- end-points
- corners
The same filter is applied across the entire image.
Benefits:
- far fewer free parameters to learn
- a feature learned in one part of the image is detected everywhere (translation equivariance)
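Local receptive fields and weight sharing can be sketched together in a few lines: a single small kernel is slid over the whole image, reusing the same weights at every position. The function name and the edge-detector kernel below are illustrative, not from the paper:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one kernel over the image ("valid" convolution, no padding)."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]    # local receptive field
            out[i, j] = np.sum(patch * kernel)   # same shared weights everywhere
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
edge = np.array([[-1., 0., 1.]] * 3)   # toy vertical-edge detector
fmap = conv2d_valid(image, edge)       # 3x3 feature map from a 5x5 image
```

The 9 kernel weights are the only parameters, no matter how large the image is; a fully connected layer over the same input would need a separate weight per pixel per unit.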
Pooling layers reduce spatial resolution.
Advantages:
- reduced sensitivity to small shifts and distortions of the input
- lower computational and memory cost in later layers
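A 2x2 average-pooling step can be sketched as below. Note this is a simplified stand-in: the paper's subsampling layers average a 2x2 neighborhood and then apply a trainable gain and bias, which this sketch omits:

```python
import numpy as np

def avg_pool2x2(x):
    """Average each non-overlapping 2x2 block, halving both dimensions."""
    H, W = x.shape
    return x.reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))

fmap = np.array([[ 1.,  3.,  2.,  4.],
                 [ 5.,  7.,  6.,  8.],
                 [ 9., 11., 10., 12.],
                 [13., 15., 14., 16.]])
pooled = avg_pool2x2(fmap)   # 4x4 feature map -> 2x2
```

Because each output only records the average of a neighborhood, shifting the input by a pixel changes the output much less than it changes the raw feature map.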

Source: Figure adapted from LeCun et al., “Gradient-Based Learning Applied to Document Recognition”, Proceedings of the IEEE, 1998.
The paper presents LeNet-5, one of the earliest successful CNN architectures for handwritten digit recognition.
Input size: 32 × 32 grayscale image
Architecture:
Input
→ Convolution layer (C1)
→ Subsampling layer (S2)
→ Convolution layer (C3)
→ Subsampling layer (S4)
→ Convolution layer (C5)
→ Fully connected layer (F6)
→ Output layer
The network learns hierarchical representations:
edges → strokes → shapes → digits
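The spatial sizes through the stack above can be traced with simple arithmetic, using the paper's layer parameters (5x5 convolutions without padding, 2x2 subsampling); the helper names below are just for illustration:

```python
def conv(size, k=5):    # "valid" 5x5 convolution shrinks each side by k - 1
    return size - k + 1

def pool(size):         # 2x2 subsampling halves each side
    return size // 2

s = 32                  # input: 32x32 grayscale image
s = conv(s)             # C1: 6 feature maps of 28x28
s = pool(s)             # S2: 6 maps of 14x14
s = conv(s)             # C3: 16 maps of 10x10
s = pool(s)             # S4: 16 maps of 5x5
s = conv(s)             # C5: 120 maps of 1x1 (effectively fully connected)
```

By C5 each "map" is a single value, so the remaining layers (F6 and the output) act on a flat feature vector.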

Source: Figure adapted from LeCun et al., “Gradient-Based Learning Applied to Document Recognition”, Proceedings of the IEEE, 1998.
Real document recognition systems consist of multiple modules such as:
- field locating
- word or character segmentation
- character recognition
- language or contextual post-processing
Traditionally these modules were trained independently.
Graph Transformer Networks allow global training of the entire system using gradient descent, enabling optimization of the full recognition pipeline.
CNNs were evaluated on handwritten digit recognition using the MNIST dataset, a benchmark derived from NIST handwriting data.
The models achieved state-of-the-art performance, outperforming traditional approaches such as:
- linear classifiers and k-nearest neighbors
- PCA-based and polynomial classifiers
- support vector machines with standard kernels
The methods in this paper were deployed in a bank check recognition system that processed millions of checks daily, proving neural networks could work at real-world scale. The work introduced key ideas such as Convolutional Neural Networks, weight sharing, pooling, and end-to-end learning, which later became the foundation for modern computer vision, OCR systems, and deep learning vision models.