The Evolution of Backpropagation: A Revolutionary Breakthrough in Machine Learning

Suryansh Raghuvanshi
4 min read · Dec 24, 2023

Introduction

The landscape of machine learning has been revolutionized by an ingenious technique called backpropagation. This powerful algorithm has breathed new life into artificial neural networks, enabling them to evolve into deep learning powerhouses. The ability to learn from data, optimize parameters, and generalize patterns has transformed industries ranging from healthcare to finance. In this article, we will explore the fascinating history and evolution of backpropagation, tracing its origins, key milestones, and its impact on modern neural network training.

The Origins of Backpropagation

Early Work on Backpropagation and Gradient Descent

Although it has been derived independently many times, backpropagation is essentially an efficient application of the chain rule to neural networks. Frank Rosenblatt introduced the idea in 1962, coining the term “back-propagating error correction.” He could not put it into practice, however, because he worked with neurons that produced discrete outputs, which are not differentiable and therefore offer no gradients to propagate.
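
To see the chain-rule view concretely, here is a minimal sketch of a single forward and backward pass through a tiny two-layer network. The layer sizes, the sigmoid activation, the squared-error loss, and every variable name are assumptions made purely for illustration, not any particular historical formulation.

```python
import numpy as np

# Tiny two-layer network: x -> h = sigmoid(W1 @ x) -> y = W2 @ h.
# Backpropagation is nothing more than the chain rule applied layer by layer.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 1))          # input vector
t = np.array([[1.0]])                # target output
W1 = rng.normal(size=(4, 3))         # first-layer weights
W2 = rng.normal(size=(1, 4))         # second-layer weights

# Forward pass
a = W1 @ x                           # hidden pre-activations
h = sigmoid(a)                       # hidden activations
y = W2 @ h                           # network output
loss = 0.5 * ((y - t) ** 2).item()   # squared-error loss

# Backward pass: each line is one application of the chain rule
dL_dy = y - t                        # dL/dy
dL_dW2 = dL_dy @ h.T                 # dL/dW2 = dL/dy * dy/dW2
dL_dh = W2.T @ dL_dy                 # error pushed back to the hidden layer
dL_da = dL_dh * h * (1.0 - h)        # sigmoid'(a) = h * (1 - h)
dL_dW1 = dL_da @ x.T                 # dL/dW1 = dL/da * da/dW1
```

Each backward line reuses quantities already computed in the forward pass, which is exactly where the efficiency of backpropagation comes from.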

Precursors of backpropagation date back to optimal control theory in the 1950s. Yann LeCun and others credit the work of Pontryagin and his colleagues in optimal control theory, particularly the adjoint state method, as a continuous-time analogue of backpropagation. Other notable precursors include the Robbins-Monro algorithm and Arthur Bryson and Yu-Chi Ho’s Applied Optimal Control.

Stuart Dreyfus and the Chain Rule

In 1962, Stuart Dreyfus published a simpler derivation based solely on the chain rule, adjusting the parameters of controllers in proportion to error gradients, a significant step towards modern backpropagation. These early precursors, however, did not consider direct links across multiple stages or the potential efficiency gains from network sparsity.
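
In modern notation, that idea is simply the gradient-descent update rule, written here in standard textbook form rather than Dreyfus’s original control-theoretic notation, where E is the error, θ a parameter, and η a step size:

$$\theta \leftarrow \theta - \eta\,\frac{\partial E}{\partial \theta}$$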

The ADALINE Algorithm and Multilayer Perceptrons

The ADALINE learning algorithm, introduced by Bernard Widrow and Ted Hoff in 1960, used gradient descent with a squared-error loss for a single layer. Shun’ichi Amari’s work in 1967 marked the first time a multilayer perceptron (MLP) was trained with stochastic gradient descent. His five-layer MLP with two modifiable layers learned the internal representations needed to classify non-linearly separable pattern classes.
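
As a rough sketch of that single-layer idea, the snippet below trains one linear unit by stochastic gradient descent on a squared-error loss. The synthetic data, learning rate, and variable names are assumptions chosen for illustration rather than the original 1960 setup.

```python
import numpy as np

# ADALINE-style learning: one linear unit trained by gradient descent on a
# squared-error loss, with per-sample (stochastic) updates.

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))            # 100 samples, 2 features
y = X @ np.array([2.0, -1.0]) + 0.5      # linear targets to be recovered

w = np.zeros(2)                          # weights
b = 0.0                                  # bias
lr = 0.01                                # learning rate

for epoch in range(50):
    for xi, ti in zip(X, y):
        output = xi @ w + b              # linear activation, no threshold
        error = ti - output
        # The gradient of 0.5 * error**2 with respect to w is -error * xi,
        # so the update steps the weights in proportion to the error.
        w += lr * error * xi
        b += lr * error

print(w, b)   # approaches [2.0, -1.0] and 0.5
```

Amari’s 1967 contribution was to apply the same stochastic-gradient idea to networks with more than one modifiable layer.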

Seppo Linnainmaa and Reverse Mode of Automatic Differentiation

In 1970, Seppo Linnainmaa published the modern form of backpropagation, now known as the reverse mode of automatic differentiation. His formulation applied to discrete, connected networks of nested differentiable functions. This was a major milestone in the development of backpropagation and laid the groundwork for its later applications.
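
The toy implementation below sketches what “reverse mode” means in practice: run the computation forward while recording every operation and its local derivative, then sweep the recorded graph backwards once to accumulate all the gradients. The class and function names are illustrative assumptions, not Linnainmaa’s formulation.

```python
import math

class Var:
    """A scalar value that remembers how it was computed."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # pairs of (parent Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

def sin(x):
    return Var(math.sin(x.value), [(x, math.cos(x.value))])

def backward(output):
    # Order nodes so each is processed after everything that depends on it,
    # then propagate gradients parent-ward in a single reverse sweep.
    order, seen = [], set()
    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)
    visit(output)

    output.grad = 1.0
    for node in reversed(order):
        for parent, local_grad in node.parents:
            parent.grad += node.grad * local_grad

# f(a, b) = a*b + sin(a); df/da = b + cos(a), df/db = a
a, b = Var(2.0), Var(3.0)
f = a * b + sin(a)
backward(f)
print(a.grad, b.grad)   # approx 2.584 (= 3 + cos 2) and 2.0 (= a)
```

The key property is that a single reverse sweep yields the derivative of the output with respect to every input, which is what makes training networks with many parameters affordable.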

Paul Werbos and Standardizing Backpropagation

In 1982, Paul Werbos applied backpropagation to multilayer perceptrons (MLPs) in the way that has since become standard. In an interview he described developing the idea during his PhD work, where he aimed to mathematize Freud’s “flow of psychic energy.” He faced repeated difficulty publishing the work and only managed to do so in 1981.

David E. Rumelhart and Experimental Analysis

In 1985, David E. Rumelhart and his colleagues Geoffrey Hinton and Ronald Williams published an influential experimental analysis of backpropagation. Having arrived at the technique independently of the earlier work described above, they helped popularize it and set off an active period of research on multilayer perceptrons.

Yann LeCun’s Alternative Form of Backpropagation

In his 1987 PhD thesis, Yann LeCun proposed an alternative form of backpropagation for neural networks. His work added to the growing body of research on the technique and showcased its versatility and potential for further development.

The Journey to Acceptance and Resurgence

Early Objections and Hurdles

Gradient descent, the optimization procedure underlying backpropagation-based training, faced objections early on. Critics argued that there was no guarantee of reaching a global minimum, and neurons were widely believed to produce discrete rather than continuous signals, which made the very notion of a gradient questionable. These objections were gradually overcome as researchers continued to explore and refine the technique.

Decline and Resurgence

During the 2000s backpropagation fell out of favor, but its fortunes changed in the 2010s with the arrival of cheap, powerful GPU-based computing. The resurgence was especially visible in speech recognition, machine vision, natural language processing, and models of language structure learning; backpropagation-based models have even been used to explain phenomena of first- and second-language learning.

Conclusion

The evolution of backpropagation is a testament to the power of innovation and perseverance in machine learning. From its precursors in the 1950s and 1960s to its standardization and experimental analysis in the 1980s, backpropagation has transformed what neural networks can do. Its impact on industry and research can hardly be overstated. More than half a century after the modern form of the algorithm appeared in 1970, we can look forward to further advances in this groundbreaking technique, propelling machine learning to new heights.

References:

https://en.wikipedia.org/wiki/Backpropagation
https://people.idsia.ch/~juergen/who-invented-backpropagation.html
https://medium.com/syncedreview/who-invented-backpropagation-hinton-says-he-didnt-but-his-work-made-it-popular-e0854504d6d1
https://www.iro.umontreal.ca/~vincentp/ift3395/lectures/backprop_old.pdf
