Related papers: Optical training of large-scale Transformers and deep neural networks with direct feedback alignment

URL: http://arxiv.org/abs/2409.12965v1
Date: Sun, 1 Sep 2024 12:48:47 GMT
Title: Optical training of large-scale Transformers and deep neural networks with direct feedback alignment
Authors: Ziao Wang, Kilian Müller, Matthew Filipovich, Julien Launay, Ruben Ohana, Gustave Pariente, Safa Mokaadi, Charles Brossollet, Fabien Moreau, Alessandro Cappelli, Iacopo Poli, Igor Carron, Laurent Daudet, Florent Krzakala, Sylvain Gigan
Abstract summary: We experimentally implement a versatile and scalable training algorithm, called direct feedback alignment, on a hybrid electronic-photonic platform. An optical processing unit performs large-scale random matrix multiplications, which is the central operation of this algorithm, at speeds up to 1500 TeraOps. We study the compute scaling of our hybrid optical approach, and demonstrate a potential advantage for ultra-deep and wide neural networks.
Score: 48.90869997343841
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Modern machine learning relies nearly exclusively on dedicated electronic hardware accelerators. Photonic approaches, with low consumption and high operation speed, are increasingly considered for inference but, to date, remain mostly limited to relatively basic tasks. Simultaneously, the problem of training deep and complex neural networks, overwhelmingly performed through backpropagation, remains a significant limitation to the size and, consequently, the performance of current architectures and a major compute and energy bottleneck. Here, we experimentally implement a versatile and scalable training algorithm, called direct feedback alignment, on a hybrid electronic-photonic platform. An optical processing unit performs large-scale random matrix multiplications, which is the central operation of this algorithm, at speeds up to 1500 TeraOps. We perform optical training of one of the most recent deep learning architectures, including Transformers, with more than 1B parameters, and obtain good performances on both language and vision tasks. We study the compute scaling of our hybrid optical approach, and demonstrate a potential advantage for ultra-deep and wide neural networks, thus opening a promising route to sustain the exponential growth of modern artificial intelligence beyond traditional von Neumann approaches.
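For readers unfamiliar with direct feedback alignment (DFA), the following is a minimal NumPy sketch of the general technique, not the authors' implementation. In DFA the output error is sent to every hidden layer through a fixed random matrix instead of being backpropagated through the transposed forward weights. That random projection is the "large-scale random matrix multiplication" the abstract assigns to the optical processing unit; here it is an ordinary electronic matrix product. The names `train_dfa` and `make_toy_blobs`, the tanh activations, the layer sizes, the learning rate, and the toy dataset are all assumptions chosen for illustration.

```python
import numpy as np


def make_toy_blobs(n_per_class=100, seed=0):
    """Hypothetical toy data: two Gaussian blobs with one-hot labels."""
    rng = np.random.default_rng(seed)
    x0 = rng.normal(-2.0, 1.0, (n_per_class, 2))
    x1 = rng.normal(2.0, 1.0, (n_per_class, 2))
    X = np.vstack([x0, x1])
    labels = np.repeat([0, 1], n_per_class)
    Y = np.eye(2)[labels]
    return X, Y


def train_dfa(X, Y, hidden=(32, 32), lr=0.05, epochs=300, seed=0):
    """Train a small tanh MLP with direct feedback alignment (illustrative)."""
    rng = np.random.default_rng(seed)
    sizes = [X.shape[1], *hidden, Y.shape[1]]
    W = [rng.normal(0.0, 1.0 / np.sqrt(m), (m, n))
         for m, n in zip(sizes[:-1], sizes[1:])]
    b = [np.zeros(n) for n in sizes[1:]]
    # Fixed random feedback matrices, one per hidden layer (never trained).
    # In the paper's hybrid setup this projection is the optical step.
    B = [rng.normal(0.0, 1.0 / np.sqrt(sizes[-1]), (sizes[-1], n))
         for n in hidden]
    losses = []
    for _ in range(epochs):
        # Forward pass, keeping every layer's activations.
        hs = [X]
        for Wl, bl in zip(W[:-1], b[:-1]):
            hs.append(np.tanh(hs[-1] @ Wl + bl))
        logits = hs[-1] @ W[-1] + b[-1]
        logits = logits - logits.max(axis=1, keepdims=True)
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        losses.append(-np.mean(np.sum(Y * np.log(p + 1e-12), axis=1)))

        # Output error for softmax cross-entropy, averaged over the batch.
        e = (p - Y) / len(X)

        # Gradients for the output layer use the true local gradient.
        grad_W_out = hs[-1].T @ e
        grad_b_out = e.sum(axis=0)

        # Hidden layers: project the error through the fixed random B
        # and gate it by the tanh derivative (1 - h^2).
        for l, Bl in enumerate(B):
            delta = (e @ Bl) * (1.0 - hs[l + 1] ** 2)
            W[l] -= lr * hs[l].T @ delta
            b[l] -= lr * delta.sum(axis=0)

        W[-1] -= lr * grad_W_out
        b[-1] -= lr * grad_b_out
    return W, b, B, losses
```

Calling `train_dfa(*make_toy_blobs())` returns the trained weights, the fixed feedback matrices, and the per-epoch loss. Because each hidden-layer update depends only on the output error and that layer's own activations, the hidden layers can be updated independently of one another, which is why the random projection can be handed to separate hardware.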

Related papers

Online unsupervised Hebbian learning in deep photonic neuromorphic networks [10.099714133516608]
Photonic neuromorphic networks (PNNs) leverage the inherent advantages of light, namely high parallelism, low latency, and exceptional energy efficiency. Here, we introduce a purely photonic deep PNN architecture that enables online, unsupervised learning. We experimentally demonstrate this approach on a non-trivial letter recognition task using a commercially available fiber-optic platform and achieve a 100 percent recognition rate.
arXiv Detail & Related papers (2026-01-29T20:26:36Z)
Training Hybrid Neural Networks with Multimode Optical Nonlinearities Using Digital Twins [2.8479179029634984]
We introduce ultrashort pulse propagation in multimode fibers, which perform large-scale nonlinear transformations. Training the hybrid architecture is achieved through a neural model that differentiably approximates the optical system. Our experimental results achieve state-of-the-art image classification accuracies and simulation fidelity.
arXiv Detail & Related papers (2025-01-14T10:35:18Z)
Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of its surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network. Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z)
Hierarchical and Decoupled BEV Perception Learning Framework for Autonomous Driving [52.808273563372126]
This paper proposes a novel hierarchical BEV perception paradigm, aiming to provide a library of fundamental perception modules and a user-friendly graphical interface. We adopt a Pretrain-Finetune strategy to effectively utilize large-scale public datasets and streamline development processes. We also present a Multi-Module Learning (MML) approach, enhancing performance through synergistic and iterative training of multiple models.
arXiv Detail & Related papers (2024-07-17T11:17:20Z)
Genetically programmable optical random neural networks [0.0]
We demonstrate a genetically programmable yet simple optical neural network that achieves high performance with optical random projection. By genetically programming the orientation of the scattering medium, which acts as a random projection kernel, our technique finds an optimum kernel and improves its initial test accuracies by 7-22%. Our optical computing method presents a promising approach to achieving high performance in optical neural networks with a simple and scalable design.
arXiv Detail & Related papers (2024-03-19T06:55:59Z)
Computation-efficient Deep Learning for Computer Vision: A Survey [121.84121397440337]
Deep learning models have reached or even exceeded human-level performance in a range of visual perception tasks. However, they usually demand significant computational resources, leading to impractical power consumption, latency, or carbon emissions in real-world scenarios. A new research focus is computationally efficient deep learning, which strives to achieve satisfactory performance while minimizing the computational cost during inference.
arXiv Detail & Related papers (2023-08-27T03:55:28Z)
Training neural networks with end-to-end optical backpropagation [1.1602089225841632]
We show how to implement backpropagation, an algorithm for training a neural network, using optical processes. Our approach is adaptable to various analog platforms, materials, and network structures. It demonstrates the possibility of constructing neural networks entirely reliant on analog optical processes for both training and inference tasks.
arXiv Detail & Related papers (2023-08-09T21:11:26Z)
Neuromorphic Optical Flow and Real-time Implementation with Event Cameras [47.11134388304464]
We build on the latest developments in event-based vision and spiking neural networks. We propose a new network architecture that improves the state-of-the-art self-supervised optical flow accuracy. We demonstrate high speed optical flow prediction with almost two orders of magnitude reduced complexity.
arXiv Detail & Related papers (2023-04-14T14:03:35Z)
Sophisticated deep learning with on-chip optical diffractive tensor processing [5.081061839052458]
Photonic integrated circuits provide an efficient approach to mitigating the bandwidth limitations and power wall of their electronic counterparts. We propose an optical computing architecture enabled by on-chip diffraction to implement convolutional acceleration, termed the optical convolution unit (OCU). With the OCU as the fundamental unit, we build an optical convolutional neural network (oCNN) to implement two popular deep learning tasks: classification and regression.
arXiv Detail & Related papers (2022-12-20T03:33:26Z)
Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency. We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
Monolithic Silicon Photonic Architecture for Training Deep Neural Networks with Direct Feedback Alignment [0.6501025489527172]
We propose on-chip training of neural networks enabled by a CMOS-compatible silicon photonic architecture. Our scheme employs the direct feedback alignment training algorithm, which trains neural networks using error feedback rather than error backpropagation. We experimentally demonstrate training a deep neural network with the MNIST dataset using on-chip MAC operation results.
arXiv Detail & Related papers (2021-11-12T18:31:51Z)
Rapid characterisation of linear-optical networks via PhaseLift [51.03305009278831]
Integrated photonics offers great phase-stability and can rely on the large scale manufacturability provided by the semiconductor industry. New devices, based on such optical circuits, hold the promise of faster and energy-efficient computations in machine learning applications. We present a novel technique to reconstruct the transfer matrix of linear optical networks.
arXiv Detail & Related papers (2020-10-01T16:04:22Z)
Large-scale neuromorphic optoelectronic computing with a reconfigurable diffractive processing unit [38.898230519968116]
We propose an optoelectronic reconfigurable computing paradigm by constructing a diffractive processing unit. It can efficiently support different neural networks and achieve a high model complexity with millions of neurons. Our prototype system built with off-the-shelf optoelectronic components surpasses the performance of state-of-the-art graphics processing units.
arXiv Detail & Related papers (2020-08-26T16:34:58Z)
Large Batch Training Does Not Need Warmup [111.07680619360528]
Training deep neural networks using a large batch size has shown promising results and benefits many real-world applications. In this paper, we propose a novel Complete Layer-wise Adaptive Rate Scaling (CLARS) algorithm for large-batch training. Based on our analysis, we bridge the gap and illustrate the theoretical insights for three popular large-batch training techniques.
arXiv Detail & Related papers (2020-02-04T23:03:12Z)

This list is automatically generated from the titles and abstracts of the papers on this site.