Optical Transformers
- URL: http://arxiv.org/abs/2302.10360v1
- Date: Mon, 20 Feb 2023 23:30:23 GMT
- Title: Optical Transformers
- Authors: Maxwell G. Anderson, Shi-Yuan Ma, Tianyu Wang, Logan G. Wright, Peter
L. McMahon
- Abstract summary: Large Transformer models could be a good target for optical computing.
Optical computers could have a $>8,000\times$ energy-efficiency advantage over state-of-the-art digital-electronic processors that achieve 300 fJ/MAC.
- Score: 5.494796517705931
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapidly increasing size of deep-learning models has caused renewed and
growing interest in alternatives to digital computers to dramatically reduce
the energy cost of running state-of-the-art neural networks. Optical
matrix-vector multipliers are best suited to performing computations with very
large operands, which suggests that large Transformer models could be a good
target for optical computing. To test this idea, we performed small-scale
optical experiments with a prototype accelerator to demonstrate that
Transformer operations can run on optical hardware despite noise and errors.
Using simulations, validated by our experiments, we then explored the energy
efficiency of optical implementations of Transformers and identified scaling
laws for model performance with respect to optical energy usage. We found that
the optical energy per multiply-accumulate (MAC) scales as $\frac{1}{d}$ where
$d$ is the Transformer width, an asymptotic advantage over digital systems. We
conclude that with well-engineered, large-scale optical hardware, it may be
possible to achieve a $100 \times$ energy-efficiency advantage for running some
of the largest current Transformer models, and that if both the models and the
optical hardware are scaled to the quadrillion-parameter regime, optical
computers could have a $>8,000\times$ energy-efficiency advantage over
state-of-the-art digital-electronic processors that achieve 300 fJ/MAC. We
analyzed how these results motivate and inform the construction of future
optical accelerators along with optics-amenable deep-learning approaches. With
assumptions about future improvements to electronics and Transformer
quantization techniques (5$\times$ cheaper memory access, double the
digital--analog conversion efficiency, and 4-bit precision), we estimated that
optical computers' advantage against current 300-fJ/MAC digital processors
could grow to $>100,000\times$.
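To make the scaling claim concrete, the short Python sketch below compares a fixed 300 fJ/MAC digital baseline against an optical energy per MAC that falls as $\frac{1}{d}$ with Transformer width $d$. The constant `C_OPT` and the widths swept are illustrative assumptions, not values from the paper.

```python
# Digital baseline is constant per MAC; optical energy per MAC scales as 1/d
# with Transformer width d (the abstract's asymptotic advantage). C_OPT is a
# made-up constant chosen only to make the trend visible.

DIGITAL_FJ_PER_MAC = 300.0   # fJ/MAC, digital-electronic baseline (abstract)
C_OPT = 1.0e5                # fJ * width, hypothetical optical constant

for d in [1_000, 10_000, 100_000, 1_000_000]:
    optical_fj_per_mac = C_OPT / d                       # ~ 1/d scaling
    advantage = DIGITAL_FJ_PER_MAC / optical_fj_per_mac  # grows linearly in d
    print(f"d={d:>9,}: optical {optical_fj_per_mac:9.3f} fJ/MAC, "
          f"advantage {advantage:9.1f}x")
```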
Related papers
- Transferable polychromatic optical encoder for neural networks [13.311727599288524]
In this paper, we demonstrate an optical encoder that can perform convolution simultaneously in three color channels during image capture.
Such optical encoding yields a 24,000-fold reduction in computational operations, with state-of-the-art classification accuracy (73.2%) in a free-space optical system.
arXiv Detail & Related papers (2024-11-05T00:49:47Z)
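The entry above describes convolution performed optically, per color channel, at capture time. Below is a minimal numpy/scipy sketch of that idea with random stand-in kernels; the shapes and kernel values are assumptions for illustration, not the paper's trained optics.

```python
import numpy as np
from scipy.signal import convolve2d

# Hypothetical sketch: a fixed "optical" convolution applied independently to
# each of three color channels at capture time, so the downstream digital
# network only sees the encoded image. Kernels are random stand-ins for the
# point-spread functions a trained optic would realize.

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))               # toy RGB scene
kernels = rng.standard_normal((3, 5, 5))      # one kernel per color channel

encoded = np.stack(
    [convolve2d(image[..., c], kernels[c], mode="same") for c in range(3)],
    axis=-1,
)
print(encoded.shape)  # (32, 32, 3): the convolutions cost no digital MACs
```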
- Optical training of large-scale Transformers and deep neural networks with direct feedback alignment [48.90869997343841]
We experimentally implement a versatile and scalable training algorithm, called direct feedback alignment, on a hybrid electronic-photonic platform.
An optical processing unit performs large-scale random matrix multiplications, the central operation of this algorithm, at speeds up to 1500 TeraOps.
We study the compute scaling of our hybrid optical approach, and demonstrate a potential advantage for ultra-deep and wide neural networks.
arXiv Detail & Related papers (2024-09-01T12:48:47Z)
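Direct feedback alignment replaces backpropagation's transposed-weight pathway with fixed random projections of the output error, and those large random matrix multiplications are exactly what the optical processing unit above accelerates. The following numpy sketch shows the update rule on a toy one-hidden-layer network; sizes, data, and the learning rate are illustrative assumptions.

```python
import numpy as np

# Direct feedback alignment (DFA): the output error is sent to the hidden
# layer through a FIXED random matrix B1 instead of W2.T -- the random
# matrix-vector product an optical processor can perform.

rng = np.random.default_rng(0)
n_in, n_h, n_out, lr = 64, 256, 10, 1e-2
W1 = rng.standard_normal((n_h, n_in)) * 0.1
W2 = rng.standard_normal((n_out, n_h)) * 0.1
B1 = rng.standard_normal((n_h, n_out))     # fixed random feedback matrix

x = rng.standard_normal(n_in)
target = np.eye(n_out)[3]                  # one-hot toy target

for _ in range(200):
    a1 = W1 @ x
    h1 = np.tanh(a1)
    y = W2 @ h1                            # linear output layer
    e = y - target                         # output error
    dh1 = (B1 @ e) * (1.0 - np.tanh(a1) ** 2)  # DFA hidden-layer signal
    W2 -= lr * np.outer(e, h1)
    W1 -= lr * np.outer(dh1, x)

print(float(np.sum((W2 @ np.tanh(W1 @ x) - target) ** 2)))  # final error
```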
- TeMPO: Efficient Time-Multiplexed Dynamic Photonic Tensor Core for Edge AI with Compact Slow-Light Electro-Optic Modulator [44.74560543672329]
We present a time-multiplexed dynamic photonic tensor accelerator, dubbed TeMPO, with cross-layer device/circuit/architecture customization.
We achieve 368.6 TOPS peak performance, 22.3 TOPS/W energy efficiency, and 1.2 TOPS/mm$^2$ compute density.
This work signifies the power of cross-layer co-design and domain-specific customization, paving the way for future electronic-photonic accelerators.
arXiv Detail & Related papers (2024-02-12T03:40:32Z)
- Digital-analog hybrid matrix multiplication processor for optical neural networks [11.171425574890765]
We propose a digital-analog hybrid optical computing architecture for optical neural networks (ONNs).
By introducing logic levels and threshold-based decisions, the calculation precision can be significantly enhanced.
We have demonstrated an unprecedented 16-bit calculation precision for high-definition image processing, with a pixel error rate (PER) as low as $1.8\times10^{-3}$ at a signal-to-noise ratio (SNR) of 18.2 dB.
arXiv Detail & Related papers (2024-01-26T18:42:57Z)
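The precision-enhancement idea above can be illustrated in a few lines: decompose operands into low-precision slices, evaluate each slice with a noisy analog dot product, and threshold each analog readout back to a clean logic level before recombining digitally. The bit width and noise level below are assumptions, not the paper's parameters.

```python
import numpy as np

# Toy sketch of the digital-analog hybrid idea: binary weight slices times
# bit-planes of the input, each computed by a noisy "analog" dot product and
# thresholded back to an integer logic level, then recombined exactly in the
# digital domain. At this noise level thresholding virtually always recovers
# the correct level, so the final result matches full digital precision.

rng = np.random.default_rng(0)
BITS, NOISE = 8, 0.1

w = rng.integers(0, 2, size=16)              # binary weight vector
x = rng.integers(0, 2**BITS, size=16)        # 8-bit unsigned inputs

def analog_dot(w, bit_plane):
    clean = float(w @ bit_plane)             # ideal analog result
    return clean + rng.normal(0.0, NOISE)    # additive analog noise

acc = 0
for b in range(BITS):                        # one analog pass per bit-plane
    plane = (x >> b) & 1
    level = round(analog_dot(w, plane))      # threshold to a logic level
    acc += level << b                        # exact digital recombination

print(acc, int(w @ x))                       # noisy hybrid vs. exact digital
```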
- Quantum-limited millimeter wave to optical transduction [50.663540427505616]
Long-distance transmission of quantum information is a central ingredient of distributed quantum information processors.
Current approaches to transduction employ solid-state links between the electrical and optical domains.
We demonstrate quantum-limited transduction of millimeter-wave (mmwave) photons into optical photons using cold $^{85}$Rb atoms as the transducer.
arXiv Detail & Related papers (2022-07-20T18:04:26Z)
- Single-Shot Optical Neural Network [55.41644538483948]
'Weight-stationary' analog optical and electronic hardware has been proposed to reduce the compute resources required by deep neural networks.
We present a scalable, single-shot-per-layer weight-stationary optical processor.
arXiv Detail & Related papers (2022-05-18T17:49:49Z)
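In a weight-stationary dataflow, the weights are programmed into the analog array once and many inputs are streamed through it, amortizing the expensive weight-setting step. The class below is a hypothetical toy model of that dataflow, not an API from the paper.

```python
import numpy as np

# Weight-stationary sketch: program the (optical/analog) array with the
# weight matrix once, then reuse it for every input vector. The per-input
# cost is just the cheap optical pass; the weight load happens one time.

class WeightStationaryArray:
    def __init__(self, weights):
        self.W = np.asarray(weights)   # one-time "programming" of the optics
        self.loads = 1                 # weights are set exactly once

    def forward(self, x):              # cheap per-input optical pass
        return self.W @ x

rng = np.random.default_rng(0)
layer = WeightStationaryArray(rng.standard_normal((128, 256)))
for _ in range(10_000):                # stream inputs; never reload weights
    y = layer.forward(rng.standard_normal(256))
print(layer.loads, y.shape)            # 1 load amortized over 10,000 MVMs
```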
- An optical neural network using less than 1 photon per multiplication [4.003843776219224]
We experimentally demonstrate an optical neural network achieving 99% accuracy on handwritten-digit classification.
This performance was achieved using a custom free-space optical processor.
Our results provide a proof-of-principle for low-optical-power operation.
arXiv Detail & Related papers (2021-04-27T20:43:23Z)
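Why sub-photon operation can still be accurate: the dot product is read out as an aggregate photon count, and Poisson shot noise grows only as the square root of that count, so the relative error of a long MAC shrinks as the vector gets longer. A toy simulation under an assumed 0.5-photon-per-multiplication budget:

```python
import numpy as np

# Each elementwise product w_i * x_i is detected as a Poisson photon count
# with mean below one photon; the dot product is the aggregate count. The
# relative shot-noise error falls as 1/sqrt(photons_per_mac * d), so longer
# vectors give more accurate MACs. d values and the budget are illustrative.

rng = np.random.default_rng(0)
PHOTONS_PER_MAC = 0.5                    # mean photons per multiplication

for d in [100, 1_000, 10_000]:
    w, x = rng.random(d), rng.random(d)
    scale = np.mean(w * x)
    mean_counts = PHOTONS_PER_MAC * (w * x) / scale  # normalized budget
    counts = rng.poisson(mean_counts)                # shot-noise detection
    estimate = counts.sum() * scale / (PHOTONS_PER_MAC * d)
    print(f"d={d:>6}: exact {scale:.4f}, optical {estimate:.4f}")
```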
- Dynamic compensation of stray electric fields in an ion trap using machine learning and adaptive algorithm [55.41644538483948]
Surface ion traps are among the most promising technologies for scaling up quantum computing machines.
Here we demonstrate the compensation of stray electric fields using a gradient descent algorithm and a machine learning technique.
arXiv Detail & Related papers (2021-02-11T03:27:31Z)
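A compensation loop of this kind only has access to a measured scalar error signal, so the gradient must be estimated from measurements themselves. Below is a hypothetical sketch using finite-difference gradient descent on a quadratic stand-in for the experimental signal; the cost model, step size, and dimensionality are assumptions.

```python
import numpy as np

# Stray-field compensation as black-box gradient descent: the controller can
# only measure a scalar error signal (e.g., a micromotion amplitude) for the
# applied compensation voltages, so it estimates the gradient by finite
# differences and steps downhill until the stray field is cancelled.

rng = np.random.default_rng(0)
stray = rng.standard_normal(3)               # unknown stray field (hidden)

def measured_signal(v):
    return float(np.sum((stray + v) ** 2))   # proxy: zero when compensated

v, lr, eps = np.zeros(3), 0.1, 1e-4
for _ in range(200):
    grad = np.array([
        (measured_signal(v + eps * e) - measured_signal(v - eps * e)) / (2 * eps)
        for e in np.eye(3)
    ])
    v -= lr * grad                           # voltage update step

print(measured_signal(v))                    # residual error signal ~ 0
```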
- Rapid characterisation of linear-optical networks via PhaseLift [51.03305009278831]
Integrated photonics offers great phase stability and can rely on the large-scale manufacturability provided by the semiconductor industry.
New devices, based on such optical circuits, hold the promise of faster and energy-efficient computations in machine learning applications.
We present a novel technique to reconstruct the transfer matrix of linear optical networks.
arXiv Detail & Related papers (2020-10-01T16:04:22Z)
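PhaseLift recovers phase information from intensity-only measurements by "lifting": each intensity is linear in the rank-one matrix $X = xx^H$, so a quadratic problem becomes a linear one. The numpy sketch below uses plain least-squares inversion of the lifted system instead of the method's trace-norm SDP, and recovers one column of a transfer matrix from random probes; all sizes are illustrative assumptions.

```python
import numpy as np

# Lifting idea behind PhaseLift: intensities b_i = |a_i^T x|^2 are LINEAR in
# X = x x^H, so an oversampled random probe set determines X by least squares;
# the top eigenvector of X then gives x up to a global phase.

rng = np.random.default_rng(0)
n, m = 4, 3 * n * n                          # oversampled lifted system
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
b = np.abs(A @ x) ** 2                       # intensities only, no phase

# Row i of M is vec(a_i a_i^H), so that M @ vec(X) = b is a linear system.
M = np.stack([np.outer(a, a.conj()).ravel() for a in A])
X = np.linalg.lstsq(M, b.astype(complex), rcond=None)[0].reshape(n, n)

vals, vecs = np.linalg.eigh((X + X.conj().T) / 2)   # Hermitian part of X
x_hat = np.sqrt(vals[-1]) * vecs[:, -1]             # rank-1 readout

phase = np.vdot(x_hat, x) / abs(np.vdot(x_hat, x))  # align global phase
print(np.linalg.norm(x_hat * phase - x))            # recovery error ~ 0
```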
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all of the above) and is not responsible for any consequences of its use.