Training Hybrid Neural Networks with Multimode Optical Nonlinearities Using Digital Twins
- URL: http://arxiv.org/abs/2501.07991v1
- Date: Tue, 14 Jan 2025 10:35:18 GMT
- Title: Training Hybrid Neural Networks with Multimode Optical Nonlinearities Using Digital Twins
- Authors: Ilker Oguz, Louis J. E. Suter, Jih-Liang Hsieh, Mustafa Yildirim, Niyazi Ulas Dinc, Christophe Moser, Demetri Psaltis
- Abstract summary: We use ultrashort pulse propagation in multimode fibers to perform large-scale nonlinear transformations.
Training the hybrid architecture is achieved through a neural model that differentiably approximates the optical system.
Our experimental results achieve state-of-the-art image classification accuracies and simulation fidelity.
- Score: 2.8479179029634984
- Abstract: The ability to train ever-larger neural networks brings artificial intelligence to the forefront of scientific and technical discoveries. However, their exponentially increasing size creates a proportionally greater demand for energy and computational hardware. Incorporating complex physical events in networks as fixed, efficient computation modules can address this demand by decreasing the complexity of trainable layers. Here, we utilize ultrashort pulse propagation in multimode fibers, which perform large-scale nonlinear transformations, for this purpose. Training the hybrid architecture is achieved through a neural model that differentiably approximates the optical system. The training algorithm updates the neural simulator and backpropagates the error signal over this proxy to optimize layers preceding the optical one. Our experimental results achieve state-of-the-art image classification accuracies and simulation fidelity. Moreover, the framework demonstrates exceptional resilience to experimental drifts. By integrating low-energy physical systems into neural networks, this approach enables scalable, energy-efficient AI models with significantly reduced computational demands.
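As a rough illustration of the training loop described in the abstract, here is a minimal PyTorch sketch (not the authors' code): a stand-in `optical_system` function plays the role of the multimode fiber, a small `twin` network is fitted to its measured input-output pairs, and the task error is backpropagated through the frozen twin to update the trainable layers around the optical block. All module sizes, the mock optics, and the optimizers are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of training digital layers around a
# non-differentiable physical block via a learned, differentiable "digital twin".
import torch
import torch.nn as nn

def optical_system(x):
    """Stand-in for the real multimode-fiber transform: non-differentiable,
    observable only through measurements (here a fixed random nonlinearity)."""
    with torch.no_grad():
        W = torch.randn(64, 64, generator=torch.Generator().manual_seed(0))
        return torch.tanh(x @ W)

encoder = nn.Sequential(nn.Linear(784, 64), nn.Tanh())                      # trainable layers before optics
twin    = nn.Sequential(nn.Linear(64, 128), nn.Tanh(), nn.Linear(128, 64))  # differentiable proxy of the optics
head    = nn.Linear(64, 10)                                                  # trainable layers after optics

opt_model = torch.optim.Adam([*encoder.parameters(), *head.parameters()], lr=1e-3)
opt_twin  = torch.optim.Adam(twin.parameters(), lr=1e-3)

def train_step(x, y):
    # 1) Measure the real system and refine the twin so it tracks experimental drift.
    u = encoder(x).detach()
    measured = optical_system(u)
    twin_loss = nn.functional.mse_loss(twin(u), measured)
    opt_twin.zero_grad(); twin_loss.backward(); opt_twin.step()

    # 2) Backpropagate the task error through the frozen twin to update the
    #    layers preceding (and following) the optical block.
    for p in twin.parameters():
        p.requires_grad_(False)
    logits = head(twin(encoder(x)))
    task_loss = nn.functional.cross_entropy(logits, y)
    opt_model.zero_grad(); task_loss.backward(); opt_model.step()
    for p in twin.parameters():
        p.requires_grad_(True)
    return twin_loss.item(), task_loss.item()
```

In the actual experiment the forward pass can use the measured optical output while the twin supplies gradients only; the sketch above uses the twin for both, which is a simplification.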
Related papers
- Optical training of large-scale Transformers and deep neural networks with direct feedback alignment [48.90869997343841]
We experimentally implement a versatile and scalable training algorithm, called direct feedback alignment, on a hybrid electronic-photonic platform.
An optical processing unit performs large-scale random matrix multiplications, which is the central operation of this algorithm, at speeds up to 1500 TeraOps.
We study the compute scaling of our hybrid optical approach, and demonstrate a potential advantage for ultra-deep and wide neural networks.
arXiv Detail & Related papers (2024-09-01T12:48:47Z)
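The entry above centers on direct feedback alignment, in which the output error is projected to every hidden layer through fixed random matrices; those large random matrix multiplications are what the optical processor accelerates. A minimal NumPy sketch of the update rule, with assumed layer sizes and learning rate:

```python
# Minimal NumPy sketch of direct feedback alignment (DFA): the output error is
# sent to each hidden layer through a *fixed random* matrix instead of the
# transposed forward weights.
import numpy as np

rng = np.random.default_rng(0)
sizes = [784, 256, 256, 10]
W = [rng.standard_normal((m, n)) * 0.01 for m, n in zip(sizes[:-1], sizes[1:])]
B = [rng.standard_normal((sizes[-1], n)) for n in sizes[1:-1]]  # fixed random feedback matrices

def dfa_step(x, y_onehot, lr=0.05):
    # Forward pass with tanh hidden layers and a softmax output.
    h1 = np.tanh(x @ W[0])
    h2 = np.tanh(h1 @ W[1])
    logits = h2 @ W[2]
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    e = p - y_onehot                        # output error

    # Feedback: random projections of the error (the optically computed step),
    # modulated by the local activation derivative.
    d2 = (e @ B[1]) * (1 - h2 ** 2)
    d1 = (e @ B[0]) * (1 - h1 ** 2)

    W[2] -= lr * h2.T @ e / len(x)
    W[1] -= lr * h1.T @ d2 / len(x)
    W[0] -= lr * x.T @ d1 / len(x)
```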
- Mechanistic Neural Networks for Scientific Machine Learning [58.99592521721158]
We present Mechanistic Neural Networks, a neural network design for machine learning applications in the sciences.
It incorporates a new Mechanistic Block in standard architectures to explicitly learn governing differential equations as representations.
Central to our approach is a novel Relaxed Linear Programming solver (NeuRLP) inspired by a technique that reduces solving linear ODEs to solving linear programs.
arXiv Detail & Related papers (2024-02-20T15:23:24Z)
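The reduction the summary alludes to, solving a linear ODE by posing its discretized dynamics as linear constraints, can be illustrated with an off-the-shelf LP solver. This is only a toy version of that classical idea, not the NeuRLP solver; the ODE, grid, and solver settings are assumptions.

```python
# Toy illustration: a discretized linear ODE becomes a set of linear equality
# constraints that a linear-program solver can satisfy.
import numpy as np
from scipy.optimize import linprog

a, x0, dt, T = -0.5, 1.0, 0.05, 40         # dx/dt = a*x, x(0) = x0
n = T + 1                                   # unknowns x_0 .. x_T

# Equality constraints: x_0 = x0 and x_{t+1} - (1 + a*dt) * x_t = 0 (explicit Euler).
A_eq = np.zeros((T + 1, n))
b_eq = np.zeros(T + 1)
A_eq[0, 0], b_eq[0] = 1.0, x0
for t in range(T):
    A_eq[t + 1, t + 1] = 1.0
    A_eq[t + 1, t] = -(1.0 + a * dt)

# Any feasible point solves the discretized ODE; use a zero objective and free variables.
res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq, bounds=[(None, None)] * n)
print(res.x[:5])   # Euler trajectory [1.0, 0.975, 0.9506, ...], tracking exp(a*t)
```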
- Gradual Optimization Learning for Conformational Energy Minimization [69.36925478047682]
The Gradual Optimization Learning Framework (GOLF) for energy minimization with neural networks significantly reduces the amount of additional training data required.
Our results demonstrate that the neural network trained with GOLF performs on par with the oracle on a benchmark of diverse drug-like molecules.
arXiv Detail & Related papers (2023-11-05T11:48:08Z)
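A hedged sketch of the general recipe (not the GOLF implementation): relax a conformation by gradient descent on a neural energy surrogate and query an expensive physics oracle only occasionally for additional training data. The toy energy network, the oracle, and the query schedule are assumptions.

```python
# Minimal sketch of neural-network energy minimization with occasional oracle queries.
import torch
import torch.nn as nn

energy_net = nn.Sequential(nn.Linear(3 * 8, 64), nn.SiLU(), nn.Linear(64, 1))

def oracle_energy(pos):                 # stand-in for an expensive ab initio call
    return ((pos.norm(dim=-1) - 1.5) ** 2).sum()

pos = torch.randn(8, 3, requires_grad=True)       # 8 atoms, Cartesian coordinates
opt = torch.optim.SGD([pos], lr=1e-2)
extra_data = []

for step in range(200):
    opt.zero_grad()
    e = energy_net(pos.reshape(1, -1)).squeeze()  # surrogate energy of the conformation
    e.backward()                                   # gradients w.r.t. atomic positions
    opt.step()
    if step % 50 == 0:                             # gradual, sparing oracle queries
        extra_data.append((pos.detach().clone(), oracle_energy(pos.detach())))
```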
- NeuralStagger: Accelerating Physics-constrained Neural PDE Solver with Spatial-temporal Decomposition [67.46012350241969]
This paper proposes a general acceleration methodology called NeuralStagger.
It decomposes the original learning task into several coarser-resolution subtasks.
We demonstrate the successful application of NeuralStagger on 2D and 3D fluid dynamics simulations.
arXiv Detail & Related papers (2023-02-20T19:36:52Z)
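The decomposition can be pictured as splitting a fine grid into staggered coarse sub-grids, each of which becomes a cheaper learning subtask. A minimal NumPy sketch with an assumed factor-2 spatial decomposition:

```python
# Decompose a fine-resolution 2D field into staggered coarse views and recover
# the original field by interleaving them.
import numpy as np

def decompose(field, s=2):
    """Return the s*s staggered coarse views of a (H, W) field."""
    return [field[i::s, j::s] for i in range(s) for j in range(s)]

def recompose(subfields, s=2):
    """Interleave the coarse views back into the original fine grid."""
    h, w = subfields[0].shape
    field = np.empty((h * s, w * s), dtype=subfields[0].dtype)
    for k, sub in enumerate(subfields):
        field[k // s::s, k % s::s] = sub
    return field

u = np.random.rand(64, 64)        # e.g. one velocity component on a fine grid
coarse_views = decompose(u)       # four 32x32 subtasks instead of one 64x64 task
assert np.allclose(recompose(coarse_views), u)
```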
- Sparse deep neural networks for modeling aluminum electrolysis dynamics [0.5257115841810257]
We train sparse neural networks to model the system dynamics of an aluminum electrolysis simulator.
The sparse model structure shows a significant reduction in model complexity compared to a corresponding dense neural network.
The empirical study shows that the sparse models generalize better from small training sets than dense neural networks.
arXiv Detail & Related papers (2022-09-13T09:11:50Z)
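One common way to obtain such a sparse dynamics model (the paper's exact sparsification scheme may differ) is to train with an L1 penalty on the weights and prune near-zero entries. A short PyTorch sketch with assumed sizes, penalty, and threshold:

```python
# Train a small dynamics model x_{t+1} = f(x_t) with an L1 penalty, then prune.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 32), nn.Tanh(), nn.Linear(32, 8))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(x_t, x_next, l1=1e-3):
    pred = model(x_t)
    loss = nn.functional.mse_loss(pred, x_next)
    loss = loss + l1 * sum(p.abs().sum() for p in model.parameters())  # sparsity penalty
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def prune(threshold=1e-2):
    # Zero out weights whose magnitude stayed below the threshold.
    with torch.no_grad():
        for p in model.parameters():
            p[p.abs() < threshold] = 0.0
```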
- Hybrid training of optical neural networks [1.0323063834827415]
Optical neural networks are emerging as a promising type of machine learning hardware.
These networks are mainly developed to perform optical inference after in silico training on digital simulators.
We show that hybrid training of optical neural networks can be applied to a wide variety of optical neural networks.
arXiv Detail & Related papers (2022-03-20T21:16:42Z)
- Large-scale neuromorphic optoelectronic computing with a reconfigurable diffractive processing unit [38.898230519968116]
We propose an optoelectronic reconfigurable computing paradigm by constructing a diffractive processing unit.
It can efficiently support different neural networks and achieve a high model complexity with millions of neurons.
Our prototype system built with off-the-shelf optoelectronic components surpasses the performance of state-of-the-art graphics processing units.
arXiv Detail & Related papers (2020-08-26T16:34:58Z)
- Ultra-Low-Power FDSOI Neural Circuits for Extreme-Edge Neuromorphic Intelligence [2.6199663901387997]
In-memory computing mixed-signal neuromorphic architectures provide promising ultra-low-power solutions for edge-computing sensory-processing applications.
We present a set of mixed-signal analog/digital circuits that exploit the features of advanced Fully-Depleted Silicon on Insulator (FDSOI) integration processes.
arXiv Detail & Related papers (2020-06-25T09:31:29Z)
- Training End-to-End Analog Neural Networks with Equilibrium Propagation [64.0476282000118]
We introduce a principled method to train end-to-end analog neural networks by gradient descent.
We show mathematically that a class of analog neural networks (called nonlinear resistive networks) are energy-based models.
Our work can guide the development of a new generation of ultra-fast, compact and low-power neural networks supporting on-chip learning.
arXiv Detail & Related papers (2020-06-02T23:38:35Z)
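Equilibrium propagation trains an energy-based network in two relaxation phases: a free phase and a phase in which the output is weakly nudged toward the target, with each weight updated from the difference of local correlations between the two equilibria. A minimal NumPy sketch on a tiny Hopfield-style network; the sizes, hard-sigmoid nonlinearity, and step sizes are assumptions.

```python
# Minimal sketch of equilibrium propagation on a single-hidden-layer energy model.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((784, 64)) * 0.05   # input  -> hidden couplings
W2 = rng.standard_normal((64, 10)) * 0.05    # hidden -> output couplings
rho = lambda s: np.clip(s, 0.0, 1.0)
drho = lambda s: ((s > 0.0) & (s < 1.0)).astype(float)

def relax(x, y=None, beta=0.0, steps=50, eps=0.2):
    # Gradient flow of the state variables on the network's energy function.
    h, o = np.zeros(64), np.zeros(10)
    for _ in range(steps):
        dh = -h + drho(h) * (rho(x) @ W1 + W2 @ rho(o))
        do = -o + drho(o) * (rho(h) @ W2)
        if beta:                              # weakly clamp the output toward the target
            do += beta * (y - o)
        h, o = h + eps * dh, o + eps * do
    return h, o

def eqprop_step(x, y, beta=0.5, lr=0.05):
    h0, o0 = relax(x)                         # free phase
    hb, ob = relax(x, y, beta=beta)           # nudged phase
    global W1, W2
    # Weight update from the difference of local correlations at the two equilibria.
    W1 += lr / beta * (np.outer(rho(x), rho(hb)) - np.outer(rho(x), rho(h0)))
    W2 += lr / beta * (np.outer(rho(hb), rho(ob)) - np.outer(rho(h0), rho(o0)))
```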
- Flexible Transmitter Network [84.90891046882213]
Current neural networks are mostly built upon the MP model, which usually formulates the neuron as executing an activation function on the real-valued weighted aggregation of signals received from other neurons.
We propose the Flexible Transmitter (FT) model, a novel bio-plausible neuron model with flexible synaptic plasticity.
We present the Flexible Transmitter Network (FTNet), which is built on the most common fully-connected feed-forward architecture.
arXiv Detail & Related papers (2020-04-08T06:55:12Z)
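The MP neuron described in the entry above has a one-line form; the sketch below shows it next to a purely hypothetical stateful variant that only gestures at what "flexible synaptic plasticity" might mean in code. The stateful variant is an assumption for illustration and is not the FT model's actual formulation.

```python
# Classical MP-style neuron versus a hypothetical stateful variant.
import numpy as np

def mp_neuron(x, w, b):
    """MP-style neuron: activation of a weighted aggregation of incoming signals."""
    return np.tanh(w @ x + b)

def stateful_neuron(x, w, b, s, alpha=0.9):
    """Illustrative (assumed) stateful neuron: a memory trace s is mixed into the
    aggregation and updated at every call."""
    y = np.tanh(w @ x + alpha * s + b)
    return y, alpha * s + (1 - alpha) * y      # output and updated memory trace
```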
This list is automatically generated from the titles and abstracts of the papers on this site.