Efficient Training for Optical Computing
- URL: http://arxiv.org/abs/2506.20833v1
- Date: Wed, 25 Jun 2025 21:03:47 GMT
- Title: Efficient Training for Optical Computing
- Authors: Manon P. Bart, Nick Sparks, Ryan T. Glasser
- Abstract summary: We introduce a novel backpropagation algorithm that incorporates plane wave decomposition via the Fourier transform. We demonstrate a significant reduction in training time by exploiting the structured and sparse nature of diffractive systems in training and inference.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffractive optical information processors have demonstrated significant promise in delivering high-speed, parallel, and energy-efficient inference for scaling machine learning tasks. Training, however, remains a major computational bottleneck, compounded by large datasets and the many simulations required for state-of-the-art classification models. The underlying linear transformations in such systems are inherently constrained to compositions of circulant and diagonal matrix factors, representing free-space propagation and phase and/or amplitude modulation of light, respectively. While it is theoretically established that an arbitrary linear transformation can be generated by such factors, only upper bounds on the number of factors exist, and these are experimentally infeasible. Additionally, physical parameters such as inter-layer distance, number of layers, and phase-only modulation further restrict the solution space. Without tractable analytical decompositions, prior works have implemented various constrained minimization techniques. As the trainable elements occupy a small subset of the overall transformation, existing techniques incur unnecessary computational overhead, limiting scalability. In this work, we demonstrate a significant reduction in training time by exploiting the structured and sparse nature of diffractive systems in training and inference. We introduce a novel backpropagation algorithm that incorporates plane wave decomposition via the Fourier transform, computing gradients across all trainable elements in a given layer simultaneously, using only change-of-basis and element-wise multiplication. Given the lack of a closed-form mathematical decomposition for realizable optical architectures, this approach is not only valuable for machine learning tasks but broadly applicable to the generation of arbitrary linear transformations, wavefront shaping, and other signal processing tasks.
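To make the circulant/diagonal structure concrete, here is a minimal, hypothetical sketch (not the authors' code) of one diffractive layer and its gradient: free-space propagation is the circulant factor F^{-1} diag(H) F, implemented with the angular spectrum method, and the trainable phase mask is the diagonal factor diag(exp(i*phi)). All grid parameters are assumed for illustration, and jax.grad stands in for the paper's explicit plane-wave-decomposition backpropagation rule; note that the computation uses only FFTs (change of basis) and element-wise products, so gradients for every phase pixel in the layer are obtained together.

```python
# Hypothetical sketch (not the authors' code): one diffractive layer written as
# the circulant factor F^{-1} diag(H) F (free-space propagation, angular spectrum
# method) followed by the diagonal factor diag(exp(i*phi)) (trainable phase mask).
import jax
import jax.numpy as jnp

# illustrative grid parameters (assumed, not taken from the paper)
N, dx, wavelength, z = 128, 8e-6, 633e-9, 0.05

def angular_spectrum_transfer(N, dx, wavelength, z):
    fx = jnp.fft.fftfreq(N, d=dx)
    FX, FY = jnp.meshgrid(fx, fx, indexing="ij")
    kz = jnp.sqrt(jnp.maximum((1.0 / wavelength) ** 2 - FX**2 - FY**2, 0.0))
    return jnp.exp(2j * jnp.pi * z * kz)  # evanescent components simply clamped (toy model)

H = angular_spectrum_transfer(N, dx, wavelength, z)

def diffractive_layer(phi, u_in):
    u_prop = jnp.fft.ifft2(H * jnp.fft.fft2(u_in))  # circulant factor: FFT, element-wise product, inverse FFT
    return jnp.exp(1j * phi) * u_prop               # diagonal factor: element-wise phase modulation

def loss(phi, u_in, target_intensity):
    u_out = diffractive_layer(phi, u_in)
    return jnp.mean((jnp.abs(u_out) ** 2 - target_intensity) ** 2)

phi = jnp.zeros((N, N))                                     # trainable phase mask
u_in = jnp.ones((N, N), dtype=jnp.complex64)                # plane-wave input field
target = jax.random.uniform(jax.random.PRNGKey(0), (N, N))  # dummy target intensity

grads = jax.grad(loss)(phi, u_in, target)  # gradients for all N*N phase pixels at once
```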
Related papers
- S-Crescendo: A Nested Transformer Weaving Framework for Scalable Nonlinear System in S-Domain Representation [4.945568106952893]
S-Crescendo is a nested transformer weaving framework that synergizes the S-domain representation with neural operators for scalable time-domain prediction. Our method achieves up to 0.99 test-set $R^2$ accuracy against HSPICE golden waveforms and accelerates simulation by up to 18x.
arXiv Detail & Related papers (2025-05-17T05:06:58Z) - Hardware-Efficient Large-Scale Universal Linear Transformations for Optical Modes in the Synthetic Time Dimension [0.6384650391969042]
We introduce a hardware-efficient time-domain photonic processor that achieves at least an exponential reduction in component count. Our results establish a practical pathway toward near-term, scalable, and reconfigurable photonic processors.
arXiv Detail & Related papers (2025-05-01T21:14:48Z) - Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time [17.086679273053853]
We show that a novel fast approximation method can calculate the gradients in almost linear time.
By improving the efficiency of gradient computation, we hope that this work will facilitate more effective training and deployment of long-context language models.
arXiv Detail & Related papers (2024-08-23T17:16:43Z) - Hierarchical Neural Operator Transformer with Learnable Frequency-aware Loss Prior for Arbitrary-scale Super-resolution [13.298472586395276]
We present an arbitrary-scale super-resolution (SR) method to enhance the resolution of scientific data.
We conduct extensive experiments on diverse datasets from different domains.
arXiv Detail & Related papers (2024-05-20T17:39:29Z) - Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context [44.949726166566236]
We show that (non-linear) Transformers naturally learn to implement gradient descent in function space.
We also show that the optimal choice of non-linear activation depends in a natural way on the class of functions that need to be learned.
arXiv Detail & Related papers (2023-12-11T17:05:25Z) - How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations [98.7450564309923]
This paper takes initial steps toward understanding in-context learning (ICL) in more complex scenarios, by studying learning with representations.
We construct synthetic in-context learning problems with a compositional structure, where the label depends on the input through a possibly complex but fixed representation function.
We show theoretically the existence of transformers that approximately implement such algorithms with mild depth and size.
arXiv Detail & Related papers (2023-10-16T17:40:49Z) - Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection [88.23337313766353]
This work first provides a comprehensive statistical theory for transformers to perform ICL.
We show that transformers can implement a broad class of standard machine learning algorithms in context.
A single transformer can adaptively select different base ICL algorithms.
arXiv Detail & Related papers (2023-06-07T17:59:31Z) - Beyond Regular Grids: Fourier-Based Neural Operators on Arbitrary Domains [13.56018270837999]
We propose a simple method to extend neural operators to arbitrary domains.
An efficient implementation of such direct spectral evaluations is coupled with existing neural operator models (a generic sketch of a direct spectral evaluation is given after this list).
We demonstrate that the proposed method allows us to extend neural operators to arbitrary point distributions with significant gains in training speed over baselines.
arXiv Detail & Related papers (2023-05-31T09:01:20Z) - Transform Once: Efficient Operator Learning in Frequency Domain [69.74509540521397]
We study deep neural networks designed to harness the structure in frequency domain for efficient learning of long-range correlations in space or time.
This work introduces a blueprint for frequency domain learning through a single transform: transform once (T1).
arXiv Detail & Related papers (2022-11-26T01:56:05Z) - Random Weight Factorization Improves the Training of Continuous Neural Representations [1.911678487931003]
Continuous neural representations have emerged as a powerful and flexible alternative to classical discretized representations of signals.
We propose random weight factorization as a simple drop-in replacement for parameterizing and initializing conventional linear layers (a minimal sketch of one such factorization is given after this list).
We show how this factorization alters the underlying loss landscape and effectively enables each neuron in the network to learn using its own self-adaptive learning rate.
arXiv Detail & Related papers (2022-10-03T23:48:48Z) - Semi-supervised Learning of Partial Differential Operators and Dynamical Flows [68.77595310155365]
We present a novel method that combines a hyper-network solver with a Fourier Neural Operator architecture.
We test our method on various time evolution PDEs, including nonlinear fluid flows in one, two, and three spatial dimensions.
The results show that the new method improves the learning accuracy at the supervised time points, and is able to interpolate the solutions to any intermediate time.
arXiv Detail & Related papers (2022-07-28T19:59:14Z) - Gradient-Based Learning of Discrete Structured Measurement Operators for Signal Recovery [16.740247586153085]
We show how to leverage gradient-based learning to solve discrete optimization problems.
Our approach is formalized by GLODISMO (Gradient-based Learning of DIscrete Structured Measurement Operators).
We empirically demonstrate the performance and flexibility of GLODISMO in several signal recovery applications.
arXiv Detail & Related papers (2022-02-07T18:27:08Z) - Sparse Quantized Spectral Clustering [85.77233010209368]
We exploit tools from random matrix theory to make precise statements about how the eigenspectrum of a matrix changes under such nonlinear transformations.
We show that very little change occurs in the informative eigenstructure even under drastic sparsification/quantization.
arXiv Detail & Related papers (2020-10-03T15:58:07Z) - Rapid characterisation of linear-optical networks via PhaseLift [51.03305009278831]
Integrated photonics offers great phase stability and can rely on the large-scale manufacturability provided by the semiconductor industry.
New devices, based on such optical circuits, hold the promise of faster and more energy-efficient computation in machine learning applications.
We present a novel technique to reconstruct the transfer matrix of linear optical networks.
arXiv Detail & Related papers (2020-10-01T16:04:22Z)
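For the Fourier-based neural operators on arbitrary domains entry above, the following is a generic illustration of what a direct spectral evaluation at scattered points can look like: a truncated Fourier expansion is evaluated at arbitrary coordinates by forming the change-of-basis matrix exp(2*pi*i*k.x) explicitly, rather than relying on an FFT over a regular grid. The function name and toy setup are hypothetical, not the paper's implementation.

```python
# Generic illustration (not the paper's code): evaluate a band-limited function,
# given by its Fourier coefficients, directly at arbitrary non-grid points.
import jax.numpy as jnp

def direct_spectral_eval(coeffs, modes, points):
    """coeffs: (M,) complex Fourier coefficients for integer modes `modes` (M, d);
    points: (N, d) arbitrary coordinates in [0, 1)^d."""
    phase = jnp.exp(2j * jnp.pi * points @ modes.T)  # (N, M) change-of-basis matrix
    return phase @ coeffs                            # (N,) function values

# toy usage: 1D signal with 5 modes evaluated at 7 scattered points
modes = jnp.arange(-2, 3)[:, None].astype(jnp.float32)        # (5, 1)
coeffs = jnp.array([0, 1j, 2.0, -1j, 0], dtype=jnp.complex64)  # (5,)
points = jnp.array([[0.0], [0.13], [0.2], [0.41], [0.5], [0.77], [0.9]])
values = direct_spectral_eval(coeffs, modes, points)           # (7,) complex values
```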
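For the random weight factorization entry above, the following minimal sketch assumes one plausible form of the factorization, W = diag(exp(g)) V with a trainable per-neuron log-scale g; the parameterization and initialization values are assumptions based on the summary, not code taken from the paper.

```python
# Hypothetical sketch (assumed form, not code from the paper): a dense layer's
# weights are re-parameterized as W = diag(s) @ V with s = exp(g), so each output
# neuron carries its own trainable scale, acting like a self-adaptive learning rate.
import jax
import jax.numpy as jnp

def init_factorized_linear(key, d_in, d_out, mu=1.0, sigma=0.1):
    k1, k2 = jax.random.split(key)
    V = jax.random.normal(k1, (d_out, d_in)) / jnp.sqrt(d_in)  # standard dense init
    g = mu + sigma * jax.random.normal(k2, (d_out,))           # random per-neuron log-scale
    return {"V": V, "g": g}

def factorized_linear(params, x):
    W = jnp.exp(params["g"])[:, None] * params["V"]  # W = diag(s) V, s = exp(g)
    return x @ W.T

params = init_factorized_linear(jax.random.PRNGKey(0), d_in=2, d_out=64)
y = factorized_linear(params, jnp.ones((8, 2)))  # (batch, d_out)
```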
This list is automatically generated from the titles and abstracts of the papers on this site.