Low-rank surrogate modeling and stochastic zero-order optimization for training of neural networks with black-box layers
- URL: http://arxiv.org/abs/2509.15113v1
- Date: Thu, 18 Sep 2025 16:17:44 GMT
- Title: Low-rank surrogate modeling and stochastic zero-order optimization for training of neural networks with black-box layers
- Authors: Andrei Chertkov, Artem Basharin, Mikhail Saygin, Evgeny Frolov, Stanislav Straupe, Ivan Oseledets
- Abstract summary: We present a framework for the end-to-end training of hybrid networks with reconfigurable physical layers. A key component of our approach is the implicit projector-splitting integrator algorithm, which updates the lightweight surrogate model after each forward pass. We demonstrate our method across diverse deep learning tasks, including computer vision, audio classification, and language modeling.
- Score: 4.1673753346810765
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The growing demand for energy-efficient, high-performance AI systems has led to increased attention on alternative computing platforms (e.g., photonic, neuromorphic) due to their potential to accelerate learning and inference. However, integrating such physical components into deep learning pipelines remains challenging, as physical devices often offer limited expressiveness, and their non-differentiable nature renders on-device backpropagation difficult or infeasible. This motivates the development of hybrid architectures that combine digital neural networks with reconfigurable physical layers, which effectively behave as black boxes. In this work, we present a framework for the end-to-end training of such hybrid networks. This framework integrates stochastic zeroth-order optimization for updating the physical layer's internal parameters with a dynamic low-rank surrogate model that enables gradient propagation through the physical layer. A key component of our approach is the implicit projector-splitting integrator algorithm, which updates the lightweight surrogate model after each forward pass with minimal hardware queries, thereby avoiding costly full matrix reconstruction. We demonstrate our method across diverse deep learning tasks, including computer vision, audio classification, and language modeling. Notably, across all modalities, the proposed approach achieves near-digital baseline accuracy and consistently enables effective end-to-end training of hybrid models incorporating various non-differentiable physical components (spatial light modulators, microring resonators, and Mach-Zehnder interferometers). This work bridges hardware-aware deep learning and gradient-free optimization, thereby offering a practical pathway for integrating non-differentiable physical components into scalable, end-to-end trainable AI systems.
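To make the training loop concrete, here is a minimal NumPy sketch of the two ingredients the abstract describes: a two-query SPSA-style zeroth-order gradient estimate for the black-box layer's internal parameters, and a Lubich-Oseledets projector-splitting (KSL) step that refreshes a rank-r surrogate of the layer from a few matrix-vector queries. The helper names, the toy "device", and all hyperparameters below are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: `spsa_gradient` and `projector_splitting_update`
# are hypothetical helpers, not the paper's code.
import numpy as np

rng = np.random.default_rng(0)

def spsa_gradient(loss_fn, theta, eps=1e-2):
    """Zeroth-order (SPSA) estimate of d(loss)/d(theta) from just two
    forward queries of a non-differentiable layer."""
    delta = rng.choice([-1.0, 1.0], size=theta.shape)   # Rademacher probe
    df = loss_fn(theta + eps * delta) - loss_fn(theta - eps * delta)
    return (df / (2.0 * eps)) * delta

def projector_splitting_update(U, S, Vt, mv, rmv):
    """One Lubich-Oseledets projector-splitting (KSL) step: refresh the
    rank-r surrogate W_hat = U @ S @ Vt given the change dW of the
    black-box map, accessed only via r matvec queries mv(V) = dW @ V
    and r transposed queries rmv(U1) = dW.T @ U1."""
    V = Vt.T
    AV = mv(V)                                        # r "hardware" queries
    U1, S_hat = np.linalg.qr(U @ S + AV)              # K-step
    S_tilde = S_hat - U1.T @ AV                       # S-step (reuses AV)
    V1, S1t = np.linalg.qr(V @ S_tilde.T + rmv(U1))   # L-step
    return U1, S1t.T, V1.T

# Toy usage: a rank-r surrogate tracks a drifting linear "device" W.
n, r = 32, 4
W = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
U, s, Vt = np.linalg.svd(W)
U, S, Vt = U[:, :r], np.diag(s[:r]), Vt[:r, :]

for _ in range(100):
    dW = 1e-2 * rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
    W = W + dW                                        # device response shifts
    U, S, Vt = projector_splitting_update(
        U, S, Vt, mv=lambda V: dW @ V, rmv=lambda U1: dW.T @ U1)

best_r = np.linalg.svd(W, compute_uv=False)[r:]
print("surrogate error: ", np.linalg.norm(W - U @ S @ Vt) / np.linalg.norm(W))
print("best rank-r error:", np.linalg.norm(best_r) / np.linalg.norm(W))
```

In the full method, dW would be the device's response change after a parameter update, so the surrogate tracks the hardware without ever reconstructing the full matrix; because the surrogate is an explicit low-rank factorization, it can then propagate gradients to the digital layers upstream.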
Related papers
- SnapGen++: Unleashing Diffusion Transformers for Efficient High-Fidelity Image Generation on Edge Devices [72.0937240883345]
Recent advances in diffusion transformers (DiTs) have set new standards in image generation, yet remain impractical for on-device deployment. We present an efficient DiT framework tailored for mobile and edge devices that achieves transformer-level generation quality under strict resource constraints.
arXiv Detail & Related papers (2026-01-13T07:46:46Z) - Rethinking the Role of Dynamic Sparse Training for Scalable Deep Reinforcement Learning [58.533203990515034]
Scaling neural networks has driven breakthrough advances in machine learning, yet this paradigm fails in deep reinforcement learning (DRL). We show that dynamic sparse training strategies provide module-specific benefits that complement the primary scalability foundation established by architectural improvements. Finally, we distill these insights into Module-Specific Training (MST), a practical framework that exploits the benefits of architectural improvements and demonstrates substantial scalability gains across diverse RL algorithms without algorithmic modifications.
arXiv Detail & Related papers (2025-10-14T03:03:08Z) - diffSPH: Differentiable Smoothed Particle Hydrodynamics for Adjoint Optimization and Machine Learning [21.05257407408671]
diffSPH is a differentiable Smoothed Particle Hydrodynamics (SPH) framework developed entirely in PyTorch with GPU acceleration. diffSPH is designed centrally around differentiation to facilitate optimization and machine learning (ML) applications in Computational Fluid Dynamics (CFD). We demonstrate the framework's unique capabilities through several applications, including addressing particle shifting via a novel, target-oriented approach.
arXiv Detail & Related papers (2025-07-29T10:54:27Z) - Training Hybrid Neural Networks with Multimode Optical Nonlinearities Using Digital Twins [2.8479179029634984]
We introduce ultrashort pulse propagation in multimode fibers, which perform large-scale nonlinear transformations. Training the hybrid architecture is achieved through a neural model that differentiably approximates the optical system. Our experimental results achieve state-of-the-art image classification accuracies and simulation fidelity.
arXiv Detail & Related papers (2025-01-14T10:35:18Z) - Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses the compute limits of real-time visual inference on IoVT devices by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework that jointly optimizes the neural network architecture and its edge deployment.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - A Realistic Simulation Framework for Analog/Digital Neuromorphic Architectures [73.65190161312555]
ARCANA is a software spiking neural network simulator designed to account for the properties of mixed-signal neuromorphic circuits. We show how the results obtained provide a reliable estimate of the behavior of the spiking neural network trained in software, once deployed in hardware.
arXiv Detail & Related papers (2024-09-23T11:16:46Z) - Online Calibration of Deep Learning Sub-Models for Hybrid Numerical Modeling Systems [34.50407690251862]
We present an efficient and practical online learning approach for hybrid systems.
We demonstrate that the method, called EGA for Euler Gradient Approximation, converges to the exact gradients in the limit of infinitely small time steps.
Results show significant improvements over offline learning, highlighting the potential of end-to-end online learning for hybrid modeling.
arXiv Detail & Related papers (2023-11-17T17:36:26Z) - Interpretable learning of effective dynamics for multiscale systems [5.754251195342313]
We propose a novel framework of Interpretable Learning Effective Dynamics (iLED).
iLED offers comparable accuracy to state-of-the-art recurrent neural network-based approaches.
Our results show that the iLED framework can generate accurate predictions and obtain interpretable dynamics.
arXiv Detail & Related papers (2023-09-11T20:29:38Z) - Efficient and Flexible Neural Network Training through Layer-wise Feedback Propagation [49.44309457870649]
Layer-wise Feedback Propagation (LFP) is a novel training principle for neural network-like predictors. LFP decomposes a reward to individual neurons based on their respective contributions. Our method then implements a greedy approach, reinforcing helpful parts of the network and weakening harmful ones.
arXiv Detail & Related papers (2023-08-23T10:48:28Z) - Gradient-Based Trajectory Optimization With Learned Dynamics [80.41791191022139]
We use machine learning techniques to learn a differentiable dynamics model of the system from data.
We show that a neural network can model highly nonlinear behaviors accurately for large time horizons.
In our hardware experiments, we demonstrate that our learned model can represent complex dynamics for both the Spot quadruped and a radio-controlled (RC) car.
arXiv Detail & Related papers (2022-04-09T22:07:34Z) - Real-time Neural-MPC: Deep Learning Model Predictive Control for Quadrotors and Agile Robotic Platforms [59.03426963238452]
We present Real-time Neural MPC, a framework to efficiently integrate large, complex neural network architectures as dynamics models within a model-predictive control pipeline.
We show the feasibility of our framework on real-world problems by reducing the positional tracking error by up to 82% when compared to state-of-the-art MPC approaches without neural network dynamics.
arXiv Detail & Related papers (2022-03-15T09:38:15Z) - Gradient descent in materia through homodyne gradient extraction [2.012950941269354]
We demonstrate a simple yet efficient gradient extraction method based on the principle of homodyne detection. By perturbing the parameters that need to be optimized, we effectively obtain the gradient information in a highly robust and scalable manner. Homodyne gradient extraction can in principle be fully implemented in materia, facilitating the development of autonomously learning material systems (a toy software emulation is sketched after this list).
arXiv Detail & Related papers (2021-05-15T12:18:31Z)
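The homodyne scheme in the last entry is easy to emulate in software: dither each parameter with a small sinusoid at its own frequency, record the scalar loss trace, and demodulate each channel (multiply by its reference sinusoid and average) to isolate the corresponding partial derivative. Below is a toy NumPy sketch under those assumptions; the function name, frequencies, and constants are hypothetical, not the paper's hardware procedure.

```python
import numpy as np

def homodyne_gradient(loss_fn, theta, amp=1e-3, T=512):
    """Lock-in estimate of grad loss_fn(theta): parameter i is dithered
    at its own integer frequency f_i; multiplying the loss trace by
    sin(2*pi*f_i*t/T) and averaging isolates dL/dtheta_i."""
    n = theta.size
    t = np.arange(T)
    freqs = np.arange(1, n + 1)                          # distinct channels
    probes = np.sin(2 * np.pi * freqs[:, None] * t / T)  # shape (n, T)
    trace = np.array([loss_fn(theta + amp * probes[:, k]) for k in range(T)])
    return (2.0 / (amp * T)) * probes @ trace            # demodulate

theta = np.array([0.5, -1.0, 2.0])
g = homodyne_gradient(lambda th: float((th ** 2).sum()), theta)
print(g)   # close to the analytic gradient 2 * theta
```

On hardware, the same demodulation would run on measured intensities rather than a simulated loss, which is what allows the method to be implemented "in materia".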