Extending Straight-Through Estimation for Robust Neural Networks on Analog CIM Hardware
- URL: http://arxiv.org/abs/2508.11940v1
- Date: Sat, 16 Aug 2025 06:53:44 GMT
- Title: Extending Straight-Through Estimation for Robust Neural Networks on Analog CIM Hardware
- Authors: Yuannuo Feng, Wenyong Zhou, Yuexi Lyu, Yixiang Zhang, Zhengwu Liu, Ngai Wong, Wang Kang,
- Abstract summary: We propose a noise-aware training method for analog Compute-In-Memory (CIM) systems. We decouple forward noise simulation from backward gradient computation, enabling noise-aware training with more accurate but computationally intractable noise modeling. Our framework achieves up to 5.3% accuracy improvement on image classification, 0.72 perplexity reduction on text generation, 2.2$\times$ speedup in training time, and 37.9% lower peak memory usage.
- Score: 5.100973962435092
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Analog Compute-In-Memory (CIM) architectures promise significant energy efficiency gains for neural network inference, but suffer from complex hardware-induced noise that poses major challenges for deployment. While noise-aware training methods have been proposed to address this issue, they typically rely on idealized and differentiable noise models that fail to capture the full complexity of analog CIM hardware variations. Motivated by the Straight-Through Estimator (STE) framework in quantization, we decouple forward noise simulation from backward gradient computation, enabling noise-aware training with more accurate but computationally intractable noise modeling in analog CIM systems. We provide theoretical analysis demonstrating that our approach preserves essential gradient directional information while maintaining computational tractability and optimization stability. Extensive experiments show that our extended STE framework achieves up to 5.3% accuracy improvement on image classification, 0.72 perplexity reduction on text generation, 2.2$\times$ speedup in training time, and 37.9% lower peak memory usage compared to standard noise-aware training methods.
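The decoupling described in the abstract can be illustrated with a minimal PyTorch sketch (our own illustration, not the authors' code; `placeholder_cim_noise` is a hypothetical stand-in for a detailed, non-differentiable CIM noise simulator): the forward pass runs the noisy computation, while the backward pass treats the noise injection as the identity, exactly as the STE treats a quantizer.

```python
import torch

class NoisySTE(torch.autograd.Function):
    """Forward: simulate analog CIM noise on the layer output.
    Backward: pass the gradient straight through, ignoring the
    (possibly non-differentiable) noise simulator."""

    @staticmethod
    def forward(ctx, y_ideal, noise_model):
        # noise_model may be an arbitrary black-box simulator;
        # it never needs to be differentiable.
        with torch.no_grad():
            y_noisy = noise_model(y_ideal)
        return y_noisy

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through: treat the noise injection as the identity.
        return grad_output, None


def placeholder_cim_noise(y, read_sigma=0.05):
    # Hypothetical stand-in for device variation, IR drop, ADC quantization, ...
    return y * (1.0 + read_sigma * torch.randn_like(y))


class NoisyLinear(torch.nn.Linear):
    def forward(self, x):
        y = super().forward(x)  # ideal MAC result
        return NoisySTE.apply(y, placeholder_cim_noise)
```

Training then proceeds as usual; because the simulator is never differentiated, it can be arbitrarily detailed without affecting the backward pass.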
Related papers
- Boltzmann Reinforcement Learning for Noise resilience in Analog Ising Machines [0.8739101659113154]
We introduce BRAIN (Boltzmann Reinforcement for Analog Ising Networks), a distribution learning framework. By shifting from state-by-state sampling to aggregating information across multiple noisy measurements, BRAIN is resilient to Gaussian noise. BRAIN exhibits $\mathcal{O}(N^{1.55})$ scaling up to 65,536 spins and maintains robustness against severe measurement uncertainty of up to 40%.
arXiv Detail & Related papers (2026-02-09T20:07:42Z) - CAGE: Curvature-Aware Gradient Estimation For Accurate Quantization-Aware Training [73.46600457802693]
We introduce a new method that counteracts the loss induced by quantization. CAGE significantly improves upon state-of-the-art methods in terms of accuracy at similar computational cost. For QAT pre-training of Llama models, CAGE matches the accuracy achieved at 4 bits (W4A4) by the prior best method.
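For context, a hedged sketch of the plain straight-through fake-quantizer that QAT methods of this kind start from (CAGE's curvature-aware correction to this gradient is not reproduced here; the scaling rule below is a common choice, not necessarily the paper's):

```python
import torch

class FakeQuantSTE(torch.autograd.Function):
    """Uniform symmetric fake-quantization with a straight-through gradient."""

    @staticmethod
    def forward(ctx, w, num_bits=4):
        qmax = 2 ** (num_bits - 1) - 1
        scale = w.abs().max() / qmax + 1e-12
        return torch.clamp(torch.round(w / scale), -qmax, qmax) * scale

    @staticmethod
    def backward(ctx, grad_output):
        # STE: the gradient of round() is treated as the identity.
        return grad_output, None
```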
arXiv Detail & Related papers (2025-10-21T16:33:57Z) - In-memory Training on Analog Devices with Limited Conductance States via Multi-tile Residual Learning [59.091567092071564]
In-memory training typically requires at least 8-bit conductance states to match digital baselines. Many promising memristive devices such as ReRAM offer only about 4-bit resolution due to fabrication constraints. This paper proposes a residual learning framework that sequentially learns on multiple crossbar tiles to compensate for the residual errors.
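One way to read the residual idea (an interpretation, not the paper's code; `quantize_4bit` is a crude stand-in for programming a crossbar with limited conductance states): each extra tile is programmed to represent what the previous tiles still get wrong, so the composite weight is a sum of low-resolution tiles.

```python
import torch

def quantize_4bit(w, num_bits=4):
    # Crude stand-in for limited conductance states on one crossbar tile.
    levels = 2 ** num_bits - 1
    scale = w.abs().max() / (levels / 2) + 1e-12
    return torch.round(w / scale) * scale

def program_residual_tiles(w_target, num_tiles=3):
    """Sequentially program tiles so each new tile absorbs the remaining error."""
    tiles, residual = [], w_target.clone()
    for _ in range(num_tiles):
        tile = quantize_4bit(residual)   # what this tile can actually store
        tiles.append(tile)
        residual = residual - tile       # error left for the next tile
    return tiles, residual

w = torch.randn(64, 64)
tiles, err = program_residual_tiles(w)
print(f"mean residual error after {len(tiles)} tiles: {err.abs().mean().item():.2e}")
```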
arXiv Detail & Related papers (2025-10-02T19:44:25Z) - Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models [57.49136894315871]
The new paradigm of test-time scaling has yielded remarkable breakthroughs in reasoning models and generative vision models. We propose one solution to the problem of integrating test-time scaling knowledge into a model during post-training. We replace reward-guided test-time noise optimization in diffusion models with a Noise Hypernetwork that modulates the initial input noise.
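A rough, hypothetical sketch of that idea (the module name, sizes, and the residual parameterization are our assumptions, not the paper's architecture): a small learned network nudges the initial diffusion noise, amortizing what would otherwise be reward-guided search at test time.

```python
import torch
import torch.nn as nn

class NoiseHypernetwork(nn.Module):
    """Maps the initial noise and a conditioning embedding to modulated noise."""

    def __init__(self, cond_dim=768, noise_dim=4 * 64 * 64, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(cond_dim + noise_dim, hidden),
            nn.SiLU(),
            nn.Linear(hidden, noise_dim),
        )

    def forward(self, z, cond):
        # Predict a small residual update to the initial noise z.
        delta = self.net(torch.cat([z.flatten(1), cond], dim=1))
        return z + 0.1 * delta.view_as(z)
```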
arXiv Detail & Related papers (2025-08-13T17:33:37Z) - Handling Label Noise via Instance-Level Difficulty Modeling and Dynamic Optimization [33.13911801301048]
Deep neural networks degrade in generalization performance under noisy supervision. Existing methods focus on isolating clean subsets or correcting noisy labels. We propose a novel two-stage noisy learning framework that enables instance-level optimization.
arXiv Detail & Related papers (2025-05-01T19:12:58Z) - Efficient Noise Mitigation for Enhancing Inference Accuracy in DNNs on Mixed-Signal Accelerators [4.416800723562206]
We model the impact of process-induced and aging-related variations of analog computing components on the accuracy of analog neural networks.
We introduce a denoising block inserted between selected layers of a pre-trained model.
We demonstrate that training the denoising block significantly increases the model's robustness against various noise levels.
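A minimal sketch of that construction (the layer choice, block shape, and training details are placeholders): a small trainable residual block is spliced after a chosen layer of the frozen pre-trained model and learns to clean up noisy activations.

```python
import torch.nn as nn

class DenoisingBlock(nn.Module):
    """Small residual block trained to clean up noisy activations."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

def insert_denoiser(pretrained, layer_name, channels):
    # Freeze the pre-trained model; only the denoising block is trained.
    for p in pretrained.parameters():
        p.requires_grad = False
    block = DenoisingBlock(channels)
    old_layer = getattr(pretrained, layer_name)
    setattr(pretrained, layer_name, nn.Sequential(old_layer, block))
    return pretrained, block
```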
arXiv Detail & Related papers (2024-09-27T08:45:55Z) - Improved Noise Schedule for Diffusion Training [51.849746576387375]
We propose a novel approach to design the noise schedule for enhancing the training of diffusion models. We empirically demonstrate the superiority of our noise schedule over the standard cosine schedule.
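For reference, the standard cosine schedule that serves as the baseline here can be written as below (this is the widely used formulation, not the paper's improved schedule):

```python
import math
import torch

def cosine_alpha_bar(T=1000, s=0.008):
    """Cumulative signal level alpha_bar(t) under the standard cosine schedule."""
    t = torch.arange(T + 1, dtype=torch.float64) / T
    f = torch.cos((t + s) / (1 + s) * math.pi / 2) ** 2
    return f / f[0]  # normalize so alpha_bar(0) = 1

alpha_bar = cosine_alpha_bar()
betas = (1 - alpha_bar[1:] / alpha_bar[:-1]).clamp(max=0.999)  # per-step noise variances
```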
arXiv Detail & Related papers (2024-07-03T17:34:55Z) - Taming 3DGS: High-Quality Radiance Fields with Limited Resources [50.92437599516609]
3D Gaussian Splatting (3DGS) has transformed novel-view synthesis with its fast, interpretable, and high-fidelity rendering.
We tackle the challenges of training and rendering 3DGS models on a budget.
We derive faster, numerically equivalent solutions for gradient computation and attribute updates.
arXiv Detail & Related papers (2024-06-21T20:44:23Z) - Sensitivity-Aware Finetuning for Accuracy Recovery on Deep Learning Hardware [2.610470075814367]
We introduce the Sensitivity-Aware Finetuning (SAFT) approach, which identifies noise-sensitive layers in a model and uses this information to freeze specific layers during noise-injection training.
Our results show that SAFT achieves comparable accuracy to noise-injection training and is 2x to 8x faster.
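A hedged sketch of that recipe (the sensitivity metric and the freezing rule are illustrative assumptions; `eval_fn` is a user-supplied accuracy evaluator): perturb one layer at a time, score the accuracy drop, then freeze the layers that turn out to be insensitive so only the sensitive ones are finetuned with noise injection.

```python
import torch

@torch.no_grad()
def layer_sensitivity(model, param_names, eval_fn, sigma=0.05):
    """Accuracy drop when Gaussian noise is injected into one layer's weights."""
    base = eval_fn(model)
    scores = {}
    params = dict(model.named_parameters())
    for name in param_names:
        w = params[name]
        saved = w.clone()
        w.add_(sigma * w.abs().mean() * torch.randn_like(w))
        scores[name] = base - eval_fn(model)   # larger drop = more sensitive
        w.copy_(saved)                         # restore original weights
    return scores

def freeze_insensitive(model, scores, threshold):
    for name, p in model.named_parameters():
        if scores.get(name, 0.0) < threshold:
            p.requires_grad = False  # excluded from noise-injection finetuning
```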
arXiv Detail & Related papers (2023-06-05T17:52:44Z) - NoisyMix: Boosting Robustness by Combining Data Augmentations, Stability Training, and Noise Injections [46.745755900939216]
We introduce NoisyMix, a training scheme that combines data augmentations with stability training and noise injections to improve both model robustness and in-domain accuracy.
We demonstrate the benefits of NoisyMix on a range of benchmark datasets, including ImageNet-C, ImageNet-R, and ImageNet-P.
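A rough sketch of how such a combination can be wired up (this is a generic stability-training loss with input-noise injection, not NoisyMix's exact recipe, which also mixes features):

```python
import torch
import torch.nn.functional as F

def noisy_stability_loss(model, x_aug, y, sigma=0.1, beta=1.0):
    """Cross-entropy on augmented inputs plus a consistency (stability) term
    between predictions on the augmented inputs and noisy copies of them."""
    logits_clean = model(x_aug)
    logits_noisy = model(x_aug + sigma * torch.randn_like(x_aug))  # noise injection
    ce = F.cross_entropy(logits_clean, y)
    stability = F.kl_div(
        F.log_softmax(logits_noisy, dim=1),
        F.softmax(logits_clean, dim=1),
        reduction="batchmean",
    )
    return ce + beta * stability
```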
arXiv Detail & Related papers (2022-02-02T19:53:35Z) - Robust Processing-In-Memory Neural Networks via Noise-Aware Normalization [26.270754571140735]
PIM accelerators often suffer from intrinsic noise in the physical components.
We propose a noise-agnostic method to achieve robust neural network performance against any noise setting.
arXiv Detail & Related papers (2020-07-07T06:51:28Z) - Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Stochastic optimization is central to modern machine learning, but the precise role of its noise in that success is still unclear.
We show that multiplicative noise, and the heavy-tailed parameter fluctuations it induces, commonly arise in stochastic optimization.
A detailed analysis describes how key factors, including step size and data, shape this behaviour, with similar results observed on state-of-the-art neural network models.
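The mechanism can be summarized by a random linear recursion of the following standard form (notation ours, not lifted from the paper; under suitable conditions such recursions are known to have heavy-tailed stationary distributions):

```latex
% Iterates driven by multiplicative (A_k) and additive (B_k) noise:
\[
  W_{k+1} = A_{k+1} W_k + B_{k+1}, \qquad
  \Pr\big(\|W_\infty\| > t\big) \sim C\, t^{-\alpha} \quad \text{as } t \to \infty,
\]
% i.e., the stationary distribution of the iterates has power-law (heavy) tails.
```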
arXiv Detail & Related papers (2020-06-11T09:58:01Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.