Related papers: HPD: Hybrid Projection Decomposition for Robust State Space Models on Analog CIM Hardware

HPD: Hybrid Projection Decomposition for Robust State Space Models on Analog CIM Hardware

URL: http://arxiv.org/abs/2508.11935v1
Date: Sat, 16 Aug 2025 06:34:14 GMT
Title: HPD: Hybrid Projection Decomposition for Robust State Space Models on Analog CIM Hardware
Authors: Yuannuo Feng, Wenyong Zhou, Yuexi Lyu, Hanjie Liu, Zhengwu Liu, Ngai Wong, Wang Kang,
Abstract summary: State Space Models (SSMs) are efficient alternatives to traditional sequence models.<n>Their reliance on matrix multiplications makes them ideal for compute-in-memory (CIM) architectures.<n>We propose HPD, a Hybrid Projection Decomposition strategy for the last output projection layer.
Score: 4.727184737671133
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: State Space Models (SSMs) are efficient alternatives to traditional sequence models, excelling at processing long sequences with lower computational complexity. Their reliance on matrix multiplications makes them ideal for compute-in-memory (CIM) architectures, which improve energy efficiency by computing within memory arrays. However, device non-idealities in CIM introduce weight perturbations that can degrade inference accuracy. In this paper, we systematically analyze the robustness of SSMs under noisy conditions, identifying that the final block and output projection layers are more susceptible to perturbations compared to other components. Building on these insights, we propose HPD, a Hybrid Projection Decomposition strategy for the last output projection layer. We replace the original weight matrix with the multiplication of U and {\Sigma} in its SVD to ensure compatibility with existing hardware architectures, while offloading V> to digital hardware for precise and robust correction. Comprehensive tests on Mamba models show that our method reduces perplexity by up to 99.57% under various noise conditions compared to baseline models, with accuracy gains of up to 96.67% on the PIQA benchmark for commonsense reasoning.

Related papers

POET-X: Memory-efficient LLM Training by Scaling Orthogonal Transformation [57.57816409869894]
We introduce POET-X, a scalable and memory-efficient variant for training large language models.<n>PoET-X maintains the generalization and stability benefits of POET while achieving substantial improvements in throughput and memory efficiency.
arXiv Detail & Related papers (2026-03-05T18:59:23Z)
Element-wise Modulation of Random Matrices for Efficient Neural Layers [0.0]
We propose a novel approach that decouples feature mixing from adaptation by utilizing a fixed random matrix modulated by lightweight, learnable element-wise parameters.<n>This architecture drastically reduces the trainable parameter count to a linear scale while retaining reliable accuracy across various benchmarks.
arXiv Detail & Related papers (2025-12-15T16:16:53Z)
The Curious Case of In-Training Compression of State Space Models [49.819321766705514]
State Space Models (SSMs) tackle long sequence modeling tasks efficiently, offer both parallelizable training and fast inference.<n>Key design challenge is striking the right balance between maximizing expressivity and limiting this computational burden.<n>Our approach, textscCompreSSM, applies to Linear Time-Invariant SSMs such as Linear Recurrent Units, but is also extendable to selective models.
arXiv Detail & Related papers (2025-10-03T09:02:33Z)
A Further Comparison of TD-DMRG and ML-MCTDH for Nonadiabatic Dynamics of Exciton Dissociation [2.0480965608306305]
A recent study reported up to 60% discrepancy in their calculations for excitonconfiguration.<n>By revisiting the benchmark P3HT:PCBM heterojunction model, we show that the observed discrepancies arise primarily from insufficient bond dimensions.<n>Our results confirm both methods converge to numerically exact solutions when bond dimensions are adequately scaled.
arXiv Detail & Related papers (2025-08-30T10:53:51Z)
QS4D: Quantization-aware training for efficient hardware deployment of structured state-space sequential models [0.8474310104568011]
Structured State Space models (SSM) have emerged as a new class of deep learning models.<n>QAT can significantly reduce the complexity of SSMs by up to two orders of magnitude across various performance metrics.<n>We show that QAT enhances robustness to analog noise and enables structural pruning.
arXiv Detail & Related papers (2025-07-08T15:19:14Z)
Mix-QSAM: Mixed-Precision Quantization of the Segment Anything Model [0.0]
Mix-QSAM is a mixed-precision Post-Training Quantization (PTQ) framework for the Segment Anything Model (SAM)<n>We introduce a layer-wise importance score, derived using Kullback-Leibler (KL) divergence, to quantify each layer's contribution to the model's output.<n>We also introduce cross-layer synergy, a novel metric based on causal mutual information, to capture dependencies between adjacent layers.
arXiv Detail & Related papers (2025-05-08T00:08:31Z)
Hybrid machine learning based scale bridging framework for permeability prediction of fibrous structures [0.0]
This study introduces a hybrid machine learning-based scale-bridging framework for predicting the permeability of fibrous textile structures.<n>Four methodologies were evaluated: Single Scale Method (SSM), Simple Upscaling Method (SUM), Scale-Bridging Method (SBM), and Fully Resolved Model (FRM)
arXiv Detail & Related papers (2025-02-07T16:09:25Z)
PearSAN: A Machine Learning Method for Inverse Design using Pearson Correlated Surrogate Annealing [66.27103948750306]
PearSAN is a machine learning-assisted optimization algorithm applicable to inverse design problems with large design spaces.<n>It uses a Pearson correlated surrogate model to predict the figure of merit of the true design metric.<n>It achieves a state-of-the-art maximum design efficiency of 97%, and is at least an order of magnitude faster than previous methods.
arXiv Detail & Related papers (2024-12-26T17:02:19Z)
p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay [20.688382669309096]
p-MoD is an efficient MLLM architecture that significantly reduces training and inference costs while maintaining model performance.<n>We adapt the MoD module with two novel designs: tanh-gated weight normalization (TanhNorm) and symmetric token reweighting (STRing)
arXiv Detail & Related papers (2024-12-05T18:58:03Z)
Synergistic Development of Perovskite Memristors and Algorithms for Robust Analog Computing [53.77822620185878]
We propose a synergistic methodology to concurrently optimize perovskite memristor fabrication and develop robust analog DNNs.<n>We develop "BayesMulti", a training strategy utilizing BO-guided noise injection to improve the resistance of analog DNNs to memristor imperfections.<n>Our integrated approach enables use of analog computing in much deeper and wider networks, achieving up to 100-fold improvements.
arXiv Detail & Related papers (2024-12-03T19:20:08Z)
Data-free Weight Compress and Denoise for Large Language Models [96.68582094536032]
We propose a novel approach termed Data-free Joint Rank-k Approximation for compressing the parameter matrices.<n>We achieve a model pruning of 80% parameters while retaining 93.43% of the original performance without any calibration data.
arXiv Detail & Related papers (2024-02-26T05:51:47Z)
Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially for resource limited devices. Previous unstructured or structured weight pruning methods can hardly truly accelerate inference. We propose a generalized weight unification framework at a hardware compatible micro-structured level to achieve high amount of compression and acceleration.
arXiv Detail & Related papers (2021-06-15T17:22:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.