Row-Column Hybrid Grouping for Fault-Resilient Multi-Bit Weight Representation on IMC Arrays
- URL: http://arxiv.org/abs/2508.15685v1
- Date: Thu, 21 Aug 2025 16:05:44 GMT
- Title: Row-Column Hybrid Grouping for Fault-Resilient Multi-Bit Weight Representation on IMC Arrays
- Authors: Kang Eun Jeon, Sangheum Yeon, Jinhee Kim, Hyeonsu Bang, Johnny Rhe, Jong Hwan Ko
- Abstract summary: This paper addresses the computational unreliability caused by stuck-at faults (SAFs) and the high compilation overhead of fault-mitigation algorithms, namely Fault-Free (FF). We first propose a novel multi-bit weight representation technique, termed row-column hybrid grouping, which generalizes conventional column grouping by introducing redundancy across both rows and columns. Second, we design a compiler that reformulates the fault-aware weight decomposition problem as an Integer Linear Programming (ILP) task, enabling fast and scalable compilation through off-the-shelf solvers.
- Score: 8.430588029181136
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper addresses two critical challenges in analog In-Memory Computing (IMC) systems that limit their scalability and deployability: the computational unreliability caused by stuck-at faults (SAFs) and the high compilation overhead of existing fault-mitigation algorithms, namely Fault-Free (FF). To overcome these limitations, we first propose a novel multi-bit weight representation technique, termed row-column hybrid grouping, which generalizes conventional column grouping by introducing redundancy across both rows and columns. This structural redundancy enhances fault tolerance and can be effectively combined with existing fault-mitigation solutions. Second, we design a compiler pipeline that reformulates the fault-aware weight decomposition problem as an Integer Linear Programming (ILP) task, enabling fast and scalable compilation through off-the-shelf solvers. Further acceleration is achieved through theoretical insights that identify fault patterns amenable to trivial solutions, significantly reducing computation. Experimental results on convolutional networks and small language models demonstrate the effectiveness of our approach, achieving up to 8%p improvement in accuracy, 150x faster compilation, and 2x energy efficiency gain compared to existing baselines.
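To make the decomposition idea concrete, the sketch below is a hypothetical, brute-force stand-in for the paper's ILP formulation: each cell holding one bit of a multi-bit weight is either healthy or stuck at 0/1, and the compiler must choose per-cell bits whose significance-weighted sum reproduces the target weight. The bit significances, the fault encoding, and the `decompose` helper are assumptions for illustration, not the authors' compiler; an actual implementation would hand these constraints to an off-the-shelf ILP solver rather than enumerate.

```python
from itertools import product

def decompose(weight, significances, faults):
    """Brute-force stand-in for the fault-aware ILP: find per-cell bits
    whose weighted sum equals `weight` while respecting stuck-at faults.
    faults[i] is None (healthy), 0 (stuck-at-0), or 1 (stuck-at-1)."""
    for bits in product((0, 1), repeat=len(significances)):
        # A stuck cell's bit is forced to the stuck value.
        if any(f is not None and b != f for b, f in zip(bits, faults)):
            continue
        if sum(b * s for b, s in zip(bits, significances)) == weight:
            return bits
    return None  # infeasible under these faults

# Plain column grouping: one cell per bit, the '2'-cell is stuck at 0,
# so the weight 3 (= 2 + 1) cannot be represented.
print(decompose(3, [8, 4, 2, 1], [None, None, 0, None]))  # None

# Hybrid grouping adds a redundant cell of significance 2 (an extra
# row/column), restoring feasibility: 3 = 1 + 2 via the redundant cell.
print(decompose(3, [8, 4, 2, 1, 2], [None, None, 0, None, None]))
# (0, 0, 0, 1, 1)
```

The enumeration is exponential in the number of cells, which is exactly why the paper's reformulation as an ILP solved by mature solvers (plus closed-form handling of trivially solvable fault patterns) matters for compilation speed.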
Related papers
- Large-scale portfolio optimization on a trapped-ion quantum computer [32.24411362086563]
We present an end-to-end pipeline for large-scale portfolio selection with cardinality constraints. We experimentally demonstrate it on trapped-ion quantum processors using hardware-aware decomposition.
arXiv Detail & Related papers (2026-02-27T12:36:14Z) - FLoRG: Federated Fine-tuning with Low-rank Gram Matrices and Procrustes Alignment [19.973768722251393]
We propose FLoRG, a federated fine-tuning framework which employs a single low-rank matrix for fine-tuning. We show that FLoRG outperforms five state-of-the-art baseline schemes in downstream task accuracy and can reduce the communication overhead by up to 2041$\times$.
arXiv Detail & Related papers (2026-02-19T05:35:23Z) - RaBiT: Residual-Aware Binarization Training for Accurate and Efficient LLMs [5.782015253162346]
Residual binarization enables matmul-free inference by stacking binary layers. We propose RaBiT, a novel quantization framework that resolves coadaptation by algorithmically enforcing a residual hierarchy. RaBiT achieves state-of-the-art performance, rivals even hardware-intensive Vector Quantization (VQ) methods, and delivers a $4.49\times$ inference speed-up over full-precision models.
arXiv Detail & Related papers (2026-02-05T06:41:11Z) - Short Chains, Deep Thoughts: Balancing Reasoning Efficiency and Intra-Segment Capability via Split-Merge Optimization [68.89915707647138]
Large Reasoning Models (LRMs) have demonstrated impressive capabilities in solving complex tasks through the generation of long reasoning chains. We propose CoSMo (Split-Merge Optimization), a framework designed to eliminate structural redundancy rather than indiscriminately restricting token volume.
arXiv Detail & Related papers (2026-02-03T05:54:28Z) - Dependency-Aware Task Offloading in Multi-UAV Assisted Collaborative Mobile Edge Computing [53.88774113545582]
This paper presents a novel multi-unmanned aerial vehicle (UAV) assisted collaborative mobile edge computing (MEC) framework. It aims to minimize the system cost, and thus realize an improved trade-off between task completion and energy consumption. We show that the proposed scheme can significantly reduce the system cost and achieve this improved trade-off.
arXiv Detail & Related papers (2025-10-23T02:55:40Z) - PT$^2$-LLM: Post-Training Ternarization for Large Language Models [52.4629647715623]
Large Language Models (LLMs) have shown impressive capabilities across diverse tasks, but their large memory and compute demands hinder deployment. We propose PT$^2$-LLM, a post-training ternarization framework tailored for LLMs. At its core is an Asymmetric Ternary Quantizer equipped with a two-stage refinement pipeline.
arXiv Detail & Related papers (2025-09-27T03:01:48Z) - Inexact Column Generation for Bayesian Network Structure Learning via Difference-of-Submodular Optimization [3.802887999217352]
State-of-the-art BNSL IP formulations suffer from the exponentially large number of variables and constraints. A standard approach in IP to address such challenges is to employ row and column generation techniques. We show that our row and column generation approach yields solutions with higher quality than state-of-the-art score-based approaches.
arXiv Detail & Related papers (2025-05-16T10:23:19Z) - CLCR: Contrastive Learning-based Constraint Reordering for Efficient MILP Solving [34.127805466651864]
Constraint ordering plays a critical role in the efficiency of Mixed-Integer Linear Programming (MILP) solvers. This paper introduces CLCR (Contrastive Learning-based Constraint Reordering), a novel framework that systematically optimizes constraint ordering to accelerate MILP solving. Experiments on benchmarks show CLCR reduces solving time by 30% and LP iterations by 25% on average, without sacrificing solution accuracy.
arXiv Detail & Related papers (2025-03-23T05:01:43Z) - Scalable First-order Method for Certifying Optimal k-Sparse GLMs [9.613635592922174]
We propose a first-order proximal gradient algorithm to solve the perspective relaxation of the problem within a BnB framework. We show that our approach significantly accelerates dual bound computations and is highly effective in providing optimality certificates for large-scale problems.
arXiv Detail & Related papers (2025-02-13T17:14:18Z) - Joint Transmit and Pinching Beamforming for Pinching Antenna Systems (PASS): Optimization-Based or Learning-Based? [89.05848771674773]
A novel pinching antenna system (PASS)-enabled downlink multi-user multiple-input single-output (MISO) framework is proposed. It consists of multiple waveguides, which are equipped with numerous low-cost pinching antennas (PAs). The positions of the PAs can be reconfigured to span both large-scale paths and spaces.
arXiv Detail & Related papers (2025-02-12T18:54:10Z) - Zero-Space Cost Fault Tolerance for Transformer-based Language Models on
ReRAM [27.354689865791638]
Resistive Random Access Memory (ReRAM) has emerged as a promising platform for deep neural networks (DNNs).
Hardware failures, such as stuck-at-fault defects, can result in significant prediction errors during model inference.
We propose a fault protection mechanism that incurs zero space cost.
arXiv Detail & Related papers (2024-01-22T02:50:38Z) - CBQ: Cross-Block Quantization for Large Language Models [66.82132832702895]
Post-training quantization (PTQ) has played a key role in compressing large language models (LLMs) with ultra-low costs. We propose CBQ, a cross-block reconstruction-based PTQ method for LLMs. CBQ employs a cross-block reconstruction scheme, establishing long-range dependencies across multiple blocks to minimize error accumulation.
arXiv Detail & Related papers (2023-12-13T07:56:27Z) - Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially for resource limited devices.
Previous unstructured or structured weight pruning methods can hardly truly accelerate inference.
We propose a generalized weight unification framework at a hardware-compatible micro-structured level to achieve a high degree of compression and acceleration.
arXiv Detail & Related papers (2021-06-15T17:22:59Z) - Fully Quantized Image Super-Resolution Networks [81.75002888152159]
We propose a Fully Quantized image Super-Resolution framework (FQSR) to jointly optimize efficiency and accuracy.
We apply our quantization scheme on multiple mainstream super-resolution architectures, including SRResNet, SRGAN and EDSR.
Our FQSR using low bits quantization can achieve on par performance compared with the full-precision counterparts on five benchmark datasets.
arXiv Detail & Related papers (2020-11-29T03:53:49Z) - Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.