Alignment Adapter to Improve the Performance of Compressed Deep Learning Models
- URL: http://arxiv.org/abs/2602.14635v1
- Date: Mon, 16 Feb 2026 10:53:02 GMT
- Title: Alignment Adapter to Improve the Performance of Compressed Deep Learning Models
- Authors: Rohit Raj Rai, Abhishek Dhaka, Amit Awekar
- Abstract summary: Alignment Adapter (AlAd) is a lightweight, sliding-window-based adapter. It aligns the token-level embeddings of a compressed model with those of the original large model. AlAd can be deployed in two ways: as a plug-and-play module over a frozen compressed model, or by jointly fine-tuning AlAd with the compressed model for further performance gains.
- Score: 1.1087735229999816
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compressed Deep Learning (DL) models are essential for deployment in resource-constrained environments. But their performance often lags behind their large-scale counterparts. To bridge this gap, we propose Alignment Adapter (AlAd): a lightweight, sliding-window-based adapter. It aligns the token-level embeddings of a compressed model with those of the original large model. AlAd preserves local contextual semantics, enables flexible alignment across differing dimensionalities or architectures, and is entirely agnostic to the underlying compression method. AlAd can be deployed in two ways: as a plug-and-play module over a frozen compressed model, or by jointly fine-tuning AlAd with the compressed model for further performance gains. Through experiments on BERT-family models across three token-level NLP tasks, we demonstrate that AlAd significantly boosts the performance of compressed models with only marginal overhead in size and latency.
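The abstract does not specify AlAd's internals, so the following is only a minimal sketch of the general idea, under stated assumptions: each token's window is the concatenation of its neighbours' compressed-model embeddings (zero-padded at the edges), the adapter is a single linear map into the large model's embedding dimension, and alignment is measured with mean squared error. The function names, the window radius, and the choice of loss are all hypothetical and are not taken from the paper.

```python
import random

def sliding_windows(embeddings, radius):
    """Concatenate each token's embedding with its neighbours within `radius`
    positions, zero-padding at the sequence edges (assumed windowing scheme)."""
    dim = len(embeddings[0])
    zero = [0.0] * dim
    windows = []
    for i in range(len(embeddings)):
        window = []
        for j in range(i - radius, i + radius + 1):
            window.extend(embeddings[j] if 0 <= j < len(embeddings) else zero)
        windows.append(window)
    return windows

def adapter_forward(embeddings, weight, bias, radius):
    """Map every window to the large model's dimension with one linear layer.
    `weight` has shape (teacher_dim, (2 * radius + 1) * student_dim)."""
    aligned = []
    for window in sliding_windows(embeddings, radius):
        aligned.append([sum(w * x for w, x in zip(row, window)) + b
                        for row, b in zip(weight, bias)])
    return aligned

def alignment_loss(aligned, teacher):
    """Mean squared error between adapted and large-model token embeddings
    (an assumed objective, chosen here for simplicity)."""
    total, count = 0.0, 0
    for a_vec, t_vec in zip(aligned, teacher):
        for a, t in zip(a_vec, t_vec):
            total += (a - t) ** 2
            count += 1
    return total / count

# Toy data standing in for real model outputs (hypothetical sizes).
rng = random.Random(0)
seq_len, student_dim, teacher_dim, radius = 5, 4, 6, 1
student = [[rng.uniform(-1, 1) for _ in range(student_dim)] for _ in range(seq_len)]
teacher = [[rng.uniform(-1, 1) for _ in range(teacher_dim)] for _ in range(seq_len)]
window_dim = (2 * radius + 1) * student_dim
weight = [[rng.uniform(-0.1, 0.1) for _ in range(window_dim)] for _ in range(teacher_dim)]
bias = [0.0] * teacher_dim

aligned = adapter_forward(student, weight, bias, radius)
print("alignment loss before training:", alignment_loss(aligned, teacher))
```

Because the adapter projects from the window dimension to the large model's dimension, the two models need not share an embedding size. In the plug-and-play setting described in the abstract, only `weight` and `bias` would be trained to reduce this loss while the compressed model stays frozen; in the joint setting, the compressed model's parameters would be updated as well.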
Related papers
- Arbitrary Ratio Feature Compression via Next Token Prediction [52.10426317889982]
The Arbitrary Ratio Feature Compression (ARFC) framework supports any compression ratio with a single model. ARC is an auto-regressive model that performs compression via next-token prediction. The MoS module refines the compressed tokens by utilizing multiple compression results. ERGC is integrated into the training process to preserve semantic and structural relationships during compression.
arXiv Detail & Related papers (2026-02-12T02:38:57Z) - Proxy Compression for Language Modeling [58.904023114033954]
Proxy compression is an alternative training scheme that preserves the efficiency benefits of compressed inputs. Experiments on code language modeling demonstrate that proxy compression substantially improves training efficiency. As model scale increases, proxy-trained models eventually match or rival tokenizer approaches.
arXiv Detail & Related papers (2026-02-04T07:36:46Z) - MEGA-PCC: A Mamba-based Efficient Approach for Joint Geometry and Attribute Point Cloud Compression [9.422873276112067]
MEGA-PCC is a fully end-to-end, learning-based framework featuring two specialized models for joint compression. It achieves superior rate-distortion performance and runtime efficiency compared to both traditional and learning-based baselines.
arXiv Detail & Related papers (2025-12-27T04:43:36Z) - Rethinking Autoregressive Models for Lossless Image Compression via Hierarchical Parallelism and Progressive Adaptation [75.58269386927076]
Autoregressive (AR) models are often dismissed as impractical due to prohibitive computational cost. This work rethinks this paradigm, introducing a framework built on hierarchical parallelism and progressive adaptation. Experiments on diverse datasets (natural, satellite, medical) validate that our method achieves new state-of-the-art compression.
arXiv Detail & Related papers (2025-11-14T06:27:58Z) - Re-Densification Meets Cross-Scale Propagation: Real-Time Neural Compression of LiDAR Point Clouds [83.39320394656855]
LiDAR point clouds are fundamental to various applications, yet high-precision scans incur substantial storage and transmission overhead. Existing methods typically convert unordered points into hierarchical octree or voxel structures for dense-to-sparse predictive coding. Our framework comprises two lightweight modules. First, the Geometry Re-Densification Module re-densifies encoded sparse geometry, extracts features at denser scale, and then re-sparsifies the features for predictive coding.
arXiv Detail & Related papers (2025-08-28T06:36:10Z) - CompLeak: Deep Learning Model Compression Exacerbates Privacy Leakage [14.170572219170158]
We propose CompLeak, the first privacy risk framework examining three widely used compression configurations. CompLeak has three variants, given available access to the number of compressed models and the original model.
arXiv Detail & Related papers (2025-07-22T08:02:46Z) - Towards Compatible Fine-tuning for Vision-Language Model Updates [114.25776195225494]
Class-conditioned Context Optimization (ContCoOp) integrates learnable prompts with class embeddings using an attention layer before inputting them into the text encoder. Our experiments over 15 datasets show that our ContCoOp achieves the highest compatibility over the baseline methods, and exhibits robust out-of-distribution generalization.
arXiv Detail & Related papers (2024-12-30T12:06:27Z) - Application Specific Compression of Deep Learning Models [0.8875650122536799]
Large Deep Learning models are compressed and deployed for specific applications.
Our goal is to customize the model compression process to create a compressed model that will perform better for the target application.
We have experimented with the BERT family of models for three applications: Extractive QA, Natural Language Inference, and Paraphrase Identification.
arXiv Detail & Related papers (2024-09-09T06:55:38Z) - Lightweight Attribute Localizing Models for Pedestrian Attribute Recognition [13.480231032159834]
We propose a novel approach for determining the optimal ranks of low-rank layers, ensuring that the gradient direction of the compressed model closely aligns with that of the original model. This means that the compressed model effectively preserves the update direction of the full model, enabling more efficient compression for Pedestrian Attribute Recognition tasks.
arXiv Detail & Related papers (2023-06-16T13:07:13Z) - Compress, Then Prompt: Improving Accuracy-Efficiency Trade-off of LLM Inference with Transferable Prompt [96.24800696597707]
We introduce a new perspective to optimize this trade-off by prompting compressed models.
We propose a soft prompt learning method where we expose the compressed model to the prompt learning process.
Our experimental analysis suggests our soft prompt strategy greatly improves the performance of the 8x compressed LLaMA-7B model.
arXiv Detail & Related papers (2023-05-17T20:45:13Z) - Sparsity-guided Network Design for Frame Interpolation [39.828644638174225]
We present a compression-driven network design for frame-based algorithms.
We leverage model pruning through sparsity-inducing optimization to greatly reduce the model size.
We achieve a considerable performance gain with a quarter of the size of the original AdaCoF.
arXiv Detail & Related papers (2022-09-09T23:13:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.