FreeAct: Freeing Activations for LLM Quantization
- URL: http://arxiv.org/abs/2603.01776v2
- Date: Thu, 05 Mar 2026 01:02:36 GMT
- Title: FreeAct: Freeing Activations for LLM Quantization
- Authors: Xiaohao Liu, Xiaobo Xia, Manyi Zhang, Ji-Fu Li, Xianzhi Yu, Fei Shen, Xiu Su, See-Kiong Ng, Tat-Seng Chua
- Abstract summary: Quantization is pivotal for mitigating the significant memory and computational overhead of Large Language Models. FreeAct is a novel quantization framework that relaxes the static one-to-one constraint to accommodate dynamic activation disparities. Experiments across dLLMs and MLLMs demonstrate that FreeAct significantly outperforms baselines, with up to a 5.3% performance improvement.
- Score: 89.97086263978058
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Quantization is pivotal for mitigating the significant memory and computational overhead of Large Language Models (LLMs). While emerging transformation-based methods have successfully enhanced quantization by projecting feature spaces onto smoother manifolds using orthogonal matrices, they typically enforce a rigid one-to-one transformation constraint. This static approach fails to account for the dynamic patterns inherent in input activations, particularly within diffusion LLMs (dLLMs) and Multimodal LLMs (MLLMs), where varying token types exhibit distinct distributions. To address this, we propose FreeAct, a novel quantization framework that relaxes the static one-to-one constraint to accommodate dynamic activation disparities. Theoretically, we leverage the rank-deficient nature of activations to derive a solution space that extends beyond simple inverse matrices, enabling the decoupling of activation transformations from weights. Methodologically, FreeAct identifies token-specific dynamics (e.g., vision vs. text, or masked tokens) and allocates distinct transformation matrices to the activation side, while maintaining a unified, static transformation for the weights. Extensive experiments across dLLMs and MLLMs demonstrate that FreeAct significantly outperforms baselines, achieving up to a 5.3% performance improvement, and we provide in-depth analyses. Our code will be publicly released.
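The rank-deficiency argument above can be made concrete. Below is a minimal numerical sketch, not the authors' implementation: it assumes activations whose rows lie in a known low-dimensional subspace and checks that many different activation-side transforms, paired with one static orthogonal weight-side transform, reproduce the original product exactly. The construction `Ta = P @ Tw + (I - P) @ Mk` and all variable names are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
d, r, n = 64, 16, 32                        # hidden size, activation rank, tokens

# Rank-deficient activations: every row lies in an r-dimensional subspace.
U, _ = torch.linalg.qr(torch.randn(d, r))   # orthonormal basis, shape (d, r)
X = torch.randn(n, r) @ U.T                 # activations, shape (n, d)
W = torch.randn(d, d)                       # an example weight matrix

P = U @ U.T                                 # projector onto the activation row space
Tw, _ = torch.linalg.qr(torch.randn(d, d))  # static orthogonal weight-side transform
W_static = Tw.T @ W                         # weights transformed once, then frozen

def free_activation_transform(Mk):
    # Any Ta of this form inverts Tw on the activation subspace, so Ta is
    # not forced to equal Tw: the free component (I - P) @ Mk could be
    # chosen per token type to smooth that type's distribution.
    return P @ Tw + (torch.eye(d) - P) @ Mk

# Two token types (e.g., vision vs. text) get two different transforms.
Ta_text = free_activation_transform(torch.randn(d, d))
Ta_vision = free_activation_transform(torch.randn(d, d))

for Ta in (Ta_text, Ta_vision):
    err = (X @ Ta @ W_static - X @ W).abs().max().item()
    print(f"max |X Ta (Tw^T W) - X W| = {err:.2e}")  # ~1e-5, float round-off
```

In an actual quantizer the per-type freedom in `Mk` would be exploited to flatten that token type's outliers before low-bit rounding; the sketch only verifies that the solution space extends beyond the single inverse `Tw`.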
Related papers
- GeoMotionGPT: Geometry-Aligned Motion Understanding with Large Language Models [23.159388800893964]
We argue that alignment is most effective when both modalities share a unified geometric basis. We employ a decoder-only quantizer with Gumbel-Softmax for differentiable training and balanced codebook usage. Our framework achieves a 20% performance improvement over current state-of-the-art methods.
arXiv Detail & Related papers (2026-01-12T15:14:29Z)
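The GeoMotionGPT entry above mentions a decoder-only quantizer trained with Gumbel-Softmax for differentiable code selection and balanced codebook usage. A minimal sketch of that general pattern follows; the module layout, sizes, and the entropy-based usage penalty are assumptions, not the paper's actual design.

```python
import torch
import torch.nn.functional as F

class GumbelSoftmaxQuantizer(torch.nn.Module):
    """Hypothetical sketch of a Gumbel-Softmax codebook quantizer."""
    def __init__(self, num_codes=512, dim=256, tau=1.0):
        super().__init__()
        self.codebook = torch.nn.Embedding(num_codes, dim)
        self.to_logits = torch.nn.Linear(dim, num_codes)
        self.tau = tau

    def forward(self, z):                                   # z: (B, T, dim)
        logits = self.to_logits(z)                          # (B, T, num_codes)
        # Straight-through Gumbel-Softmax: hard one-hot codes on the
        # forward pass, smooth gradients on the backward pass.
        onehot = F.gumbel_softmax(logits, tau=self.tau, hard=True)
        z_q = onehot @ self.codebook.weight                 # quantized features
        # Balanced codebook usage: minimizing the negative entropy of the
        # marginal code distribution pushes usage toward uniform.
        probs = F.softmax(logits, dim=-1).mean(dim=(0, 1))
        usage_loss = (probs * (probs + 1e-9).log()).sum()
        return z_q, usage_loss
```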
- STaMP: Sequence Transformation and Mixed Precision for Low-Precision Activation Quantization [21.93314755695813]
Quantization is the key method for reducing the inference latency, power, and memory footprint of generative AI models. We propose Sequence Transformation and Mixed Precision (STaMP) quantization.
arXiv Detail & Related papers (2025-10-30T17:53:42Z)
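The STaMP snippet names its two ingredients, a sequence-axis transformation and mixed precision, without further detail. The following generic sketch combines them under assumptions of mine: rotate activations along the sequence axis with a random orthogonal matrix, keep the few largest-magnitude rows at 8 bits, and quantize the rest to 4 bits. None of this is the paper's actual recipe.

```python
import torch

def quantize_rows(x, bits):
    # Symmetric per-row fake quantization.
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / qmax
    return (x / scale).round().clamp(-qmax, qmax) * scale

def sequence_mixed_precision(x, keep=4, low_bits=4):
    T, d = x.shape
    S, _ = torch.linalg.qr(torch.randn(T, T))   # sequence-axis transform
    y = S @ x                                   # mix information across tokens
    hi = y.norm(dim=-1).topk(keep).indices      # rows kept at higher precision
    q = quantize_rows(y, low_bits)
    q[hi] = quantize_rows(y[hi], 8)
    return S.T @ q                              # undo the transform

x = torch.randn(128, 64)
print((sequence_mixed_precision(x) - x).abs().mean())
```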
- Meaningless Tokens, Meaningful Gains: How Activation Shifts Enhance LLM Reasoning [53.35553353785948]
Motivated by the puzzling observation that inserting long sequences of meaningless tokens before the query prompt can consistently enhance LLM reasoning performance, this work analyzes the underlying mechanism driving the phenomenon. We find that the improvements arise from a redistribution of activations in the LLM's layers, where near-zero activations become less frequent while large-magnitude activations increase. We propose a lightweight inference-time technique that modifies activations directly without altering the input sequence.
arXiv Detail & Related papers (2025-10-01T15:39:38Z)
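The entry above describes modifying activations directly at inference time so that near-zero values become rarer. One way to prototype such an intervention in PyTorch is a forward hook that adds a fixed shift to a layer's output; the hook target, the shift construction, and the scale below are assumptions, not the paper's method.

```python
import torch

def make_shift_hook(shift, alpha=0.1):
    # Adds a fixed direction to the module's output, pushing activations
    # away from zero without altering the input sequence.
    def hook(module, inputs, output):
        return output + alpha * shift
    return hook

# Hypothetical usage on one MLP block of a loaded Hugging Face model:
# layer = model.model.layers[10].mlp
# shift = torch.randn(model.config.hidden_size)  # calibrated in practice
# handle = layer.register_forward_hook(make_shift_hook(shift))
# ... run generation ...
# handle.remove()
```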
- MotionVerse: A Unified Multimodal Framework for Motion Comprehension, Generation and Editing [53.98607267063729]
MotionVerse is a framework to comprehend, generate, and edit human motion in both single-person and multi-person scenarios. We employ a motion tokenizer with residual quantization, which converts continuous motion sequences into multi-stream discrete tokens. We also introduce a Delay Parallel Modeling strategy, which temporally staggers the encoding of residual token streams.
arXiv Detail & Related papers (2025-09-28T04:20:56Z)
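Residual quantization, named in the MotionVerse entry above, encodes a vector as a sum of codewords drawn from successive codebooks, each stage quantizing the residual left by the previous one; this is what yields multiple discrete token streams. A minimal sketch with generic sizes and nearest-neighbor assignment:

```python
import torch

class ResidualQuantizer(torch.nn.Module):
    """Minimal residual VQ: each stage quantizes the previous stage's
    residual, producing one discrete token stream per stage."""
    def __init__(self, num_stages=4, num_codes=256, dim=128):
        super().__init__()
        self.codebooks = torch.nn.Parameter(torch.randn(num_stages, num_codes, dim))

    def forward(self, z):                                  # z: (B, dim)
        residual, z_q, codes = z, torch.zeros_like(z), []
        for cb in self.codebooks:                          # cb: (num_codes, dim)
            idx = torch.cdist(residual, cb).argmin(dim=-1) # nearest codeword
            chosen = cb[idx]
            z_q = z_q + chosen
            residual = residual - chosen
            codes.append(idx)
        return z_q, torch.stack(codes, dim=-1)             # (B, dim), (B, num_stages)
```

The paper's Delay Parallel Modeling would then stagger these per-stage streams in time during decoding; that scheduling is not reproduced here.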
- Large Language Model Compression via the Nested Activation-Aware Decomposition [12.400791399764213]
We introduce a novel post-training compression paradigm that focuses on low-rank decomposition of large language models (LLMs). We propose a nested activation-aware framework (NSVD) for LLMs, a training-free approach designed to enhance the accuracy of low-rank decompositions.
arXiv Detail & Related papers (2025-03-21T12:39:16Z)
- Transformer-Squared: Self-adaptive LLMs [29.1326358746118]
We introduce Transformer-Squared, a novel self-adaptation framework that adapts large language models to unseen tasks in real time. Our method consistently outperforms ubiquitous approaches such as LoRA, with fewer parameters and greater efficiency. Transformer-Squared represents a significant leap forward, offering a scalable, efficient solution for enhancing the adaptability and task-specific performance of LLMs.
arXiv Detail & Related papers (2025-01-09T01:19:21Z)
- Analyzing Finetuning Representation Shift for Multimodal LLMs Steering [56.710375516257876]
We propose to map hidden states to interpretable visual and textual concepts. This enables us to compare semantic dynamics more efficiently, such as the shift between an original and a fine-tuned model. We also demonstrate the use of shift vectors to capture these concept changes.
arXiv Detail & Related papers (2025-01-06T13:37:13Z)
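The shift-vector idea in the entry above can be prototyped as the difference of mean hidden states between a base and a fine-tuned model on the same inputs, which can then be added back for steering. The Hugging Face-style calls and the steering scale below are placeholders, not the paper's procedure.

```python
import torch

@torch.no_grad()
def concept_shift_vector(base_model, tuned_model, input_ids, layer=-1):
    # Mean hidden-state difference at one layer: a crude proxy for how
    # fine-tuning moved the representations.
    h_base = base_model(input_ids, output_hidden_states=True).hidden_states[layer]
    h_tuned = tuned_model(input_ids, output_hidden_states=True).hidden_states[layer]
    return (h_tuned - h_base).mean(dim=(0, 1))             # (hidden_size,)

# Steering: add the shift to the base model's residual stream via a hook.
# shift = concept_shift_vector(base, tuned, batch_input_ids)
# base.model.layers[-1].register_forward_hook(
#     lambda mod, ins, out: (out[0] + 0.5 * shift,) + out[1:])
```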
- Data-free Weight Compress and Denoise for Large Language Models [96.68582094536032]
We propose a novel approach termed Data-free Joint Rank-k Approximation for compressing the parameter matrices. We prune 80% of parameters while retaining 93.43% of the original performance, without any calibration data.
arXiv Detail & Related papers (2024-02-26T05:51:47Z)
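Rank-k approximation, the core operation named in the entry above, keeps only the top-k singular triplets of each weight matrix and needs no calibration data; the paper's "joint" variant across matrices is not reproduced in this generic sketch.

```python
import torch

def rank_k_approx(W, k):
    # Truncated SVD gives the best rank-k approximation in Frobenius norm
    # (Eckart-Young) and is computable from the weights alone (data-free).
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    A = U[:, :k] * S[:k]              # (out, k)
    B = Vh[:k]                        # (k, in)
    return A, B                       # W ~= A @ B, storing k*(out+in) numbers

W = torch.randn(4096, 4096)
A, B = rank_k_approx(W, k=256)
print((A @ B - W).norm() / W.norm())  # relative error of the rank-256 factors
```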
- ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models [49.970828419830355]
We introduce a new post-training compression paradigm for Large Language Models (LLMs). We propose a training-free approach called Activation-aware Singular Value Decomposition (ASVD).
arXiv Detail & Related papers (2023-12-10T08:41:24Z)
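In contrast to the data-free sketch above, activation-aware SVD scales the weight's input channels by calibration statistics before factorizing, so truncation error concentrates on directions the inputs rarely use. A minimal sketch; the diagonal mean-magnitude scaling is one common formulation and an assumption here, not necessarily ASVD's exact choice.

```python
import torch

def activation_aware_svd(W, X_calib, k):
    # W: (out, in) with y = x @ W.T; X_calib: (N, in) calibration activations.
    s = X_calib.abs().mean(dim=0).clamp(min=1e-6)    # per-channel scale, (in,)
    U, S, Vh = torch.linalg.svd(W * s, full_matrices=False)
    A = U[:, :k] * S[:k]                             # (out, k)
    B = Vh[:k] / s                                   # (k, in), scale folded back
    return A, B                                      # x @ (A @ B).T ~= x @ W.T

W = torch.randn(1024, 1024)
X = torch.randn(512, 1024) * torch.linspace(0.1, 3.0, 1024)
A, B = activation_aware_svd(W, X, k=128)
err = (X @ (A @ B).T - X @ W.T).norm() / (X @ W.T).norm()
print(f"activation-weighted relative error: {err:.3f}")
```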