MASQuant: Modality-Aware Smoothing Quantization for Multimodal Large Language Models
- URL: http://arxiv.org/abs/2603.04800v1
- Date: Thu, 05 Mar 2026 04:41:32 GMT
- Title: MASQuant: Modality-Aware Smoothing Quantization for Multimodal Large Language Models
- Authors: Lulu Hu, Wenhu Xiao, Xin Chen, Xinhua Xu, Bowen Xu, Kun Li, Yongliang Tao,
- Abstract summary: Modality-Aware Smoothing Quantization (MASQuant) is a novel framework that introduces Modality-Aware Smoothing (MAS). MASQuant demonstrates stable quantization performance across both dual-modal and tri-modal MLLMs. Experimental results show that MASQuant is competitive with state-of-the-art PTQ algorithms.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Post-training quantization (PTQ) with computational invariance for Large Language Models (LLMs) has demonstrated remarkable advances; however, its application to Multimodal Large Language Models (MLLMs) presents substantial challenges. In this paper, we analyze SmoothQuant as a case study and identify two critical issues: Smoothing Misalignment and Cross-Modal Computational Invariance. To address these issues, we propose Modality-Aware Smoothing Quantization (MASQuant), a novel framework that introduces (1) Modality-Aware Smoothing (MAS), which learns separate, modality-specific smoothing factors to prevent Smoothing Misalignment, and (2) Cross-Modal Compensation (CMC), which addresses Cross-Modal Computational Invariance by using SVD whitening to transform multi-modal activation differences into low-rank forms, enabling unified quantization across modalities. MASQuant demonstrates stable quantization performance across both dual-modal and tri-modal MLLMs. Experimental results show that MASQuant is competitive with state-of-the-art PTQ algorithms. Source code: https://github.com/alibaba/EfficientAI.
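The two components can be illustrated with a minimal numerical sketch. The function names, the per-modality grouping of calibration tokens, and the rank-r truncation below are illustrative assumptions, not the paper's actual implementation; they only show the shape of SmoothQuant-style per-channel smoothing applied separately per modality, plus an SVD-based low-rank approximation of a cross-modal activation difference:

```python
import numpy as np

def smoothing_factors(act, weight, alpha=0.5):
    # SmoothQuant-style per-channel factors: s_j = max|X_j|^alpha / max|W_j|^(1-alpha)
    a_max = np.abs(act).max(axis=0)          # (channels,) activation ranges
    w_max = np.abs(weight).max(axis=1)       # (channels,) weight ranges per input channel
    return (a_max ** alpha) / np.maximum(w_max ** (1.0 - alpha), 1e-8)

def modality_aware_smooth(act, weight, modality_ids, alpha=0.5):
    """Compute separate smoothing factors per modality (MAS-style sketch).

    act:          (tokens, channels) calibration activations
    weight:       (channels, out) weight of the following linear layer
    modality_ids: (tokens,) integer modality id of each token
    """
    smoothed = np.empty_like(act)
    scales = {}
    for m in np.unique(modality_ids):
        mask = modality_ids == m
        s = smoothing_factors(act[mask], weight, alpha)
        smoothed[mask] = act[mask] / s       # migrate quantization difficulty into weights
        scales[int(m)] = s
    return smoothed, scales

def low_rank_compensation(delta, rank):
    # SVD-based rank-r approximation of a cross-modal activation difference
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]
```

Note the caveat this sketch exposes: the SmoothQuant identity X W = (X / s) (diag(s) W) only holds if the tokens scaled by s see a weight folded with the same s, so per-modality factors alone break the shared-weight invariance; reconciling those per-modality differences is exactly what the paper's CMC component targets.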
Related papers
- Mitigating Modality Imbalance in Multi-modal Learning via Multi-objective Optimization [57.00656508727821]
Multi-modal learning (MML) aims to integrate information from multiple modalities, which is expected to lead to superior performance over single-modality learning. Recent studies have shown that MML can underperform, even compared to single-modality approaches, due to imbalanced learning across modalities. We propose a gradient-based algorithm to solve the modified MML problem.
arXiv Detail & Related papers (2025-11-10T04:16:01Z)
- SimMLM: A Simple Framework for Multi-modal Learning with Missing Modality [52.948791050405525]
We propose SimMLM, a simple yet powerful framework for multimodal learning with missing modalities. SimMLM consists of a generic Dynamic Mixture of Modality Experts (DMoME) architecture, featuring a dynamic, learnable gating mechanism. The key innovation of SimMLM is the proposed More vs. Fewer (MoFe) ranking loss, which ensures that task accuracy improves or remains stable as more modalities are made available.
arXiv Detail & Related papers (2025-07-25T13:39:34Z)
- Dynamic Modality Scheduling for Multimodal Large Models via Confidence, Uncertainty, and Semantic Consistency [0.0]
We propose Dynamic Modality Scheduling (DMS), a novel framework that adaptively adjusts the contribution of each modality at a per-sample level. Experimental results on VQA, image-text retrieval, and captioning tasks show that DMS significantly improves both clean and robust performance.
arXiv Detail & Related papers (2025-06-15T05:15:52Z)
- TAMP: Token-Adaptive Layerwise Pruning in Multimodal Large Language Models [23.916205754112774]
Multimodal Large Language Models (MLLMs) have shown remarkable versatility in understanding diverse multimodal data and tasks. We propose TAMP, a simple yet effective pruning framework tailored for MLLMs. We validate our method on two state-of-the-art MLLMs: LLaVA-NeXT, designed for vision-language tasks, and VideoLLaMA2, capable of processing audio, visual, and language modalities.
arXiv Detail & Related papers (2025-04-14T05:44:38Z)
- MQuant: Unleashing the Inference Potential of Multimodal Large Language Models via Full Static Quantization [15.01214559812713]
MQuant is a post-training quantization framework designed to tackle the challenges of multimodal large language models (MLLMs). On five mainstream MLLMs, MQuant under W4A8 achieves near-floating-point accuracy while reducing inference latency by up to 30%. MQuant effectively bridges the gap to efficient and accurate MLLM inference on resource-constrained devices.
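The W4A8 setting above denotes 4-bit weights with 8-bit activations. As an illustration of the numeric format only (a hedged sketch, not MQuant's actual algorithm), symmetric uniform fake quantization can be written as:

```python
import numpy as np

def fake_quant(x, bits, per_channel_axis=None):
    # Symmetric uniform quantize-then-dequantize (simulated quantization)
    qmax = 2 ** (bits - 1) - 1
    if per_channel_axis is None:
        scale = np.abs(x).max() / qmax                       # per-tensor scale
    else:
        axes = tuple(i for i in range(x.ndim) if i != per_channel_axis)
        scale = np.abs(x).max(axis=axes, keepdims=True) / qmax
    scale = np.maximum(scale, 1e-8)
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

def w4a8_linear(x, w):
    # 8-bit per-tensor activations, 4-bit per-output-channel weights
    return fake_quant(x, 8) @ fake_quant(w, 4, per_channel_axis=1)
```

The per-output-channel scaling of weights and per-tensor scaling of activations are common choices in this setting, but the actual granularity used by any given framework may differ.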
arXiv Detail & Related papers (2025-02-01T13:08:02Z)
- GSIFN: A Graph-Structured and Interlaced-Masked Multimodal Transformer-based Fusion Network for Multimodal Sentiment Analysis [0.0]
Multimodal Sentiment Analysis (MSA) leverages multiple data modalities to analyze human sentiment. Existing MSA models generally employ cutting-edge multimodal fusion and representation-learning methods to promote MSA capability. Our proposed GSIFN incorporates two main components to solve these problems: (i) a graph-structured and interlaced-masked multimodal Transformer. It adopts the Interlaced Mask mechanism to construct robust multimodal graph embeddings, achieve all-modal-in-one Transformer-based fusion, and greatly reduce the computational overhead.
arXiv Detail & Related papers (2024-08-27T06:44:28Z)
- Advancing Multimodal Large Language Models with Quantization-Aware Scale Learning for Efficient Adaptation [70.22782550540714]
We introduce a Quantization-aware Scale LeArning method based on multimodal Warmup, termed QSLAW.
arXiv Detail & Related papers (2024-08-07T12:42:09Z)
- Model Composition for Multimodal Large Language Models [71.5729418523411]
We propose a new paradigm through the model composition of existing MLLMs to create a new model that retains the modal understanding capabilities of each original model.
Our basic implementation, NaiveMC, demonstrates the effectiveness of this paradigm by reusing modality encoders and merging LLM parameters.
arXiv Detail & Related papers (2024-02-20T06:38:10Z)
- Retrieval-augmented Multi-modal Chain-of-Thoughts Reasoning for Large Language Models [56.256069117502385]
Chain of Thought (CoT) approaches can be used to enhance the capability of Large Language Models (LLMs) on complex reasoning tasks.
However, the selection of optimal CoT demonstration examples in multi-modal reasoning remains less explored.
We introduce a novel approach that addresses this challenge by using retrieval mechanisms to automatically select demonstration examples.
arXiv Detail & Related papers (2023-12-04T08:07:21Z)
- Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations. The model takes two bimodal pairs as input due to the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.