LQA: A Lightweight Quantized-Adaptive Framework for Vision-Language Models on the Edge
- URL: http://arxiv.org/abs/2602.07849v3
- Date: Tue, 17 Feb 2026 11:38:26 GMT
- Title: LQA: A Lightweight Quantized-Adaptive Framework for Vision-Language Models on the Edge
- Authors: Xin Wang, Hong Jia, Hualin Zhou, Sheng Guang Wang, Yu Zhang, Ting Dang, Tao Gu,
- Abstract summary: We propose a lightweight, quantized-adaptive framework for Vision-Language Models (VLMs)<n>We introduce Selective Hybrid Quantization (SHQ) and a quantized, gradient-free adaptation mechanism to enable robust and efficient VLM deployment on resource-constrained hardware.<n> Experiments show that LQA improves overall adaptation performance by 4.5%, uses less memory, and significantly outperforms gradient-based TTA methods.
- Score: 12.772499009055194
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deploying Vision-Language Models (VLMs) on edge devices is challenged by resource constraints and performance degradation under distribution shifts. While test-time adaptation (TTA) can counteract such shifts, existing methods are too resource-intensive for on-device deployment. To address this challenge, we propose LQA, a lightweight, quantized-adaptive framework for VLMs that combines a modality-aware quantization strategy with gradient-free test-time adaptation. We introduce Selective Hybrid Quantization (SHQ) and a quantized, gradient-free adaptation mechanism to enable robust and efficient VLM deployment on resource-constrained hardware. Experiments across both synthetic and real-world distribution shifts show that LQA improves overall adaptation performance by 4.5\%, uses less memory than full-precision models, and significantly outperforms gradient-based TTA methods, achieving up to 19.9$\times$ lower memory usage across seven open-source datasets. These results demonstrate that LQA offers a practical pathway for robust, privacy-preserving, and efficient VLM deployment on edge devices.
Related papers
- AIVD: Adaptive Edge-Cloud Collaboration for Accurate and Efficient Industrial Visual Detection [15.419663374345845]
This paper proposes the AIVD framework, which achieves unified precise localization and high-quality semantic generation.<n>To enhance the cloud MLLM's robustness against edge cropped-box noise and scenario variations, we design an efficient fine-tuning strategy.<n>To maintain high throughput and low latency across heterogeneous edge devices and dynamic network conditions, we propose a heterogeneous resource-aware dynamic scheduling algorithm.
arXiv Detail & Related papers (2026-01-08T08:56:07Z) - Efficient Edge Test-Time Adaptation via Latent Feature Coordinate Correction [43.48832321879385]
We propose a novel test-time adaptation (TTA) method tailored for edge devices (TED)<n>TED employs forward-only coordinate optimization in the principal subspace of latent using the covariance matrix adaptation evolution strategy (CMA-ES)<n>TED achieves state-of-the-art performance while $textitreducing computational complexity by up to 63 times$, offering a practical and scalable solution for real-world edge applications.
arXiv Detail & Related papers (2025-10-13T07:08:52Z) - Efficient Onboard Vision-Language Inference in UAV-Enabled Low-Altitude Economy Networks via LLM-Enhanced Optimization [61.55616421408666]
Low-Altitude Economy Networks (LAENets) have enabled a variety of applications, including aerial surveillance, environmental sensing, and semantic data collection.<n> onboard vision (VLMs) offer inference for real-time inference but limited onboard dynamic network conditions.<n>We propose a UAV-enabled LAENet system that improves communication efficiency under dynamic LAENet conditions.
arXiv Detail & Related papers (2025-10-11T05:11:21Z) - Small Aid, Big Leap: Efficient Test-Time Adaptation for Vision-Language Models with AdaptNet [5.977269026037707]
Test-time adaptation (TTA) has emerged as a critical technique for enhancing the generalization capability of vision-language models (VLMs) during inference.<n>We introduce SAIL, a novel adapter-based TTA framework that leverages a lightweight, learnable AdaptNet to enable efficient and scalable model adaptation.
arXiv Detail & Related papers (2025-06-03T09:16:51Z) - LeanTTA: A Backpropagation-Free and Stateless Approach to Quantized Test-Time Adaptation on Edge Devices [13.355021314836852]
We present LeanTTA, a novel backpropagation-free and stateless framework for quantized test-time adaptation tailored to edge devices.<n>Our approach minimizes computational costs by dynamically updating normalization statistics without backpropagation.<n>We validate our framework across sensor modalities, demonstrating significant improvements over state-of-the-art TTA methods.
arXiv Detail & Related papers (2025-03-20T06:27:09Z) - Network Resource Optimization for ML-Based UAV Condition Monitoring with Vibration Analysis [54.550658461477106]
Condition Monitoring (CM) uses Machine Learning (ML) models to identify abnormal and adverse conditions.<n>This work explores the optimization of network resources for ML-based UAV CM frameworks.<n>By leveraging dimensionality reduction techniques, there is a 99.9% reduction in network resource consumption.
arXiv Detail & Related papers (2025-02-21T14:36:12Z) - LSAQ: Layer-Specific Adaptive Quantization for Large Language Model Deployment [12.80921403367322]
Large Language Models (LLMs) demonstrate exceptional performance across various domains.<n> Quantization techniques, which reduce the size and memory requirements of LLMs, are effective for deploying LLMs on resource-limited edge devices.<n>We propose Layer-Specific Adaptive Quantization (LSAQ), a system for adaptive quantization and dynamic deployment of LLMs based on layer importance.
arXiv Detail & Related papers (2024-12-24T03:43:15Z) - Adaptive Pruning for Large Language Models with Structural Importance Awareness [66.2690963378878]
Large language models (LLMs) have significantly improved language understanding and generation capabilities.<n>LLMs are difficult to deploy on resource-constrained edge devices due to their high computational and storage resource demands.<n>We propose structurally-aware adaptive pruning (SAAP) to significantly reduce the computational and memory costs while maintaining model performance.
arXiv Detail & Related papers (2024-12-19T18:08:04Z) - HAFLQ: Heterogeneous Adaptive Federated LoRA Fine-tuned LLM with Quantization [55.972018549438964]
Federated fine-tuning of pre-trained Large Language Models (LLMs) enables task-specific adaptation across diverse datasets while preserving privacy.<n>We propose HAFLQ (Heterogeneous Adaptive Federated Low-Rank Adaptation Fine-tuned LLM with Quantization), a novel framework for efficient and scalable fine-tuning of LLMs in heterogeneous environments.<n> Experimental results on the text classification task demonstrate that HAFLQ reduces memory usage by 31%, lowers communication cost by 49%, improves accuracy by 50%, and achieves faster convergence compared to the baseline method.
arXiv Detail & Related papers (2024-11-10T19:59:54Z) - AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer [54.713778961605115]
Vision Transformer (ViT) has become one of the most prevailing fundamental backbone networks in the computer vision community.
We propose a novel non-uniform quantizer, dubbed the Adaptive Logarithm AdaLog (AdaLog) quantizer.
arXiv Detail & Related papers (2024-07-17T18:38:48Z) - Energy-efficient Task Adaptation for NLP Edge Inference Leveraging
Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.