Related papers: LQA: A Lightweight Quantized-Adaptive Framework for Vision-Language Models on the Edge

LQA: A Lightweight Quantized-Adaptive Framework for Vision-Language Models on the Edge

URL: http://arxiv.org/abs/2602.07849v3
Date: Tue, 17 Feb 2026 11:38:26 GMT
Title: LQA: A Lightweight Quantized-Adaptive Framework for Vision-Language Models on the Edge
Authors: Xin Wang, Hong Jia, Hualin Zhou, Sheng Guang Wang, Yu Zhang, Ting Dang, Tao Gu,
Abstract summary: We propose a lightweight, quantized-adaptive framework for Vision-Language Models (VLMs)<n>We introduce Selective Hybrid Quantization (SHQ) and a quantized, gradient-free adaptation mechanism to enable robust and efficient VLM deployment on resource-constrained hardware.<n> Experiments show that LQA improves overall adaptation performance by 4.5%, uses less memory, and significantly outperforms gradient-based TTA methods.
Score: 12.772499009055194
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deploying Vision-Language Models (VLMs) on edge devices is challenged by resource constraints and performance degradation under distribution shifts. While test-time adaptation (TTA) can counteract such shifts, existing methods are too resource-intensive for on-device deployment. To address this challenge, we propose LQA, a lightweight, quantized-adaptive framework for VLMs that combines a modality-aware quantization strategy with gradient-free test-time adaptation. We introduce Selective Hybrid Quantization (SHQ) and a quantized, gradient-free adaptation mechanism to enable robust and efficient VLM deployment on resource-constrained hardware. Experiments across both synthetic and real-world distribution shifts show that LQA improves overall adaptation performance by 4.5\%, uses less memory than full-precision models, and significantly outperforms gradient-based TTA methods, achieving up to 19.9$\times$ lower memory usage across seven open-source datasets. These results demonstrate that LQA offers a practical pathway for robust, privacy-preserving, and efficient VLM deployment on edge devices.

Related papers

AIVD: Adaptive Edge-Cloud Collaboration for Accurate and Efficient Industrial Visual Detection [15.419663374345845]
This paper proposes the AIVD framework, which achieves unified precise localization and high-quality semantic generation.<n>To enhance the cloud MLLM's robustness against edge cropped-box noise and scenario variations, we design an efficient fine-tuning strategy.<n>To maintain high throughput and low latency across heterogeneous edge devices and dynamic network conditions, we propose a heterogeneous resource-aware dynamic scheduling algorithm.
arXiv Detail & Related papers (2026-01-08T08:56:07Z)
Efficient Edge Test-Time Adaptation via Latent Feature Coordinate Correction [43.48832321879385]
We propose a novel test-time adaptation (TTA) method tailored for edge devices (TED)<n>TED employs forward-only coordinate optimization in the principal subspace of latent using the covariance matrix adaptation evolution strategy (CMA-ES)<n>TED achieves state-of-the-art performance while $textitreducing computational complexity by up to 63 times$, offering a practical and scalable solution for real-world edge applications.
arXiv Detail & Related papers (2025-10-13T07:08:52Z)
Efficient Onboard Vision-Language Inference in UAV-Enabled Low-Altitude Economy Networks via LLM-Enhanced Optimization [61.55616421408666]
Low-Altitude Economy Networks (LAENets) have enabled a variety of applications, including aerial surveillance, environmental sensing, and semantic data collection.<n> onboard vision (VLMs) offer inference for real-time inference but limited onboard dynamic network conditions.<n>We propose a UAV-enabled LAENet system that improves communication efficiency under dynamic LAENet conditions.
arXiv Detail & Related papers (2025-10-11T05:11:21Z)
Small Aid, Big Leap: Efficient Test-Time Adaptation for Vision-Language Models with AdaptNet [5.977269026037707]
Test-time adaptation (TTA) has emerged as a critical technique for enhancing the generalization capability of vision-language models (VLMs) during inference.<n>We introduce SAIL, a novel adapter-based TTA framework that leverages a lightweight, learnable AdaptNet to enable efficient and scalable model adaptation.
arXiv Detail & Related papers (2025-06-03T09:16:51Z)
LeanTTA: A Backpropagation-Free and Stateless Approach to Quantized Test-Time Adaptation on Edge Devices [13.355021314836852]
We present LeanTTA, a novel backpropagation-free and stateless framework for quantized test-time adaptation tailored to edge devices.<n>Our approach minimizes computational costs by dynamically updating normalization statistics without backpropagation.<n>We validate our framework across sensor modalities, demonstrating significant improvements over state-of-the-art TTA methods.
arXiv Detail & Related papers (2025-03-20T06:27:09Z)
Network Resource Optimization for ML-Based UAV Condition Monitoring with Vibration Analysis [54.550658461477106]
Condition Monitoring (CM) uses Machine Learning (ML) models to identify abnormal and adverse conditions.<n>This work explores the optimization of network resources for ML-based UAV CM frameworks.<n>By leveraging dimensionality reduction techniques, there is a 99.9% reduction in network resource consumption.
arXiv Detail & Related papers (2025-02-21T14:36:12Z)
LSAQ: Layer-Specific Adaptive Quantization for Large Language Model Deployment [12.80921403367322]
Large Language Models (LLMs) demonstrate exceptional performance across various domains.<n> Quantization techniques, which reduce the size and memory requirements of LLMs, are effective for deploying LLMs on resource-limited edge devices.<n>We propose Layer-Specific Adaptive Quantization (LSAQ), a system for adaptive quantization and dynamic deployment of LLMs based on layer importance.
arXiv Detail & Related papers (2024-12-24T03:43:15Z)
Adaptive Pruning for Large Language Models with Structural Importance Awareness [66.2690963378878]
Large language models (LLMs) have significantly improved language understanding and generation capabilities.<n>LLMs are difficult to deploy on resource-constrained edge devices due to their high computational and storage resource demands.<n>We propose structurally-aware adaptive pruning (SAAP) to significantly reduce the computational and memory costs while maintaining model performance.
arXiv Detail & Related papers (2024-12-19T18:08:04Z)
HAFLQ: Heterogeneous Adaptive Federated LoRA Fine-tuned LLM with Quantization [55.972018549438964]
Federated fine-tuning of pre-trained Large Language Models (LLMs) enables task-specific adaptation across diverse datasets while preserving privacy.<n>We propose HAFLQ (Heterogeneous Adaptive Federated Low-Rank Adaptation Fine-tuned LLM with Quantization), a novel framework for efficient and scalable fine-tuning of LLMs in heterogeneous environments.<n> Experimental results on the text classification task demonstrate that HAFLQ reduces memory usage by 31%, lowers communication cost by 49%, improves accuracy by 50%, and achieves faster convergence compared to the baseline method.
arXiv Detail & Related papers (2024-11-10T19:59:54Z)
AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer [54.713778961605115]
Vision Transformer (ViT) has become one of the most prevailing fundamental backbone networks in the computer vision community. We propose a novel non-uniform quantizer, dubbed the Adaptive Logarithm AdaLog (AdaLog) quantizer.
arXiv Detail & Related papers (2024-07-17T18:38:48Z)
Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks. We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.