HyDRA: Hierarchical and Dynamic Rank Adaptation for Mobile Vision Language Model
- URL: http://arxiv.org/abs/2512.20674v1
- Date: Sat, 20 Dec 2025 10:18:10 GMT
- Title: HyDRA: Hierarchical and Dynamic Rank Adaptation for Mobile Vision Language Model
- Authors: Yuanhao Xi, Xiaohuan Bing, Ramin Yahyapour
- Abstract summary: HyDRA is a parameter-efficient fine-tuning framework designed to implement hierarchical and dynamic rank scheduling. It consistently outperforms the baseline, achieving a 4.7% improvement across various model sizes without increasing the number of trainable parameters.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Vision Language Models (VLMs) have undergone significant advancements, particularly with the emergence of mobile-oriented VLMs, which offer a wide range of application scenarios. However, the substantial computational requirements for training these models present a significant obstacle to their practical application. To address this issue, Low-Rank Adaptation (LoRA) has been proposed. Nevertheless, the standard LoRA with a fixed rank lacks sufficient capability for training mobile VLMs that process both text and image modalities. In this work, we introduce HyDRA, a parameter-efficient fine-tuning framework designed to implement hierarchical and dynamic rank scheduling for mobile VLMs. This framework incorporates two essential optimization strategies: (1) hierarchical optimization, which involves a coarse-grained approach that assigns different ranks to various layers, as well as a fine-grained method that adjusts ranks within individual layers, and (2) dynamic adjustment, which employs an end-to-end automatic optimization using a lightweight performance model to determine and adjust ranks during the fine-tuning process. Comprehensive experiments conducted on popular benchmarks demonstrate that HyDRA consistently outperforms the baseline, achieving a 4.7% improvement across various model sizes without increasing the number of trainable parameters. In some tasks, it even surpasses full-parameter fine-tuning.
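As background, the coarse-grained strategy the abstract describes, assigning a different LoRA rank to each layer instead of one fixed rank, can be sketched as follows. This is a minimal illustration, not the paper's implementation; the dimensions, ranks, and initialization scales are hypothetical.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16.0):
    """Forward pass of a frozen weight W plus a low-rank update B @ A.

    W: (d_out, d_in) frozen pretrained weight.
    A: (r, d_in) and B: (d_out, r) are the only trainable matrices;
    scaling by alpha / r keeps the update magnitude comparable across ranks.
    """
    r = A.shape[0]
    return x @ W.T + (x @ A.T) @ B.T * (alpha / r)

# Coarse-grained hierarchical scheduling (illustrative): each layer is
# assigned its own rank rather than one fixed rank for the whole model.
rng = np.random.default_rng(0)
d = 64
ranks = [4, 8, 8, 16]  # hypothetical per-layer ranks
layers = []
for r in ranks:
    W = rng.standard_normal((d, d)) / np.sqrt(d)  # frozen pretrained weight
    A = rng.standard_normal((r, d)) * 0.01        # trainable
    B = np.zeros((d, r))                          # trainable; zero init => update starts as a no-op
    layers.append((W, A, B))

x = rng.standard_normal((2, d))
h = x
for W, A, B in layers:
    h = lora_forward(h, W, A, B)

# Trainable parameters per layer are r*d (A) + d*r (B) = 2*r*d, so the
# rank schedule directly controls the budget: sum(2 * r * d for r in ranks).
```

Because the trainable-parameter count is linear in the rank, a scheduler can raise some layers' ranks and lower others' without changing the total budget, which is the constraint HyDRA reports operating under.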
Related papers
- MARS: Harmonizing Multimodal Convergence via Adaptive Rank Search [12.345218777941108]
Fine-tuning Multimodal Large Language Models (MLLMs) with parameter-efficient methods like Low-Rank Adaptation (LoRA) is crucial for task adaptation. We introduce MARS (Multimodal Adaptive Rank Search), an approach to discover optimal rank pairs that balance training dynamics while maximizing performance. Our key innovation, a framework of dual scaling laws, enables this search: one law models module-specific convergence time to prune the search space to candidates with aligned dynamics, while the other predicts final task performance to select the optimal pair from the pruned set.
arXiv Detail & Related papers (2026-02-28T15:58:28Z) - OmniVL-Guard: Towards Unified Vision-Language Forgery Detection and Grounding via Balanced RL [63.388513841293616]
Existing forgery detection methods fail to handle the interleaved text, images, and videos prevalent in real-world misinformation. To bridge this gap, this paper aims to develop a unified framework for omnibus vision-language forgery detection and grounding. We propose OmniVL-Guard, a balanced reinforcement learning framework for omnibus vision-language forgery detection and grounding.
arXiv Detail & Related papers (2026-02-11T09:41:36Z) - Hierarchical Optimization via LLM-Guided Objective Evolution for Mobility-on-Demand Systems [9.979671028876464]
We propose a novel framework that integrates a large language model (LLM) with mathematical optimization in a dynamic hierarchical system. Within this framework, the LLM serves as a meta-optimizer, producing semantics that guide a low-level layer responsible for constraint enforcement and real-time decision execution. Experiments based on both the New York and Chicago taxi datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2025-10-12T14:56:19Z) - DynaSwarm: Dynamically Graph Structure Selection for LLM-based Multi-agent System [0.276240219662896]
DynaSwarm is a dynamic framework that enhances multi-agent systems. It uses an actor-critic reinforcement learning mechanism to optimize graph structures. It also has a dynamic graph selector that adaptively chooses the optimal graph structure for each input sample.
arXiv Detail & Related papers (2025-07-31T05:52:30Z) - Dynamic Context-oriented Decomposition for Task-aware Low-rank Adaptation with Less Forgetting and Faster Convergence [131.41894248194995]
We propose context-oriented decomposition adaptation (CorDA), a novel method that initializes adapters in a task-aware manner. Thanks to this task awareness, our method enables two optional adaptation modes: knowledge-preserved mode (KPM) and instruction-previewed mode (IPM).
arXiv Detail & Related papers (2025-06-16T07:55:14Z) - Leveraging Importance Sampling to Detach Alignment Modules from Large Language Models [48.15777554876988]
Traditional alignment methods often require retraining large pretrained models. We propose a novel Residual Alignment Model (RAM) that formalizes the alignment process as a type of importance sampling. We develop a resampling algorithm with iterative token-level decoding to address the common first-token latency issue in comparable methods.
arXiv Detail & Related papers (2025-05-26T08:53:02Z) - MetaLoRA: Tensor-Enhanced Adaptive Low-Rank Fine-tuning [23.735592086378194]
Low-Rank Adaptation (LoRA) has emerged as a promising parameter-efficient fine-tuning method. Current LoRA variants primarily focus on general parameter reduction while overlooking the importance of dynamic parameter adjustment and meta-learning capabilities. This research proposes a LoRA generation approach to model task relationships and introduces MetaLoRA, a novel parameter-efficient adaptation framework.
arXiv Detail & Related papers (2025-04-01T06:34:26Z) - Dynamic Adaptation of LoRA Fine-Tuning for Efficient and Task-Specific Optimization of Large Language Models [0.7421845364041001]
This paper presents dynamic LoRA, a novel fine-tuning methodology for large language models. It adds dynamic adaptation mechanisms to improve efficiency and performance. The efficiency of dynamic LoRA was validated in experiments on benchmark datasets.
arXiv Detail & Related papers (2025-01-24T18:54:14Z) - HarmoDT: Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning [72.25707314772254]
We introduce the Harmony Multi-Task Decision Transformer (HarmoDT), a novel solution designed to identify an optimal harmony subspace of parameters for each task.
The upper level of this framework is dedicated to learning a task-specific mask that delineates the harmony subspace, while the inner level focuses on updating parameters to enhance the overall performance of the unified policy.
arXiv Detail & Related papers (2024-05-28T11:41:41Z) - PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation [65.268245109828]
We introduce PRILoRA, which linearly allocates a different rank for each layer, in an increasing manner, and performs pruning throughout the training process.
We validate the effectiveness of PRILoRA through extensive experiments on eight GLUE benchmarks, setting a new state of the art.
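The linearly increasing rank allocation described above can be sketched as a simple schedule; `r_min` and `r_max` are hypothetical hyperparameters for illustration, not values taken from the paper.

```python
def linear_rank_schedule(num_layers: int, r_min: int, r_max: int) -> list:
    """Allocate a linearly increasing rank per layer, from r_min at the
    first layer to r_max at the last, rounded to integers."""
    if num_layers == 1:
        return [r_max]
    step = (r_max - r_min) / (num_layers - 1)
    return [round(r_min + i * step) for i in range(num_layers)]

print(linear_rank_schedule(6, 4, 12))
```

Pruning during training (the second component PRILoRA describes) would then trim the least important rank directions within each layer's allocated budget.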
arXiv Detail & Related papers (2024-01-20T20:25:17Z) - Optimization-Inspired Learning with Architecture Augmentations and Control Mechanisms for Low-Level Vision [74.9260745577362]
This paper proposes a unified optimization-inspired learning framework to aggregate Generative, Discriminative, and Corrective (GDC) principles.
We construct three propagative modules to effectively solve the optimization models with flexible combinations.
Experiments across varied low-level vision tasks validate the efficacy and adaptability of GDC.
arXiv Detail & Related papers (2020-12-10T03:24:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.