LoRA-TTT: Low-Rank Test-Time Training for Vision-Language Models
- URL: http://arxiv.org/abs/2502.02069v1
- Date: Tue, 04 Feb 2025 07:40:26 GMT
- Title: LoRA-TTT: Low-Rank Test-Time Training for Vision-Language Models
- Authors: Yuto Kojima, Jiarui Xu, Xueyan Zou, Xiaolong Wang
- Abstract summary: We propose LoRA-TTT, a novel Test-Time Training (TTT) method for vision-language models (VLMs).
By introducing LoRA and updating only its parameters during test time, our method offers a simple yet effective TTT approach.
Our method can adapt to diverse domains by combining these two losses, without increasing memory consumption or runtime.
- Score: 23.218237408724676
- Abstract: The rapid advancements in vision-language models (VLMs), such as CLIP, have intensified the need to address distribution shifts between training and testing datasets. Although prior Test-Time Training (TTT) techniques for VLMs have demonstrated robust performance, they predominantly rely on tuning text prompts, a process that demands substantial computational resources and is heavily dependent on entropy-based loss. In this paper, we propose LoRA-TTT, a novel TTT method that leverages Low-Rank Adaptation (LoRA), applied exclusively to the image encoder of VLMs. By introducing LoRA and updating only its parameters during test time, our method offers a simple yet effective TTT approach, retaining the model's initial generalization capability while achieving substantial performance gains with minimal memory and runtime overhead. Additionally, we introduce a highly efficient reconstruction loss tailored for TTT. Our method can adapt to diverse domains by combining these two losses, without increasing memory consumption or runtime. Extensive experiments on two benchmarks, covering 15 datasets, demonstrate that our method improves the zero-shot top-1 accuracy of CLIP-ViT-B/16 by an average of 5.79% on the OOD benchmark and 1.36% on the fine-grained benchmark, efficiently surpassing test-time prompt tuning, without relying on any external models or cache.
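The mechanism the abstract describes, a frozen encoder weight plus a small trainable low-rank update, can be sketched in a few lines. The NumPy sketch below is not the authors' implementation: `LoRALinear`, `mean_prediction_entropy`, the rank, the `alpha / rank` scaling, and the 768-dimensional layer are illustrative assumptions that follow the standard LoRA formulation, and the entropy function is one common form of the entropy-based loss the abstract attributes to prior prompt-tuning methods. The reconstruction loss is omitted because the abstract does not specify it.

```python
import numpy as np


class LoRALinear:
    """Frozen linear map W plus a trainable low-rank update scale * B @ A.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight                                  # frozen pre-trained weight
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))   # trainable, small random init
        self.B = np.zeros((out_dim, rank))                    # trainable, zero init
        self.scale = alpha / rank

    def __call__(self, x):
        # x: (batch, in_dim) -> (batch, out_dim)
        frozen = x @ self.weight.T
        update = self.scale * (x @ self.A.T) @ self.B.T
        return frozen + update

    def num_trainable(self):
        # Only A and B would be updated at test time.
        return self.A.size + self.B.size


def mean_prediction_entropy(logits):
    """Entropy of the softmax prediction averaged over views.

    logits: array of shape (views, classes), one row per augmented view.
    """
    z = logits - logits.max(axis=1, keepdims=True)   # stabilise the softmax
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    p_mean = p.mean(axis=0)
    return float(-(p_mean * np.log(p_mean + 1e-12)).sum())


# Assumed 768x768 projection, standing in for one layer of an image encoder.
W = np.random.default_rng(1).normal(size=(768, 768))
layer = LoRALinear(W, rank=4)
x = np.random.default_rng(2).normal(size=(2, 768))

# With B initialised to zero, the adapted layer matches the frozen layer.
same_as_frozen = np.allclose(layer(x), x @ W.T)
print(same_as_frozen, layer.num_trainable(), W.size)
```

Because `B` starts at zero, the adapted layer reproduces the frozen layer exactly before any test-time update, which is one way to read the abstract's claim that the model's initial generalization capability is retained. At rank 4 the update holds 6,144 trainable values, against 589,824 in the frozen matrix.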
Related papers
- Fast T2T: Optimization Consistency Speeds Up Diffusion-Based Training-to-Testing Solving for Combinatorial Optimization [83.65278205301576]
We propose to learn direct mappings from different noise levels to the optimal solution for a given instance, facilitating high-quality generation with minimal shots.
This is achieved through an optimization consistency training protocol, which minimizes the difference among samples.
Experiments on two popular tasks, the Traveling Salesman Problem (TSP) and Maximal Independent Set (MIS), demonstrate the superiority of Fast T2T regarding both solution quality and efficiency.
(2025-02-05T07:13:43Z)
- Test-time Loss Landscape Adaptation for Zero-Shot Generalization in Vision-Language Models [3.1099372412393524]
This paper argues, from a loss landscape perspective, that backpropagation in existing methods is unnecessary.
It proposes a simple yet effective framework called Test-time Loss Landscape Adaptation (TLLA).
In the prompt tuning stage, a Sharpness-Aware Prompt Tuning (SAPT) method is introduced to identify the training flat minimum.
In the test stage, a Sharpness-based Test Sample Selection (STSS) approach is utilized to ensure the alignment of flat minima.
(2025-01-31T03:10:48Z)
- The Surprising Effectiveness of Test-Time Training for Abstract Reasoning [64.36534512742736]
We investigate the effectiveness of test-time training (TTT) as a mechanism for improving models' reasoning capabilities.
TTT significantly improves performance on ARC tasks, achieving up to 6x improvement in accuracy compared to base fine-tuned models.
Our findings suggest that explicit symbolic search is not the only path to improved abstract reasoning in neural language models.
(2024-11-11T18:59:45Z)
- Enhancing Test Time Adaptation with Few-shot Guidance [35.13317598777832]
Deep neural networks often suffer significant performance drops when facing domain shifts between training (source) and test (target) data.
Test Time Adaptation (TTA) methods have been proposed to adapt a pre-trained source model to out-of-distribution streaming target data.
We develop Few-Shot Test Time Adaptation (FS-TTA), a novel and practical setting that utilizes a few-shot support set on top of TTA.
(2024-09-02T15:50:48Z)
- Test-Time Low Rank Adaptation via Confidence Maximization for Zero-Shot Generalization of Vision-Language Models [4.655740975414312]
This paper introduces Test-Time Low-rank adaptation (TTL) as an alternative to prompt tuning for zero-shot generalization of large-scale vision-language models (VLMs).
TTL offers a test-time-efficient adaptation approach that updates the attention weights of the transformer by maximizing prediction confidence.
(2024-07-22T17:59:19Z)
- RA-DIT: Retrieval-Augmented Dual Instruction Tuning [90.98423540361946]
Retrieval-augmented language models (RALMs) improve performance by accessing long-tail and up-to-date knowledge from external data stores.
Existing approaches require either expensive retrieval-specific modifications to LM pre-training or use post-hoc integration of the data store that leads to suboptimal performance.
We introduce Retrieval-Augmented Dual Instruction Tuning (RA-DIT), a lightweight fine-tuning methodology that provides a third option.
(2023-10-02T17:16:26Z)
- Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
(2023-06-27T05:43:47Z)
- Improved Test-Time Adaptation for Domain Generalization [48.239665441875374]
Test-time training (TTT) adapts the learned model with test data.
This work addresses two main factors: selecting an appropriate auxiliary TTT task for updating and identifying reliable parameters to update during the test phase.
We introduce additional adaptive parameters for the trained model, and we suggest only updating the adaptive parameters during the test phase.
(2023-04-10T10:12:38Z)
- Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models [107.05966685291067]
We propose test-time prompt tuning (TPT) to learn adaptive prompts on the fly with a single test sample.
TPT improves the zero-shot top-1 accuracy of CLIP by 3.6% on average.
In evaluating cross-dataset generalization with unseen categories, TPT performs on par with the state-of-the-art approaches that use additional training data.
(2022-09-15T17:55:11Z)
- An Efficiency Study for SPLADE Models [5.725475501578801]
In this paper, we focus on improving the efficiency of the SPLADE model.
We propose several techniques, including L1 regularization for queries, a separation of document/query encoders, a FLOPS-regularized middle-training, and the use of faster query encoders.
(2022-07-08T11:42:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.