Fast-Slow Test-Time Adaptation for Online Vision-and-Language Navigation
- URL: http://arxiv.org/abs/2311.13209v4
- Date: Sun, 19 May 2024 06:20:24 GMT
- Title: Fast-Slow Test-Time Adaptation for Online Vision-and-Language Navigation
- Authors: Junyu Gao, Xuan Yao, Changsheng Xu,
- Abstract summary: We propose a Fast-Slow Test-Time Adaptation (FSTTA) approach for online Vision-and-Language Navigation (VLN)
Our method obtains impressive performance gains on four popular benchmarks.
- Score: 67.18144414660681
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ability to accurately comprehend natural language instructions and navigate to the target location is essential for an embodied agent. Such agents are typically required to execute user instructions in an online manner, leading us to explore the use of unlabeled test samples for effective online model adaptation. However, for online Vision-and-Language Navigation (VLN), due to the intrinsic nature of inter-sample online instruction execution and intra-sample multi-step action decision, frequent updates can result in drastic changes in model parameters, while occasional updates can make the model ill-equipped to handle dynamically changing environments. Therefore, we propose a Fast-Slow Test-Time Adaptation (FSTTA) approach for online VLN by performing joint decomposition-accumulation analysis for both gradients and parameters in a unified framework. Extensive experiments show that our method obtains impressive performance gains on four popular benchmarks. Code is available at https://github.com/Feliciaxyao/ICML2024-FSTTA.
Related papers
- Test-Time Low Rank Adaptation via Confidence Maximization for Zero-Shot Generalization of Vision-Language Models [4.655740975414312]
This paper introduces Test-Time Low-rank adaptation (TTL) as an alternative to prompt tuning for zero-shot generalizations of large-scale vision-language models (VLMs)
TTL offers a test-time-efficient adaptation approach that updates the attention weights of the transformer by maximizing prediction confidence.
arXiv Detail & Related papers (2024-07-22T17:59:19Z) - Test-Time Model Adaptation with Only Forward Passes [68.11784295706995]
Test-time adaptation has proven effective in adapting a given trained model to unseen test samples with potential distribution shifts.
We propose a test-time Forward-Optimization Adaptation (FOA) method.
FOA runs on quantized 8-bit ViT, outperforms gradient-based TENT on full-precision 32-bit ViT, and achieves an up to 24-fold memory reduction on ImageNet-C.
arXiv Detail & Related papers (2024-04-02T05:34:33Z) - Revisiting Dynamic Evaluation: Online Adaptation for Large Language
Models [88.47454470043552]
We consider the problem of online fine tuning the parameters of a language model at test time, also known as dynamic evaluation.
Online adaptation turns parameters into temporally changing states and provides a form of context-length extension with memory in weights.
arXiv Detail & Related papers (2024-03-03T14:03:48Z) - AR-TTA: A Simple Method for Real-World Continual Test-Time Adaptation [16.85284386728494]
We propose to validate test-time adaptation methods using datasets for autonomous driving, namely CLAD-C and SHIFT.
We observe that current test-time adaptation methods struggle to effectively handle varying degrees of domain shift.
The proposed method, named AR-TTA, outperforms existing approaches on both synthetic and more real-world benchmarks.
arXiv Detail & Related papers (2023-09-18T19:34:23Z) - Prompting Decision Transformer for Few-Shot Policy Generalization [98.0914217850999]
We propose a Prompt-based Decision Transformer (Prompt-DT) to achieve few-shot adaptation in offline RL.
Prompt-DT is a strong few-shot learner without any extra finetuning on unseen target tasks.
arXiv Detail & Related papers (2022-06-27T17:59:17Z) - Visual-Language Navigation Pretraining via Prompt-based Environmental
Self-exploration [83.96729205383501]
We introduce prompt-based learning to achieve fast adaptation for language embeddings.
Our model can adapt to diverse vision-language navigation tasks, including VLN and REVERIE.
arXiv Detail & Related papers (2022-03-08T11:01:24Z) - Dynamic Regret Analysis for Online Meta-Learning [0.0]
The online meta-learning framework has arisen as a powerful tool for the continual lifelong learning setting.
This formulation involves two levels: outer level which learns meta-learners and inner level which learns task-specific models.
We establish performance in terms of dynamic regret which handles changing environments from a global prospective.
We carry out our analyses in a setting, and in expectation prove a logarithmic local dynamic regret which explicitly depends on the total number of iterations.
arXiv Detail & Related papers (2021-09-29T12:12:59Z) - Towards Self-Adaptive Metric Learning On the Fly [16.61982837441342]
We aim to address the open challenge of "Online Adaptive Metric Learning" (OAML) for learning adaptive metric functions on the fly.
Unlike traditional online metric learning methods, OAML is significantly more challenging since the learned metric could be non-linear and the model has to be self-adaptive.
We present a new online metric learning framework that attempts to tackle the challenge by learning an ANN-based metric with adaptive model complexity from a stream of constraints.
arXiv Detail & Related papers (2021-04-03T23:11:52Z) - Tracking Performance of Online Stochastic Learners [57.14673504239551]
Online algorithms are popular in large-scale learning settings due to their ability to compute updates on the fly, without the need to store and process data in large batches.
When a constant step-size is used, these algorithms also have the ability to adapt to drifts in problem parameters, such as data or model properties, and track the optimal solution with reasonable accuracy.
We establish a link between steady-state performance derived under stationarity assumptions and the tracking performance of online learners under random walk models.
arXiv Detail & Related papers (2020-04-04T14:16:27Z) - AutoTrack: Towards High-Performance Visual Tracking for UAV with
Automatic Spatio-Temporal Regularization [19.379240684856423]
Most existing trackers based on discriminative correlation filters (DCF) try to introduce predefined regularization term to improve the learning of target objects.
In this work, a novel approach is proposed to online and automatically adaptively learn-temporal regularization term.
Experiments on four UAV benchmarks have proven the superiority of our method compared to the state-of-the-art CPU- and GPU-based trackers.
arXiv Detail & Related papers (2020-03-29T05:02:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.