Related papers: Active Test-time Vision-Language Navigation

Active Test-time Vision-Language Navigation

URL: http://arxiv.org/abs/2506.06630v1
Date: Sat, 07 Jun 2025 02:24:44 GMT
Title: Active Test-time Vision-Language Navigation
Authors: Heeju Ko, Sungjune Kim, Gyeongrok Oh, Jeongyoon Yoon, Honglak Lee, Sujin Jang, Seungryong Kim, Sangpil Kim,
Abstract summary: ATENA is a test-time active learning framework that enables a practical human-robot interaction via episodic feedback on uncertain navigation outcomes.<n>In particular, ATENA learns to increase certainty in successful episodes and decrease it in failed ones, improving uncertainty calibration.<n>In addition, we propose a self-active learning strategy that enables an agent to evaluate its navigation outcomes based on confident predictions.
Score: 60.69722522420299
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Vision-Language Navigation (VLN) policies trained on offline datasets often exhibit degraded task performance when deployed in unfamiliar navigation environments at test time, where agents are typically evaluated without access to external interaction or feedback. Entropy minimization has emerged as a practical solution for reducing prediction uncertainty at test time; however, it can suffer from accumulated errors, as agents may become overconfident in incorrect actions without sufficient contextual grounding. To tackle these challenges, we introduce ATENA (Active TEst-time Navigation Agent), a test-time active learning framework that enables a practical human-robot interaction via episodic feedback on uncertain navigation outcomes. In particular, ATENA learns to increase certainty in successful episodes and decrease it in failed ones, improving uncertainty calibration. Here, we propose mixture entropy optimization, where entropy is obtained from a combination of the action and pseudo-expert distributions-a hypothetical action distribution assuming the agent's selected action to be optimal-controlling both prediction confidence and action preference. In addition, we propose a self-active learning strategy that enables an agent to evaluate its navigation outcomes based on confident predictions. As a result, the agent stays actively engaged throughout all iterations, leading to well-grounded and adaptive decision-making. Extensive evaluations on challenging VLN benchmarks-REVERIE, R2R, and R2R-CE-demonstrate that ATENA successfully overcomes distributional shifts at test time, outperforming the compared baseline methods across various settings.

Related papers

When to Act, Ask, or Learn: Uncertainty-Aware Policy Steering [10.01278648231868]
Policy steering is an emerging way to adapt robot behaviors at deployment-time.<n> Vision-Language Models (VLMs) are promising general-purpose verifiers due to their reasoning capabilities.<n>We propose uncertainty-aware policy steering (UPS), a framework that jointly reasons about semantic task uncertainty and low-level action feasibility.
arXiv Detail & Related papers (2026-02-25T23:23:22Z)
ARTIS: Agentic Risk-Aware Test-Time Scaling via Iterative Simulation [72.78362530982109]
ARTIS, Agentic Risk-Aware Test-Time Scaling via Iterative Simulation, is a framework that decouples exploration from commitment.<n>We show that naive LLM-based simulators struggle to capture rare but high-impact failure modes.<n>We introduce a risk-aware tool simulator that emphasizes fidelity on failure-inducing actions.
arXiv Detail & Related papers (2026-02-02T06:33:22Z)
Actively Obtaining Environmental Feedback for Autonomous Action Evaluation Without Predefined Measurements [0.11844977816228043]
An AI agent proactively interacts with the environment to discover, screen, and verify feedback without relying on predefined measurements.<n>A self-triggering mechanism is introduced to autonomously plan and adjust actions.<n> Experimental results demonstrate that the proposed active approach significantly improves the efficiency and robustness of factor identification.
arXiv Detail & Related papers (2026-01-04T09:52:56Z)
Co-Evolving Agents: Learning from Failures as Hard Negatives [38.61683607205988]
Recent work has explored self-improving agents that autonomously generate, refine, and re-train on their own trajectories.<n>We propose a co-evolving agents framework in which a target agent improves jointly with an auxiliary failure agent.
arXiv Detail & Related papers (2025-11-27T09:30:33Z)
Preliminary Investigation into Uncertainty-Aware Attack Stage Classification [81.28215542218724]
This work addresses the problem of attack stage inference under uncertainty.<n>We propose a classification approach based on Evidential Deep Learning (EDL), which models predictive uncertainty by outputting parameters of a Dirichlet distribution over possible stages.<n>Preliminary experiments in a simulated environment demonstrate that the proposed model can accurately infer the stage of an attack with confidence.
arXiv Detail & Related papers (2025-08-01T06:58:00Z)
Aurora: Are Android Malware Classifiers Reliable under Distribution Shift? [51.12297424766236]
AURORA is a framework to evaluate malware classifiers based on their confidence quality and operational resilience.<n>AURORA is further complemented by a set of metrics designed to go beyond point-in-time performance.<n>The fragility we observe in state-of-the-art frameworks suggests the need for a return to the whiteboard.
arXiv Detail & Related papers (2025-05-28T20:22:43Z)
MIRROR: Multi-agent Intra- and Inter-Reflection for Optimized Reasoning in Tool Learning [33.009759731505746]
Complex tasks involving tool integration pose significant challenges for Large Language Models.<n> Reflection has emerged as an effective strategy for correcting erroneous trajectories in agentic benchmarks.<n>We propose MIRROR, a framework that consists of both intra-reflection, which critically assesses intended actions before execution, and inter-reflection, which further adjusts the trajectory.
arXiv Detail & Related papers (2025-05-27T03:37:33Z)
Review, Refine, Repeat: Understanding Iterative Decoding of AI Agents with Dynamic Evaluation and Selection [71.92083784393418]
Inference-time methods such as Best-of-N (BON) sampling offer a simple yet effective alternative to improve performance.<n>We propose Iterative Agent Decoding (IAD) which combines iterative refinement with dynamic candidate evaluation and selection guided by a verifier.
arXiv Detail & Related papers (2025-04-02T17:40:47Z)
PredictaBoard: Benchmarking LLM Score Predictability [50.47497036981544]
Large Language Models (LLMs) often fail unpredictably.<n>This poses a significant challenge to ensuring their safe deployment.<n>We present PredictaBoard, a novel collaborative benchmarking framework.
arXiv Detail & Related papers (2025-02-20T10:52:38Z)
Uncertainty-Penalized Direct Preference Optimization [52.387088396044206]
We develop a pessimistic framework for DPO by introducing preference uncertainty penalization schemes. The penalization serves as a correction to the loss which attenuates the loss gradient for uncertain samples. We show improved overall performance compared to vanilla DPO, as well as better completions on prompts from high-uncertainty chosen/rejected responses.
arXiv Detail & Related papers (2024-10-26T14:24:37Z)
UAHOI: Uncertainty-aware Robust Interaction Learning for HOI Detection [18.25576487115016]
This paper focuses on Human-Object Interaction (HOI) detection. It addresses the challenge of identifying and understanding the interactions between humans and objects within a given image or video frame. We propose a novel approach textscUAHOI, Uncertainty-aware Robust Human-Object Interaction Learning.
arXiv Detail & Related papers (2024-08-14T10:06:39Z)
Singular Value Penalization and Semantic Data Augmentation for Fully Test-Time Adaptation [5.891527229524256]
Test-time adaptation (FTTA) adapts a model that is trained on a source domain to a target domain during the testing phase. We propose maximizing the sum of singular values while minimizing their variance. This enables the model's focus toward the smaller singular values, enhancing discriminability between more challenging classes and effectively increasing the diversity of prediction results.
arXiv Detail & Related papers (2023-12-10T01:08:56Z)
An active learning method for solving competitive multi-agent decision-making and control problems [1.2430809884830318]
We introduce a novel active-learning scheme to identify a stationary action profile for a population of competitive agents. We show that the proposed learning-based approach can be applied to typical multi-agent control and decision-making problems.
arXiv Detail & Related papers (2022-12-23T19:37:39Z)
Domain Adaptation with Adversarial Training on Penultimate Activations [82.9977759320565]
Enhancing model prediction confidence on unlabeled target data is an important objective in Unsupervised Domain Adaptation (UDA) We show that this strategy is more efficient and better correlated with the objective of boosting prediction confidence than adversarial training on input images or intermediate features.
arXiv Detail & Related papers (2022-08-26T19:50:46Z)
Imitation Learning by State-Only Distribution Matching [2.580765958706854]
Imitation Learning from observation describes policy learning in a similar way to human learning. We propose a non-adversarial learning-from-observations approach, together with an interpretable convergence and performance metric.
arXiv Detail & Related papers (2022-02-09T08:38:50Z)
Can Autonomous Vehicles Identify, Recover From, and Adapt to Distribution Shifts? [104.04999499189402]
Out-of-training-distribution (OOD) scenarios are a common challenge of learning agents at deployment. We propose an uncertainty-aware planning method, called emphrobust imitative planning (RIP) Our method can detect and recover from some distribution shifts, reducing the overconfident and catastrophic extrapolations in OOD scenes. We introduce an autonomous car novel-scene benchmark, textttCARNOVEL, to evaluate the robustness of driving agents to a suite of tasks with distribution shifts.
arXiv Detail & Related papers (2020-06-26T11:07:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.