Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction
- URL: http://arxiv.org/abs/2506.07976v2
- Date: Tue, 10 Jun 2025 12:50:18 GMT
- Title: Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction
- Authors: Junhong Shen, Hao Bai, Lunjun Zhang, Yifei Zhou, Amrith Setlur, Shengbang Tong, Diego Caples, Nan Jiang, Tong Zhang, Ameet Talwalkar, Aviral Kumar,
- Abstract summary: We propose to scale test-time interaction, an untapped dimension of test-time scaling. We first show that even prompting-based interaction scaling can improve task success on web benchmarks non-trivially. We introduce TTI (Test-Time Interaction), a curriculum-based online reinforcement learning approach that trains agents by adaptively adjusting their rollout lengths.
- Score: 46.286440953594266
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The current paradigm of test-time scaling relies on generating long reasoning traces ("thinking" more) before producing a response. In agent problems that require interaction, this can be done by generating thinking traces before acting in the world. However, this process does not allow agents to acquire new information from the environment or adapt their behavior over time. In this work, we propose to scale test-time interaction, an untapped dimension of test-time scaling that increases the agent's interaction horizon to enable running rich behaviors such as exploration, backtracking, and dynamic re-planning within a single rollout. To demonstrate the promise of this scaling dimension, we study the domain of web agents. We first show that even prompting-based interaction scaling without any training can improve task success on web benchmarks non-trivially. Building on this, we introduce TTI (Test-Time Interaction), a curriculum-based online reinforcement learning (RL) approach that trains agents by adaptively adjusting their rollout lengths. Using a Gemma 3 12B model, TTI produces state-of-the-art open-source, open-data web agents on WebVoyager and WebArena benchmarks. We further show that TTI enables agents to balance exploration and exploitation adaptively. Our results establish interaction scaling as a powerful, complementary axis to scaling per-step compute, offering new avenues for training adaptive agents.
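The abstract's key mechanism, a curriculum that grows the agent's allowed interaction horizon over the course of online RL training, can be illustrated with a minimal sketch. This is not the authors' implementation: the linear schedule, the `ToyWebEnv` environment, and all names below are assumptions for illustration only.

```python
def horizon_schedule(step, total_steps, h_min=4, h_max=32):
    """Linearly grow the max rollout length from h_min to h_max.

    A stand-in for TTI's adaptive rollout-length curriculum; the
    linear form is an assumption.
    """
    frac = min(step / max(total_steps - 1, 1), 1.0)
    return int(round(h_min + frac * (h_max - h_min)))


class ToyWebEnv:
    """Toy stand-in for a web environment: the task needs `goal` steps."""
    def __init__(self, goal=10):
        self.goal = goal
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.goal
        return self.t, float(done), done


def collect_rollout(policy, env, max_steps):
    """Roll the policy out for at most max_steps interactions."""
    obs = env.reset()
    trajectory = []
    for _ in range(max_steps):
        action = policy(obs)
        obs, reward, done = env.step(action)
        trajectory.append((action, reward))
        if done:
            break
    return trajectory
```

Early in training the schedule truncates rollouts before the toy task can finish, while later it leaves enough horizon for exploration and backtracking within a single rollout, which is the axis the paper proposes to scale.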
Related papers
- Multi-dimensional Autoscaling of Processing Services: A Comparison of Agent-based Methods [5.201504495733271]
This work introduces an agent-based autoscaling framework to maximize requirements fulfillment in constrained environments. We compare four types of scaling agents: Active Inference, Deep Q Network, Analysis of Structural Knowledge, and Deep Active Inference.
arXiv Detail & Related papers (2025-06-12T07:20:26Z)
- AgentA/B: Automated and Scalable Web A/B Testing with Interactive LLM Agents [28.20409050985182]
A/B testing remains constrained by its dependence on the large-scale and live traffic of human participants. We present AgentA/B, a novel system that automatically simulates user interaction behaviors with real webpages. Our findings suggest AgentA/B can emulate human-like behavior patterns.
arXiv Detail & Related papers (2025-04-13T21:10:56Z)
- Boosting Virtual Agent Learning and Reasoning: A Step-wise, Multi-dimensional, and Generalist Reward Model with Benchmark [72.46357004059661]
We propose Similar, a step-wise Multi-dimensional Generalist Reward Model. It offers fine-grained signals for agent training and can select better actions for inference-time scaling. We introduce the first benchmark in the virtual agent domain for step-wise, multi-dimensional reward model training and evaluation.
arXiv Detail & Related papers (2025-03-24T13:30:47Z)
- Interact with me: Joint Egocentric Forecasting of Intent to Interact, Attitude and Social Actions [25.464036307823974]
SocialEgoNet is a graph-based framework that exploits task dependencies through a hierarchical learning approach. SocialEgoNet uses body skeletons (keypoints from face, hands and body) extracted from only 1 second of video input for high inference speed. For evaluation, we augment an existing egocentric human-agent interaction dataset with new class labels and bounding box annotations.
arXiv Detail & Related papers (2024-12-21T16:54:28Z)
- Interactive Autonomous Navigation with Internal State Inference and Interactivity Estimation [58.21683603243387]
We propose three auxiliary tasks with relational-temporal reasoning and integrate them into the standard Deep Learning framework.
These auxiliary tasks provide additional supervision signals to infer the behavior patterns of other interactive agents.
Our approach achieves robust and state-of-the-art performance in terms of standard evaluation metrics.
arXiv Detail & Related papers (2023-11-27T18:57:42Z)
- Fast-Slow Test-Time Adaptation for Online Vision-and-Language Navigation [67.18144414660681]
We propose a Fast-Slow Test-Time Adaptation (FSTTA) approach for online Vision-and-Language Navigation (VLN).
Our method obtains impressive performance gains on four popular benchmarks.
arXiv Detail & Related papers (2023-11-22T07:47:39Z)
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interactions between the agent and the environment.
We propose a new method to solve it, using unsupervised model-based RL, for pre-training the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
- Retrieval-Augmented Reinforcement Learning [63.32076191982944]
We train a network to map a dataset of past experiences to optimal behavior.
The retrieval process is trained to retrieve information from the dataset that may be useful in the current context.
We show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
arXiv Detail & Related papers (2022-02-17T02:44:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.