Related papers: ATLAS: Actor-Critic Task-Completion with Look-ahead Action Simulation

ATLAS: Actor-Critic Task-Completion with Look-ahead Action Simulation

URL: http://arxiv.org/abs/2510.22732v1
Date: Sun, 26 Oct 2025 16:03:39 GMT
Title: ATLAS: Actor-Critic Task-Completion with Look-ahead Action Simulation
Authors: Jiali Cheng, Anjishnu Kumar, Roshan Lal, Rishi Rajasekaran, Hani Ramezani, Omar Zia Khan, Oleg Rokhlenko, Sunny Chiu-Webster, Gang Hua, Hadi Amiri,
Abstract summary: ATLAS is a memory-augmented agent that makes plans grounded in a model of the environment by simulating the consequences of those actions in cognitive space.<n>On the WebArena-Lite Benchmark, we achieve a 63% success rate compared to 53.9% success rate for the previously published state-of-the-art.
Score: 28.54052846801967
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We observe that current state-of-the-art web-agents are unable to effectively adapt to new environments without neural network fine-tuning, without which they produce inefficient execution plans due to a lack of awareness of the structure and dynamics of the new environment. To address this limitation, we introduce ATLAS (Actor-Critic Task-completion with Look-ahead Action Simulation), a memory-augmented agent that is able to make plans grounded in a model of the environment by simulating the consequences of those actions in cognitive space. Our agent starts by building a "cognitive map" by performing a lightweight curiosity driven exploration of the environment. The planner proposes candidate actions; the simulator predicts their consequences in cognitive space; a critic analyzes the options to select the best roll-out and update the original plan; and a browser executor performs the chosen action. On the WebArena-Lite Benchmark, we achieve a 63% success rate compared to 53.9% success rate for the previously published state-of-the-art. Unlike previous systems, our modular architecture requires no website-specific LLM fine-tuning. Ablations show sizable drops without the world-model, hierarchical planner, and look-ahead-based replanner confirming their complementary roles within the design of our system

Related papers

Self-Correcting VLA: Online Action Refinement via Sparse World Imagination [55.982504915794514]
We propose Self-Correcting VLA (SC-VLA), which achieve self-improvement by intrinsically guiding action refinement through sparse imagination.<n>SC-VLA achieve state-of-the-art performance, yielding the highest task throughput with 16% fewer steps and a 9% higher success rate than the best-performing baselines.
arXiv Detail & Related papers (2026-02-25T06:58:06Z)
OSCAR: Optimization-Steered Agentic Planning for Composed Image Retrieval [33.823055061609125]
We propose OSCAR, an optimization-steered agentic planning framework for composed image retrieval.<n>We are the first to reformulate agentic CIR from a search process into a principled trajectory optimization problem.<n>In the offline phase, we model CIR via atomic retrieval selection and composition as a two-stage mixed-integer programming problem.<n>These trajectories are then stored in a golden library to serve as in-context demonstrations for online steering of VLM planner.
arXiv Detail & Related papers (2026-02-09T12:44:56Z)
Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem [90.17610617854247]
We introduce the Agentic Learning Ecosystem (ALE), a foundational infrastructure that optimize the production pipeline for agentic model.<n>ALE consists of three components: ROLL, a post-training framework for weight optimization; ROCK, a sandbox environment manager for trajectory generation; and iFlow CLI, an agent framework for efficient context engineering.<n>We release ROME, an open-source agent grounded by ALE and trained on over one million trajectories.
arXiv Detail & Related papers (2025-12-31T14:03:39Z)
nuPlan-R: A Closed-Loop Planning Benchmark for Autonomous Driving via Reactive Multi-Agent Simulation [2.585002881750625]
We present nuPlan-R, a new reactive closed-loop planning benchmark.<n>Our benchmark replaces the rule-based IDM agents with noise-decoupled diffusion-based reactive agents.<n>We extend the benchmark with two additional metrics to enable a more comprehensive assessment of planning performance.
arXiv Detail & Related papers (2025-11-13T15:23:30Z)
EmbodiedBrain: Expanding Performance Boundaries of Task Planning for Embodied Intelligence [17.644658293987955]
Embodied AI agents are capable of robust spatial perception, effective task planning, and adaptive execution in physical environments.<n>Current large language models (LLMs) and multimodal LLMs (MLLMs) for embodied tasks suffer from key limitations.<n>We propose EmbodiedBrain, a novel vision-language foundation model available in both 7B and 32B parameter sizes.
arXiv Detail & Related papers (2025-10-23T14:05:55Z)
Sample-Efficient Online Learning in LM Agents via Hindsight Trajectory Rewriting [92.57796055887995]
We introduce ECHO, a prompting framework that adapts hindsight experience replay from reinforcement learning for language model agents.<n> ECHO generates optimized trajectories for alternative goals that could have been achieved during failed attempts.<n>We evaluate ECHO on stateful versions of XMiniGrid, a text-based navigation and planning benchmark, and PeopleJoinQA, a collaborative information-gathering enterprise simulation.
arXiv Detail & Related papers (2025-10-11T18:11:09Z)
ATLAS: Constraints-Aware Multi-Agent Collaboration for Real-World Travel Planning [53.065247112514534]
ATLAS is a general multi-agent framework designed to handle complex nature of constraints awareness in real-world travel planning tasks.<n>We demonstrate state-of-the-art performance on the TravelPlanner benchmark, improving the final pass rate from 23.3% to 44.4% over its best alternative.
arXiv Detail & Related papers (2025-09-29T23:23:52Z)
LASER: LLM Agent with State-Space Exploration for Web Navigation [57.802977310392755]
Large language models (LLMs) have been successfully adapted for interactive decision-making tasks like web navigation. Previous methods implicitly assume a forward-only execution mode for the model, where they only provide oracle trajectories as in-context examples. We propose to model the interactive task as state space exploration, where the LLM agent transitions among a pre-defined set of states by performing actions to complete the task.
arXiv Detail & Related papers (2023-09-15T05:44:08Z)
Active Sensing with Predictive Coding and Uncertainty Minimization [0.0]
We present an end-to-end procedure for embodied exploration inspired by two biological computations. We first demonstrate our approach in a maze navigation task and show that it can discover the underlying transition distributions and spatial features of the environment. We show that our model builds unsupervised representations through exploration that allow it to efficiently categorize visual scenes.
arXiv Detail & Related papers (2023-07-02T21:14:49Z)
Novelty Accommodating Multi-Agent Planning in High Fidelity Simulated Open World [7.821603097781892]
We address the challenge that arises when unexpected phenomena, termed textitnovelties, emerge within the environment.<n>The introduction of novelties into the environment can lead to inaccuracies within the planner's internal model, rendering previously generated plans obsolete.<n>We propose a general purpose AI agent framework designed to detect, characterize, and adapt to support concurrent actions and external scheduling.
arXiv Detail & Related papers (2023-06-22T03:44:04Z)
AdaPlanner: Adaptive Planning from Feedback with Language Models [56.367020818139665]
Large language models (LLMs) have recently demonstrated the potential in acting as autonomous agents for sequential decision-making tasks. We propose a closed-loop approach, AdaPlanner, which allows the LLM agent to refine its self-generated plan adaptively in response to environmental feedback. To mitigate hallucination, we develop a code-style LLM prompt structure that facilitates plan generation across a variety of tasks, environments, and agent capabilities.
arXiv Detail & Related papers (2023-05-26T05:52:27Z)
Generating Executable Action Plans with Environmentally-Aware Language Models [4.162663632560141]
Large Language Models (LLMs) trained using massive text datasets have recently shown promise in generating action plans for robotic agents. We propose an approach to generate environmentally-aware action plans that agents are better able to execute.
arXiv Detail & Related papers (2022-10-10T18:56:57Z)
STRIPS Action Discovery [67.73368413278631]
Recent approaches have shown the success of classical planning at synthesizing action models even when all intermediate states are missing. We propose a new algorithm to unsupervisedly synthesize STRIPS action models with a classical planner when action signatures are unknown.
arXiv Detail & Related papers (2020-01-30T17:08:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.