Agent Alpha: Tree Search Unifying Generation, Exploration and Evaluation for Computer-Use Agents
- URL: http://arxiv.org/abs/2602.02995v1
- Date: Tue, 03 Feb 2026 02:07:12 GMT
- Title: Agent Alpha: Tree Search Unifying Generation, Exploration and Evaluation for Computer-Use Agents
- Authors: Sizhe Tang, Rongqian Chen, Tian Lan,
- Abstract summary: Agent Alpha is a unified framework that synergizes generation, exploration, and evaluation through step-level Monte Carlo Tree Search (MCTS)<n>On the OSWorld benchmark, Agent Alpha achieves a state-of-the-art success rate of $sim 77%$, significantly outperforming trajectory-level baselines under equivalent compute.
- Score: 6.357946414392768
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While scaling test-time compute through trajectory-level sampling has significantly improved Graphical User Interface (GUI) agents, the lack of regressive ability prevents the reuse of partial successes and the recovery from early missteps. In this paper, we introduce Agent Alpha, a unified framework that synergizes generation, exploration, and evaluation through step-level Monte Carlo Tree Search (MCTS). It enables active modeling or exploiting structures of the planning space. By integrating alpha-UCT guided search into the interaction loop, Agent Alpha enables deliberate planning, facilitating early pruning of suboptimal branches and efficient prefix reuse. We also employ comparison-driven evaluation to mitigate absolute scoring biases and diversity-constrained expansion to maintain a compact, informative search space. Regret bound of alpha-UCT is analyzed. On the OSWorld benchmark, Agent Alpha achieves a state-of-the-art success rate of $\sim 77\%$, significantly outperforming trajectory-level baselines under equivalent compute.
Related papers
- TreePS-RAG: Tree-based Process Supervision for Reinforcement Learning in Agentic RAG [71.06073770344732]
Agentic retrieval-augmented generation (RAG) formulates question answering as a multi-step interaction between reasoning and information retrieval.<n>We present TreePS-RAG, an online, tree-based RL framework for agentic RAG that enables step-wise credit assignment while retaining outcome-only rewards.
arXiv Detail & Related papers (2026-01-11T14:07:30Z) - Alita-G: Self-Evolving Generative Agent for Agent Generation [54.49365835457433]
We present ALITA-G, a framework that transforms a general-purpose agent into a domain expert.<n>In this framework, a generalist agent executes a curated suite of target-domain tasks.<n>It attains strong gains while reducing computation costs.
arXiv Detail & Related papers (2025-10-27T17:59:14Z) - Tree Search for LLM Agent Reinforcement Learning [23.7084695563981]
Tree-based Group Relative Policy Optimization (Tree-GRPO) is a grouped agent RL method based on tree search.<n>By sharing common prefixes, the tree search sampling increases the number of rollouts achievable.<n>We demonstrate that the objective of intra-tree level group relative policy optimization is equivalent to that of step-level direct preference learning.
arXiv Detail & Related papers (2025-09-25T14:37:09Z) - Youtu-GraphRAG: Vertically Unified Agents for Graph Retrieval-Augmented Complex Reasoning [32.78218766121055]
Graph retrieval-augmented generation (GraphRAG) has effectively enhanced large language models in complex reasoning.<n>We propose a vertically unified agentic paradigm, Youtu-GraphRAG, to jointly connect the entire framework as an intricate integration.
arXiv Detail & Related papers (2025-08-27T13:13:20Z) - TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling [65.46347858249295]
TreePO is a self-guided rollout algorithm that views sequence generation as a tree-structured searching process.<n>TreePO essentially reduces the per-update compute burden while preserving or enhancing exploration diversity.
arXiv Detail & Related papers (2025-08-24T16:52:37Z) - DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal [55.13854171147104]
Large Language Models (LLMs) have revolutionized various domains, including natural language processing, data analysis, and software development.<n>We present Dynamic Action Re-Sampling (DARS), a novel inference time compute scaling approach for coding agents.<n>We evaluate our approach on SWE-Bench Lite benchmark, demonstrating that this scaling strategy achieves a pass@k score of 55% with Claude 3.5 Sonnet V2.
arXiv Detail & Related papers (2025-03-18T14:02:59Z) - Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement [112.04307762405669]
G"odel Agent is a self-evolving framework inspired by the G"odel machine.<n>G"odel Agent can achieve continuous self-improvement, surpassing manually crafted agents in performance, efficiency, and generalizability.
arXiv Detail & Related papers (2024-10-06T10:49:40Z) - AlphaZeroES: Direct score maximization outperforms planning loss minimization [61.17702187957206]
Planning at execution time has been shown to dramatically improve performance for agents in both single-agent and multi-agent settings.
A family of approaches to planning at execution time are AlphaZero and its variants, which use Monte Carlo Tree Search together with a neural network that guides the search by predicting state values and action probabilities.
We show that, across multiple environments, directly maximizing the episode score outperforms minimizing the planning loss.
arXiv Detail & Related papers (2024-06-12T23:00:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.