Related papers: Auto-Intent: Automated Intent Discovery and Self-Exploration for Large Language Model Web Agents

Auto-Intent: Automated Intent Discovery and Self-Exploration for Large Language Model Web Agents

URL: http://arxiv.org/abs/2410.22552v1
Date: Tue, 29 Oct 2024 21:37:04 GMT
Title: Auto-Intent: Automated Intent Discovery and Self-Exploration for Large Language Model Web Agents
Authors: Jaekyeom Kim, Dong-Ki Kim, Lajanugen Logeswaran, Sungryull Sohn, Honglak Lee,
Abstract summary: We introduce Auto-Intent, a method to adapt a pre-trained large language model (LLM) as an agent for a target domain without direct fine-tuning. Our approach first discovers the underlying intents from target domain demonstrations unsupervisedly. We train our intent predictor to predict the next intent given the agent's past observations and actions.
Score: 68.22496852535937
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we introduce Auto-Intent, a method to adapt a pre-trained large language model (LLM) as an agent for a target domain without direct fine-tuning, where we empirically focus on web navigation tasks. Our approach first discovers the underlying intents from target domain demonstrations unsupervisedly, in a highly compact form (up to three words). With the extracted intents, we train our intent predictor to predict the next intent given the agent's past observations and actions. In particular, we propose a self-exploration approach where top-k probable intent predictions are provided as a hint to the pre-trained LLM agent, which leads to enhanced decision-making capabilities. Auto-Intent substantially improves the performance of GPT-{3.5, 4} and Llama-3.1-{70B, 405B} agents on the large-scale real-website navigation benchmarks from Mind2Web and online navigation tasks from WebArena with its cross-benchmark generalization from Mind2Web.

Related papers

Enhancing Web Agents with Explicit Rollback Mechanisms [55.276852838877346]
We enhance web agents with an explicit rollback mechanism, enabling the agent to revert back to a previous state in its navigation trajectory. This mechanism gives the model the flexibility to directly control the search process, leading to an effective and efficient web navigation method.
arXiv Detail & Related papers (2025-04-16T05:41:20Z)
WEPO: Web Element Preference Optimization for LLM-based Web Navigation [3.9400326648635566]
This paper introduces a novel approach to web navigation tasks, called Web Element Preference Optimization (WEPO) We utilize unsupervised preference learning by sampling distance-based non-salient web elements as negative samples, optimizing maximum likelihood objective within Direct Preference Optimization (DPO) The results show that our method achieved the state-of-the-art, with an improvement of 13.8% over WebAgent and 5.3% over the visual language model CogAgent baseline.
arXiv Detail & Related papers (2024-12-14T08:25:28Z)
SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World [50.937342998351426]
Chain-of-User-Thought (COUT) is a novel embodied reasoning paradigm. We introduce SmartAgent, an agent framework perceiving cyber environments and reasoning personalized requirements. Our work is the first to formulate the COUT process, serving as a preliminary attempt towards embodied personalized agent learning.
arXiv Detail & Related papers (2024-12-10T12:40:35Z)
AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents [52.13695464678006]
This study enhances an LLM-based web agent by simply refining its observation and action space. AgentOccam surpasses the previous state-of-the-art and concurrent work by 9.8 (+29.4%) and 5.9 (+15.8%) absolute points respectively.
arXiv Detail & Related papers (2024-10-17T17:50:38Z)
Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement [117.94654815220404]
G"odel Agent is a self-evolving framework inspired by the G"odel machine. G"odel Agent can achieve continuous self-improvement, surpassing manually crafted agents in performance, efficiency, and generalizability.
arXiv Detail & Related papers (2024-10-06T10:49:40Z)
Large Language Models Can Self-Improve At Web Agent Tasks [37.17001438055515]
Large language models (LLMs) have recently demonstrated some capability to navigate novel environments as agents in a zero-shot or few-shot fashion. We explore the extent to which LLMs can self-improve their performance as agents in long-horizon tasks in a complex environment using the WebArena benchmark. We achieve a 31% improvement in task completion rate over the base model on the WebArena benchmark through a self-improvement procedure.
arXiv Detail & Related papers (2024-05-30T17:52:36Z)
Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents [110.25679611755962]
Current language model-driven agents often lack mechanisms for effective user participation, which is crucial given the vagueness commonly found in user instructions. We introduce Intention-in-Interaction (IN3), a novel benchmark designed to inspect users' implicit intentions through explicit queries. We empirically train Mistral-Interact, a powerful model that proactively assesses task vagueness, inquires user intentions, and refines them into actionable goals.
arXiv Detail & Related papers (2024-02-14T14:36:30Z)
NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration [57.15811390835294]
This paper describes how we can train a single unified diffusion policy to handle both goal-directed navigation and goal-agnostic exploration. We show that this unified policy results in better overall performance when navigating to visually indicated goals in novel environments. Our experiments, conducted on a real-world mobile robot platform, show effective navigation in unseen environments in comparison with five alternative methods.
arXiv Detail & Related papers (2023-10-11T21:07:14Z)
Predicting Dense and Context-aware Cost Maps for Semantic Robot Navigation [35.45993685414002]
We investigate the task of object goal navigation in unknown environments where the target is specified by a semantic label. We propose a deep neural network architecture and loss function to predict dense cost maps that implicitly contain semantic context. We also present a novel way of fusing mid-level visual representations in our architecture to provide additional semantic cues for cost map prediction.
arXiv Detail & Related papers (2022-10-17T11:43:19Z)
Improving Target-driven Visual Navigation with Attention on 3D Spatial Relationships [52.72020203771489]
We investigate target-driven visual navigation using deep reinforcement learning (DRL) in 3D indoor scenes. Our proposed method combines visual features and 3D spatial representations to learn navigation policy. Our experiments, performed in the AI2-THOR, show that our model outperforms the baselines in both SR and SPL metrics.
arXiv Detail & Related papers (2020-04-29T08:46:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.