Inducing Programmatic Skills for Agentic Tasks
- URL: http://arxiv.org/abs/2504.06821v1
- Date: Wed, 09 Apr 2025 12:25:37 GMT
- Title: Inducing Programmatic Skills for Agentic Tasks
- Authors: Zora Zhiruo Wang, Apurva Gandhi, Graham Neubig, Daniel Fried
- Abstract summary: We propose agent skill induction (ASI) to allow agents to adapt themselves by inducing, verifying, and utilizing program-based skills on the fly. We show that ASI outperforms the static baseline agent and its text-skill counterpart by 23.5% and 11.3% in success rate.
- Score: 53.964176411616
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: To succeed in common digital tasks such as web navigation, agents must carry out a variety of specialized tasks such as searching for products or planning a travel route. To tackle these tasks, agents can bootstrap themselves by learning task-specific skills online through interaction with the web environment. In this work, we demonstrate that programs are an effective representation for skills. We propose agent skill induction (ASI), which allows agents to adapt themselves by inducing, verifying, and utilizing program-based skills on the fly. We start with an evaluation on the WebArena agent benchmark and show that ASI outperforms the static baseline agent and its text-skill counterpart by 23.5% and 11.3% in success rate, mainly thanks to the programmatic verification guarantee during the induction phase. ASI also improves efficiency by reducing 10.7-15.3% of the steps over baselines, by composing primitive actions (e.g., click) into higher-level skills (e.g., search product). We then highlight the efficacy of ASI in remaining efficient and accurate under scaled-up web activities. Finally, we examine the generalizability of induced skills when transferring between websites, and find that ASI can effectively reuse common skills, while also updating incompatible skills to versatile website changes.
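The abstract's idea of composing primitive actions (e.g., click) into a higher-level skill (e.g., search product), and keeping the skill only after it passes a programmatic check, can be illustrated with a toy sketch. This is not the paper's implementation: `ToyPage`, `verify_skill`, `skill_library`, and the element names are hypothetical stand-ins invented for illustration, and the "environment" is a small in-memory object rather than a real website.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToyPage:
    """Hypothetical stand-in for a web environment; it only records actions."""
    log: List[str] = field(default_factory=list)
    fields: Dict[str, str] = field(default_factory=dict)
    results: List[str] = field(default_factory=list)

    def click(self, element: str) -> None:
        # Primitive action: clicking the search button "runs" the query.
        self.log.append(f"click({element})")
        if element == "search_button":
            query = self.fields.get("search_box", "")
            self.results = [f"result for {query}"] if query else []

    def fill(self, element: str, text: str) -> None:
        # Primitive action: type text into a named field.
        self.log.append(f"fill({element})")
        self.fields[element] = text


def search_product(page: ToyPage, name: str) -> None:
    """Candidate higher-level skill built from three primitive actions."""
    page.click("search_box")
    page.fill("search_box", name)
    page.click("search_button")


def verify_skill(skill: Callable, check: Callable[[ToyPage], bool], *args) -> bool:
    """Execute the candidate on a fresh page; accept it only if the check passes."""
    page = ToyPage()
    try:
        skill(page, *args)
    except Exception:
        return False
    return check(page)


# Induce: add the skill to the library only after verification succeeds.
skill_library: Dict[str, Callable] = {}
if verify_skill(search_product, lambda p: len(p.results) > 0, "red shoes"):
    skill_library["search_product"] = search_product

# Utilize: one skill call replaces three primitive steps.
page = ToyPage()
skill_library["search_product"](page, "blue mug")
print(page.log)
print(page.results)
```

In this sketch a single call to the stored skill issues three primitive actions, which is the kind of step reduction the abstract attributes to higher-level skills; a candidate that fails the check is never added to the library.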
Related papers
- Goal-Oriented Skill Abstraction for Offline Multi-Task Reinforcement Learning [25.18006424626525]
GO-Skill is a novel approach designed to extract and utilize reusable skills to enhance knowledge transfer and task performance.
Our approach uncovers reusable skills through a goal-oriented skill extraction process and leverages vector quantization to construct a discrete skill library.
We integrate these skills using hierarchical policy learning, enabling the construction of a high-level policy that dynamically orchestrates discrete skills to accomplish specific tasks.
arXiv Detail & Related papers (2025-07-09T07:54:49Z) - UI-Evol: Automatic Knowledge Evolving for Computer Use Agents [19.978272700123004]
We propose UI-Evol, a plug-and-play module for autonomous GUI knowledge evolution.
UI-Evol consists of two stages: a Retrace Stage that extracts faithful objective action sequences from actual agent-environment interactions, and a Critique Stage that refines existing knowledge by comparing these sequences against external references.
Our results demonstrate that UI-Evol not only significantly boosts task performance but also addresses a previously overlooked issue of high behavioral standard deviation in computer use agents.
arXiv Detail & Related papers (2025-05-28T04:32:05Z) - Real-Time Verification of Embodied Reasoning for Generative Skill Acquisition [47.068088124436535]
Generative skill acquisition enables embodied agents to actively learn a scalable and evolving repertoire of control skills.
We propose VERGSA, a framework that systematically integrates real-time verification principles into embodied skill learning.
To the best of our knowledge, this approach constitutes the first comprehensive training dataset for verification-driven generative skill acquisition.
arXiv Detail & Related papers (2025-05-16T12:19:13Z) - SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills [48.05057798832005]
We introduce SkillWeaver, a skill-centric framework enabling web agents to self-improve by autonomously synthesizing reusable skills as APIs.
Given a new website, the agent autonomously discovers skills, executes them for practice, and distills practice experiences into robust APIs.
Experiments on WebArena and real-world websites demonstrate the efficacy of SkillWeaver, achieving relative success rate improvements of 31.8% and 39.8%, respectively.
arXiv Detail & Related papers (2025-04-09T17:51:50Z) - AppAgentX: Evolving GUI Agents as Proficient Smartphone Users [34.70342284525283]
We propose a novel evolutionary framework for GUI agents that enhances operational efficiency while retaining intelligence and flexibility.
Our approach incorporates a memory mechanism that records the agent's task execution history.
Experimental results on multiple benchmark tasks demonstrate that our approach significantly outperforms existing methods in both efficiency and accuracy.
arXiv Detail & Related papers (2025-03-04T04:34:09Z) - Agent Workflow Memory [71.81385627556398]
We introduce Agent Workflow Memory (AWM), a method for inducing commonly reused routines.
AWM substantially improves the baseline results by 24.6% and 51.1% relative success rate.
Online AWM robustly generalizes in cross-task, website, and domain evaluations.
arXiv Detail & Related papers (2024-09-11T17:21:00Z) - KOI: Accelerating Online Imitation Learning via Hybrid Key-state Guidance [51.09834120088799]
We introduce the hybrid Key-state guided Online Imitation (KOI) learning method.
We use visual-language models to extract semantic key states from expert trajectories, indicating the objectives of "what to do".
Within the intervals between semantic key states, optical flow is employed to capture motion key states to understand the mechanisms of "how to do".
arXiv Detail & Related papers (2024-08-06T02:53:55Z) - Affordance-Guided Reinforcement Learning via Visual Prompting [51.361977466993345]
Keypoint-based Affordance Guidance for Improvements (KAGI) is a method leveraging rewards shaped by vision-language models (VLMs) for autonomous RL.
On real-world manipulation tasks specified by natural language descriptions, KAGI improves the sample efficiency of autonomous RL and enables successful task completion in 30K online fine-tuning steps.
arXiv Detail & Related papers (2024-07-14T21:41:29Z) - Variational Curriculum Reinforcement Learning for Unsupervised Discovery
of Skills [25.326624139426514]
We propose a novel approach to unsupervised skill discovery based on information theory, called Value Uncertainty Variational Curriculum (VUVC).
We prove that, under regularity conditions, VUVC accelerates the increase of entropy in the visited states compared to the uniform curriculum.
We also demonstrate that the skills discovered by our method successfully complete a real-world robot navigation task in a zero-shot setup.
arXiv Detail & Related papers (2023-10-30T10:34:25Z) - Human-Timescale Adaptation in an Open-Ended Task Space [56.55530165036327]
We show that training an RL agent at scale leads to a general in-context learning algorithm that can adapt to open-ended novel embodied 3D problems as quickly as humans.
Our results lay the foundation for increasingly general and adaptive RL agents that perform well across ever-larger open-ended domains.
arXiv Detail & Related papers (2023-01-18T15:39:21Z) - Design of Negative Sampling Strategies for Distantly Supervised Skill Extraction [19.43668931500507]
We propose an end-to-end system for skill extraction, based on distant supervision through literal matching.
We observe that using the ESCO taxonomy to select negative examples from related skills yields the biggest improvements.
We release the benchmark dataset for research purposes to stimulate further research on the task.
arXiv Detail & Related papers (2022-09-13T13:37:06Z)