Related papers: OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis

URL: http://arxiv.org/abs/2412.19723v1
Date: Fri, 27 Dec 2024 16:21:58 GMT
Title: OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
Authors: Qiushi Sun, Kanzhi Cheng, Zichen Ding, Chuanyang Jin, Yian Wang, Fangzhi Xu, Zhenyu Wu, Chengyou Jia, Liheng Chen, Zhoumianze Liu, Ben Kao, Guohao Li, Junxian He, Yu Qiao, Zhiyong Wu
Abstract summary: We propose OS-Genesis, a novel data synthesis pipeline for Graphical User Interface (GUI) agents. Instead of relying on pre-defined tasks, OS-Genesis enables agents first to perceive environments and perform step-wise interactions. A trajectory reward model is then employed to ensure the quality of the generated trajectories.
Score: 55.390060529534644
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Graphical User Interface (GUI) agents powered by Vision-Language Models (VLMs) have demonstrated human-like computer control capability. Despite their utility in advancing digital automation, a critical bottleneck persists: collecting high-quality trajectory data for training. Common practices for collecting such data rely on human supervision or synthetic data generation through executing pre-defined tasks, which are either resource-intensive or unable to guarantee data quality. Moreover, these methods suffer from limited data diversity and significant gaps between synthetic data and real-world environments. To address these challenges, we propose OS-Genesis, a novel GUI data synthesis pipeline that reverses the conventional trajectory collection process. Instead of relying on pre-defined tasks, OS-Genesis enables agents first to perceive environments and perform step-wise interactions, then retrospectively derive high-quality tasks to enable trajectory-level exploration. A trajectory reward model is then employed to ensure the quality of the generated trajectories. We demonstrate that training GUI agents with OS-Genesis significantly improves their performance on highly challenging online benchmarks. In-depth analysis further validates OS-Genesis's efficiency and its superior data quality and diversity compared to existing synthesis methods. Our code, data, and checkpoints are available at the OS-Genesis Homepage (https://qiushisun.github.io/OS-Genesis-Home/).

Related papers

WebFactory: Automated Compression of Foundational Language Intelligence into Grounded Web Agents [20.85611634311147]
We introduce WebFactory, a novel, fully automated closed-loop reinforcement learning pipeline for GUI agents. Our agent demonstrates exceptional data efficiency and generalization. This work presents a scalable and cost-effective paradigm for transforming passive internet knowledge into active, grounded intelligence.
arXiv Detail & Related papers (2026-03-05T10:51:34Z)
ANCHOR: Branch-Point Data Generation for GUI Agents [52.22377425487]
End-to-end GUI agents for real desktop environments require large amounts of high-quality interaction data. We present a trajectory expansion framework, Anchor, that bootstraps scalable desktop supervision from a small set of verified seed demonstrations. Experiments on standard desktop benchmarks, OSWorld and WindowsAgentArena, show that models fine-tuned on our expanded corpus achieve consistent improvements.
arXiv Detail & Related papers (2026-02-06T19:55:26Z)
UtilGen: Utility-Centric Generative Data Augmentation with Dual-Level Task Adaptation [70.2215233759276]
UtilGen is a novel utility-centric data augmentation framework for computer vision tasks. We show that UtilGen consistently achieves superior datasets, with an average accuracy improvement of 3.87% over previous SOTA. Further analysis of data influence and distribution reveals that UtilGen produces more impactful and task-relevant synthetic data.
arXiv Detail & Related papers (2025-10-28T10:17:11Z)
FieldGen: From Teleoperated Pre-Manipulation Trajectories to Field-Guided Data Generation [60.28409233931666]
We introduce FieldGen, a field-guided data generation framework that enables scalable, diverse, and high-quality real-world data collection. Experiments demonstrate that policies trained with FieldGen achieve higher success rates and improved stability compared to teleoperation-based baselines.
arXiv Detail & Related papers (2025-10-23T17:47:12Z)
UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents [37.871793585090586]
We introduce UI-Genie, a self-improving framework addressing two key challenges in GUI agents: verification of trajectory outcomes is challenging, and high-quality training data are not scalable. We show that UI-Genie achieves state-of-the-art performance across multiple GUI agent benchmarks.
arXiv Detail & Related papers (2025-05-27T17:58:06Z)
STEVE: A Step Verification Pipeline for Computer-use Agent Training [84.24814828303163]
STEVE is a step verification pipeline for computer-use agent training. GPT-4o is used to verify the correctness of each step in the trajectories based on the screens before and after the action execution. Our agent outperforms supervised finetuning by leveraging both positive and negative actions within a trajectory.
arXiv Detail & Related papers (2025-03-16T14:53:43Z)
Large Language Models as Realistic Microservice Trace Generators [54.85489678342595]
Workload traces are essential to understand complex computer systems' behavior and manage processing and memory resources. This paper proposes a first-of-a-kind approach that relies on training a large language model to generate synthetic workload traces. Our model adapts to downstream trace-related tasks, such as predicting key trace features and infilling missing data.
arXiv Detail & Related papers (2024-12-16T12:48:04Z)
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials [53.376263056033046]
We propose a scalable data synthesis pipeline that generates high-quality GUI agent trajectories by leveraging web tutorials. Our method automatically gathers tutorial-like texts from the internet, transforms them into task goals with step-by-step instructions, and employs a visual-language model agent. A VLM-based evaluator ensures the correctness of the generated trajectories.
arXiv Detail & Related papers (2024-12-12T18:59:27Z)
BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation [57.40024206484446]
We introduce the BEHAVIOR Vision Suite (BVS), a set of tools and assets to generate fully customized synthetic data for systematic evaluation of computer vision models. BVS supports a large number of adjustable parameters at the scene level. We showcase three example application scenarios.
arXiv Detail & Related papers (2024-05-15T17:57:56Z)
Skip the Benchmark: Generating System-Level High-Level Synthesis Data using Generative Machine Learning [8.416553728391309]
High-Level Synthesis (HLS) Design Space Exploration (DSE) is a widely accepted approach for exploring optimal hardware solutions during the HLS process. Several HLS benchmarks and datasets are available for the research community to evaluate their methodologies. This paper proposes a novel approach, called Vaegan, that employs generative machine learning to generate synthetic data that is robust enough to support complex system-level HLS DSE experiments.
arXiv Detail & Related papers (2024-04-23T05:32:22Z)
A survey of synthetic data augmentation methods in computer vision [0.0]
This paper presents an extensive review of synthetic data augmentation techniques. We focus on the important data generation and augmentation techniques, general scope of application and specific use-cases. We provide a summary of common synthetic datasets for training computer vision models.
arXiv Detail & Related papers (2024-03-15T07:34:08Z)
GenQ: Quantization in Low Data Regimes with Generative Synthetic Data [28.773641633757283]
We introduce GenQ, a novel approach employing an advanced Generative AI model to generate high-resolution synthetic data. In case of limited data availability, the actual data is used to guide the synthetic data generation process. Through rigorous experimentation, GenQ establishes new benchmarks in data-free and data-scarce quantization.
arXiv Detail & Related papers (2023-12-07T23:31:42Z)
STAR: Boosting Low-Resource Information Extraction by Structure-to-Text Data Generation with Large Language Models [56.27786433792638]
STAR is a data generation method that leverages Large Language Models (LLMs) to synthesize data instances. We design fine-grained step-by-step instructions to obtain the initial data instances. Our experiments show that the data generated by STAR significantly improve the performance of low-resource event extraction and relation extraction tasks.
arXiv Detail & Related papers (2023-05-24T12:15:19Z)
Scalable Modular Synthetic Data Generation for Advancing Aerial Autonomy [2.9005223064604078]
We introduce a scalable Aerial Synthetic Data Augmentation (ASDA) framework tailored to aerial autonomy applications. ASDA extends a central data collection engine with two scriptable pipelines that automatically perform scene and data augmentations. We demonstrate the effectiveness of our method in automatically generating diverse datasets.
arXiv Detail & Related papers (2022-11-10T04:37:41Z)
TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets. We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)

This list is automatically generated from the titles and abstracts of the papers on this site.