OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
- URL: http://arxiv.org/abs/2412.19723v1
- Date: Fri, 27 Dec 2024 16:21:58 GMT
- Title: OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
- Authors: Qiushi Sun, Kanzhi Cheng, Zichen Ding, Chuanyang Jin, Yian Wang, Fangzhi Xu, Zhenyu Wu, Chengyou Jia, Liheng Chen, Zhoumianze Liu, Ben Kao, Guohao Li, Junxian He, Yu Qiao, Zhiyong Wu,
- Abstract summary: We propose OS-Genesis, a novel data synthesis pipeline for Graphical User Interface (GUI) agents.
Instead of relying on pre-defined tasks, OS-Genesis enables agents first to perceive environments and perform step-wise interactions.
A trajectory reward model is then employed to ensure the quality of the generated trajectories.
- Score: 55.390060529534644
- License:
- Abstract: Graphical User Interface (GUI) agents powered by Vision-Language Models (VLMs) have demonstrated human-like computer control capability. Despite their utility in advancing digital automation, a critical bottleneck persists: collecting high-quality trajectory data for training. Common practices for collecting such data rely on human supervision or synthetic data generation through executing pre-defined tasks, which are either resource-intensive or unable to guarantee data quality. Moreover, these methods suffer from limited data diversity and significant gaps between synthetic data and real-world environments. To address these challenges, we propose OS-Genesis, a novel GUI data synthesis pipeline that reverses the conventional trajectory collection process. Instead of relying on pre-defined tasks, OS-Genesis enables agents first to perceive environments and perform step-wise interactions, then retrospectively derive high-quality tasks to enable trajectory-level exploration. A trajectory reward model is then employed to ensure the quality of the generated trajectories. We demonstrate that training GUI agents with OS-Genesis significantly improves their performance on highly challenging online benchmarks. In-depth analysis further validates OS-Genesis's efficiency and its superior data quality and diversity compared to existing synthesis methods. Our codes, data, and checkpoints are available at \href{https://qiushisun.github.io/OS-Genesis-Home/}{OS-Genesis Homepage}.
Related papers
- SnipGen: A Mining Repository Framework for Evaluating LLMs for Code [51.07471575337676]
Language Models (LLMs) are trained on extensive datasets that include code repositories.
evaluating their effectiveness poses significant challenges due to the potential overlap between the datasets used for training and those employed for evaluation.
We introduce SnipGen, a comprehensive repository mining framework designed to leverage prompt engineering across various downstream tasks for code generation.
arXiv Detail & Related papers (2025-02-10T21:28:15Z) - AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials [53.376263056033046]
We propose a scalable data synthesis pipeline that generates high-quality GUI agent trajectories by leveraging web tutorials.
Our method automatically gathers tutorial-like texts from the internet, transforms them into task goals with step-by-step instructions, and employs a visual-language model agent.
A VLM-based evaluator ensures the correctness of the generated trajectories.
arXiv Detail & Related papers (2024-12-12T18:59:27Z) - BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation [57.40024206484446]
We introduce the BEHAVIOR Vision Suite (BVS), a set of tools and assets to generate fully customized synthetic data for systematic evaluation of computer vision models.
BVS supports a large number of adjustable parameters at the scene level.
We showcase three example application scenarios.
arXiv Detail & Related papers (2024-05-15T17:57:56Z) - Skip the Benchmark: Generating System-Level High-Level Synthesis Data using Generative Machine Learning [8.416553728391309]
High-Level Synthesis (HLS) Design Space Exploration (DSE) is a widely accepted approach for exploring optimal hardware solutions during the HLS process.
Several HLS benchmarks and datasets are available for the research community to evaluate their methodologies.
This paper proposes a novel approach, called Vaegan, that employs generative machine learning to generate synthetic data that is robust enough to support complex system-level HLS DSE experiments.
arXiv Detail & Related papers (2024-04-23T05:32:22Z) - A survey of synthetic data augmentation methods in computer vision [0.0]
This paper presents an extensive review of synthetic data augmentation techniques.
We focus on the important data generation and augmentation techniques, general scope of application and specific use-cases.
We provide a summary of common synthetic datasets for training computer vision models.
arXiv Detail & Related papers (2024-03-15T07:34:08Z) - GenQ: Quantization in Low Data Regimes with Generative Synthetic Data [28.773641633757283]
We introduce GenQ, a novel approach employing an advanced Generative AI model to generate high-resolution synthetic data.
In case of limited data availability, the actual data is used to guide the synthetic data generation process.
Through rigorous experimentation, GenQ establishes new benchmarks in data-free and data-scarce quantization.
arXiv Detail & Related papers (2023-12-07T23:31:42Z) - STAR: Boosting Low-Resource Information Extraction by Structure-to-Text
Data Generation with Large Language Models [56.27786433792638]
STAR is a data generation method that leverages Large Language Models (LLMs) to synthesize data instances.
We design fine-grained step-by-step instructions to obtain the initial data instances.
Our experiments show that the data generated by STAR significantly improve the performance of low-resource event extraction and relation extraction tasks.
arXiv Detail & Related papers (2023-05-24T12:15:19Z) - Scalable Modular Synthetic Data Generation for Advancing Aerial Autonomy [2.9005223064604078]
We introduce a scalable Aerial Synthetic Data Augmentation (ASDA) framework tailored to aerial autonomy applications.
ASDA extends a central data collection engine with two scriptable pipelines that automatically perform scene and data augmentations.
We demonstrate the effectiveness of our method in automatically generating diverse datasets.
arXiv Detail & Related papers (2022-11-10T04:37:41Z) - TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual
Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.