AndroidGen: Building an Android Language Agent under Data Scarcity
- URL: http://arxiv.org/abs/2504.19298v1
- Date: Sun, 27 Apr 2025 16:30:10 GMT
- Title: AndroidGen: Building an Android Language Agent under Data Scarcity
- Authors: Hanyu Lai, Junjie Gao, Xiao Liu, Yifan Xu, Shudan Zhang, Yuxiao Dong, Jie Tang
- Abstract summary: We develop a framework called AndroidGen to enhance the capabilities of LLM-based agents under data scarcity. We leverage AndroidGen to collect trajectories given human tasks and train open-source LLMs on these trajectories to develop an open-source mobile agent without manually labeled trajectories. We extensively evaluate AndroidGen with AndroidWorld, AitW, and various popular applications, demonstrating its improvements and revealing potential areas for future improvement.
- Score: 32.277219971739726
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large language models have opened up a world of possibilities for various NLP tasks, sparking optimism for the future. Despite their potential, LLMs have yet to be widely used as agents on real mobile devices. The main challenge is the need for high-quality data sources. Time constraints and labor intensity often hinder human annotation. On the other hand, existing LLMs exhibit inadequate completion rates and need a robust data filtration strategy. Given these challenges, we develop a framework called AndroidGen to enhance the capabilities of LLM-based agents under data scarcity. In addition, we leverage AndroidGen to collect trajectories given human tasks and train open-source LLMs on these trajectories to develop an open-source mobile agent without manually labeled trajectories. We extensively evaluate AndroidGen with AndroidWorld, AitW, and various popular applications, demonstrating its improvements and revealing potential areas for future improvement. Code, model, and data are available at https://github.com/THUDM/AndroidGen.
Related papers
- LLMs in Mobile Apps: Practices, Challenges, and Opportunities [4.104646810514711]
The integration of AI techniques has become increasingly popular in software development. With the rise of large language models (LLMs) and generative AI, developers now have access to a wealth of high-quality open-source models and APIs from closed-source providers.
arXiv Detail & Related papers (2025-02-21T19:53:43Z)
- AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents [32.571194718225996]
We propose AndroidLab as a systematic Android agent framework.
It includes an operation environment with different modalities, action space, and a reproducible benchmark.
It supports both large language models (LLMs) and multimodal models (LMMs) in the same action space.
arXiv Detail & Related papers (2024-10-31T15:25:20Z)
- Large Language Model Performance Benchmarking on Mobile Platforms: A Thorough Evaluation [10.817783356090027]
Large language models (LLMs) increasingly integrate into every aspect of our work and daily lives.
There are growing concerns about user privacy, which push the trend toward local deployment of these models.
Given this rapidly emerging application, we examine their performance on commercial off-the-shelf mobile devices.
arXiv Detail & Related papers (2024-10-04T17:14:59Z)
- MEGen: Generative Backdoor in Large Language Models via Model Editing [56.46183024683885]
Large language models (LLMs) have demonstrated remarkable capabilities.
Their powerful generative abilities enable flexible responses based on various queries or instructions.
This paper proposes an editing-based generative backdoor, named MEGen, aiming to create a customized backdoor for NLP tasks with the least side effects.
arXiv Detail & Related papers (2024-08-20T10:44:29Z)
- LLMs Meet Multimodal Generation and Editing: A Survey [89.76691959033323]
This survey elaborates on multimodal generation and editing across various domains, comprising image, video, 3D, and audio.
We summarize the notable advancements with milestone works in these fields and categorize these studies into LLM-based and CLIP/T5-based methods.
We dig into tool-augmented multimodal agents that can leverage existing generative models for human-computer interaction.
arXiv Detail & Related papers (2024-05-29T17:59:20Z)
- Large Language Models (LLMs) Assisted Wireless Network Deployment in Urban Settings [0.21847754147782888]
Large Language Models (LLMs) have revolutionized language understanding and human-like text generation.
This paper explores new techniques to harness the power of LLMs for 6G (6th Generation) wireless communication technologies.
We introduce a novel Reinforcement Learning (RL) based framework that leverages LLMs for network deployment in wireless communications.
arXiv Detail & Related papers (2024-05-22T05:19:51Z)
- EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents [65.38474102119181]
We propose EnvGen, a framework to adaptively create training environments.
We train a small RL agent in a mixture of the original and LLM-generated environments.
We find that a small RL agent trained with EnvGen can outperform SOTA methods, including a GPT-4 agent, and learns long-horizon tasks significantly faster.
arXiv Detail & Related papers (2024-03-18T17:51:16Z)
- Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models [52.98743860365194]
We propose a new fine-tuning method called Self-Play fIne-tuNing (SPIN).
At the heart of SPIN lies a self-play mechanism, where the LLM refines its capability by playing against instances of itself.
This sheds light on the promise of self-play, enabling the achievement of human-level performance in LLMs without the need for expert opponents.
arXiv Detail & Related papers (2024-01-02T18:53:13Z)
- SoTaNa: The Open-Source Software Development Assistant [81.86136560157266]
SoTaNa is an open-source software development assistant.
It generates high-quality instruction-based data for the domain of software engineering.
It employs a parameter-efficient fine-tuning approach to enhance the open-source foundation model, LLaMA.
arXiv Detail & Related papers (2023-08-25T14:56:21Z)
- Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory [97.87093169454431]
Ghost in the Minecraft (GITM) is a novel framework that integrates Large Language Models (LLMs) with text-based knowledge and memory.
We develop a set of structured actions and leverage LLMs to generate action plans for the agents to execute.
The resulting LLM-based agent markedly surpasses previous methods, achieving a remarkable improvement of +47.5% in success rate.
arXiv Detail & Related papers (2023-05-25T17:59:49Z)
- AndroidEnv: A Reinforcement Learning Platform for Android [41.572096255032946]
AndroidEnv is an open-source platform for Reinforcement Learning (RL) research built on top of the Android ecosystem.
It allows RL agents to interact with a wide variety of apps and services commonly used by humans through a universal touchscreen interface.
Since agents train on a realistic simulation of an Android device, they have the potential to be deployed on real devices.
arXiv Detail & Related papers (2021-05-27T15:20:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.