DroidCall: A Dataset for LLM-powered Android Intent Invocation
- URL: http://arxiv.org/abs/2412.00402v1
- Date: Sat, 30 Nov 2024 08:55:39 GMT
- Title: DroidCall: A Dataset for LLM-powered Android Intent Invocation
- Authors: Weikai Xie, Li Zhang, Shihe Wang, Rongjie Yi, Mengwei Xu
- Abstract summary: We introduce DroidCall, the first training and testing dataset for accurate Android intent invocation.
With a highly flexible and reusable data generation pipeline, we constructed 10k samples in DroidCall.
We also provide an end-to-end Android app equipped with these fine-tuned models to demonstrate the Android intent invocation process.
- Score: 5.147660365233947
- Abstract: The growing capabilities of large language models in natural language understanding significantly strengthen existing agentic systems. To power performant on-device mobile agents for better data privacy, we introduce DroidCall, the first training and testing dataset for accurate Android intent invocation. With a highly flexible and reusable data generation pipeline, we constructed 10k samples in DroidCall. Given a task instruction in natural language, small language models such as Qwen2.5-3B and Gemma2-2B fine-tuned with DroidCall can approach or even surpass the capabilities of GPT-4o for accurate Android intent invocation. We also provide an end-to-end Android app equipped with these fine-tuned models to demonstrate the Android intent invocation process. The code and dataset are available at https://github.com/UbiquitousLearning/DroidCall.
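To make the task concrete: "Android intent invocation" here means turning a natural-language instruction into a structured function call that an app then maps onto an Android Intent. The Kotlin sketch below illustrates only that last mapping step, under assumed conventions: the JSON schema ({"name", "arguments"}) and the invokeIntent helper are illustrative assumptions, not the DroidCall app's actual code.

```kotlin
import android.content.Context
import android.content.Intent
import android.provider.AlarmClock
import org.json.JSONObject

// Hypothetical structured output from a fine-tuned model; the schema
// {"name": ..., "arguments": {...}} is an assumption for illustration.
val modelOutput = """
    {"name": "ACTION_SET_ALARM",
     "arguments": {"hour": 7, "minutes": 30, "message": "Morning run"}}
""".trimIndent()

// Map the model's function call onto a real Android intent and fire it.
// Note: ACTION_SET_ALARM requires the com.android.alarm.permission.SET_ALARM
// permission to be declared in the app manifest.
fun invokeIntent(context: Context, output: String) {
    val call = JSONObject(output)
    val args = call.getJSONObject("arguments")
    when (call.getString("name")) {
        "ACTION_SET_ALARM" -> {
            val intent = Intent(AlarmClock.ACTION_SET_ALARM).apply {
                putExtra(AlarmClock.EXTRA_HOUR, args.getInt("hour"))
                putExtra(AlarmClock.EXTRA_MINUTES, args.getInt("minutes"))
                putExtra(AlarmClock.EXTRA_MESSAGE, args.optString("message"))
            }
            context.startActivity(intent)
        }
        // Other actions (send email, add calendar event, ...) would be
        // dispatched the same way, one branch per supported intent.
        else -> error("Unsupported action: ${call.getString("name")}")
    }
}
```

In the released app, the set of supported intents and the output format the models are fine-tuned to emit are presumably defined by the DroidCall dataset itself; the sketch only shows the general shape of the dispatch step.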
Related papers
- PhoneLM: an Efficient and Capable Small Language Model Family through Principled Pre-training [6.827011856777674]
Existing small language models (SLMs) for on-device deployment do not consider device hardware characteristics.
This work presents a simple yet effective principle for SLM design: architecture searching for (near-)optimal runtime efficiency before pre-training.
We develop the PhoneLM family (currently with 0.5B and 1.5B versions), which achieves a state-of-the-art capability-efficiency tradeoff among models of similar parameter size.
arXiv Detail & Related papers (2024-11-07T02:19:00Z)
- ToolACE: Winning the Points of LLM Function Calling [139.07157814653638]
ToolACE is an automatic agentic pipeline designed to generate accurate, complex, and diverse tool-learning data.
We demonstrate that models trained on our synthesized data, even with only 8B parameters, achieve state-of-the-art performance on the Berkeley Function-Calling Leaderboard.
arXiv Detail & Related papers (2024-09-02T03:19:56Z)
- AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents [50.39555842254652]
We introduce the Android Multi-annotation EXpo (AMEX) to advance research on AI agents in mobile scenarios.
AMEX comprises over 104K high-resolution screenshots from 110 popular mobile applications, which are annotated at multiple levels.
AMEX includes three levels of annotations: GUI interactive element grounding, GUI screen and element functionality descriptions, and complex natural language instructions.
arXiv Detail & Related papers (2024-07-03T17:59:58Z)
- APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets [99.8988504388011]
APIGen is an automated data generation pipeline designed to synthesize verifiable high-quality datasets for function-calling applications.
We leverage APIGen and collect 3,673 executable APIs across 21 different categories to generate diverse function-calling datasets.
We release a dataset containing 60,000 high-quality entries, aiming to advance the field of function-calling agents.
arXiv Detail & Related papers (2024-06-26T17:49:11Z)
- AutoDroid: LLM-powered Task Automation in Android [32.241570727243534]
We introduce AutoDroid, a mobile task automation system capable of handling arbitrary tasks on any Android application without manual effort.
The main components include a functionality-aware UI representation method that bridges the UI with the LLM.
We evaluate its performance on a new benchmark for memory-augmented Android task automation with 158 common tasks.
arXiv Detail & Related papers (2023-08-29T13:02:30Z)
- Android in the Wild: A Large-Scale Dataset for Android Device Control [4.973591165982018]
We present a dataset for device-control research, Android in the Wild (AITW).
The dataset contains human demonstrations of device interactions, including the screens and actions, and corresponding natural language instructions.
It consists of 715k episodes spanning 30k unique instructions, four versions of Android (v10-13), and eight device types (Pixel 2 XL to Pixel 6) with varying screen resolutions.
arXiv Detail & Related papers (2023-07-19T15:57:24Z)
- DroidBot-GPT: GPT-powered UI Automation for Android [11.980924738484994]
DroidBot-GPT is a tool that utilizes GPT-like large language models (LLMs) to automate interactions with Android mobile applications.
Given a natural language description of a desired task, DroidBot-GPT can automatically generate and execute actions that navigate the app to complete the task.
arXiv Detail & Related papers (2023-04-14T11:31:56Z)
- Z-BERT-A: a zero-shot Pipeline for Unknown Intent detection [3.3135037978828263]
We propose Zero-Shot-BERT-Adapters, a two-stage method for multilingual intent discovery relying on a Transformer architecture.
We train the model for Natural Language Inference (NLI) and later perform unknown intent classification in a zero-shot setting for multiple languages.
We show how Zero-Shot-BERT-Adapters outperforms various baselines in two zero-shot settings: known intent classification and unseen intent discovery.
arXiv Detail & Related papers (2022-08-15T09:27:34Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Beyond English-Centric Multilingual Machine Translation [74.21727842163068]
We create a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages.
We build and open source a training dataset that covers thousands of language directions with supervised data, created through large-scale mining.
Our focus on non-English-Centric models brings gains of more than 10 BLEU when directly translating between non-English directions, while performing competitively with the best single systems of WMT.
arXiv Detail & Related papers (2020-10-21T17:01:23Z)
- Universal Phone Recognition with a Multilingual Allophone System [135.2254086165086]
We propose a joint model of language-independent phone and language-dependent phoneme distributions.
In multilingual ASR experiments over 11 languages, we find that this model improves testing performance by 2% phoneme error rate absolute.
Our recognizer achieves phone accuracy improvements of more than 17%, moving a step closer to speech recognition for all languages in the world.
arXiv Detail & Related papers (2020-02-26T21:28:57Z)