Explore, Select, Derive, and Recall: Augmenting LLM with Human-like Memory for Mobile Task Automation
- URL: http://arxiv.org/abs/2312.03003v2
- Date: Sat, 16 Mar 2024 06:17:52 GMT
- Title: Explore, Select, Derive, and Recall: Augmenting LLM with Human-like Memory for Mobile Task Automation
- Authors: Sunjae Lee, Junyoung Choi, Jungjae Lee, Munim Hasan Wasi, Hojun Choi, Steven Y. Ko, Sangeun Oh, Insik Shin,
- Abstract summary: This paper introduces MobileGPT, an innovative mobile task automator equipped with a human-like app memory.
MobileGPT emulates the cognitive process of humans interacting with a mobile app -- explore, select, derive, and recall.
We implement MobileGPT using online LLMs services (GPT-3.5 and GPT-4) and evaluate its performance on a dataset of 160 user instructions across 8 widely used mobile apps.
- Score: 8.158152532619576
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The advent of large language models (LLMs) has opened up new opportunities in the field of mobile task automation. Their superior language understanding and reasoning capabilities allow users to automate complex and repetitive tasks. However, due to the inherent unreliability and high operational cost of LLMs, their practical applicability is quite limited. To address these issues, this paper introduces MobileGPT, an innovative LLM-based mobile task automator equipped with a human-like app memory. MobileGPT emulates the cognitive process of humans interacting with a mobile app -- explore, select, derive, and recall. This approach allows for a more precise and efficient learning of a task's procedure by breaking it down into smaller, modular sub-tasks that can be re-used, re-arranged, and adapted for various objectives. We implement MobileGPT using online LLMs services (GPT-3.5 and GPT-4) and evaluate its performance on a dataset of 160 user instructions across 8 widely used mobile apps. The results indicate that MobileGPT can automate and learn new tasks with 82.5% accuracy, and is able to adapt them to different contexts with near perfect (98.75%) accuracy while reducing both latency and cost by 62.5% and 68.8%, respectively, compared to the GPT-4 powered baseline.
Related papers
- Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning [61.294110816231886]
We introduce a sparse, reusable, and flexible policy, Sparse Diffusion Policy (SDP)
SDP selectively activates experts and skills, enabling efficient and task-specific learning without retraining the entire model.
Demos and codes can be found in https://forrest-110.io/sparse_diffusion_policy/.
arXiv Detail & Related papers (2024-07-01T17:59:56Z) - GKT: A Novel Guidance-Based Knowledge Transfer Framework For Efficient Cloud-edge Collaboration LLM Deployment [74.40196814292426]
We introduce a novel and intuitive Guidance-based Knowledge Transfer (GKT) framework.
GKT uses a larger Large Language Models as a ''teacher'' to create guidance prompts, paired with a smaller ''student'' model to finalize responses.
It achieves a maximum accuracy improvement of 14.18%, along with a 10.72 times speed-up on GSM8K and an accuracy improvement of 14.00 % along with a 7.73 times speed-up in CSQA.
arXiv Detail & Related papers (2024-05-30T02:37:35Z) - Large Language Models in Wireless Application Design: In-Context Learning-enhanced Automatic Network Intrusion Detection [11.509880721677156]
We propose a pre-trained LLM-empowered framework to perform fully automatic network intrusion detection.
With experiments on a real network intrusion detection dataset, in-context learning proves to be highly beneficial.
We show that for GPT-4, testing accuracy and F1-Score can be improved by 90%.
arXiv Detail & Related papers (2024-05-17T02:56:31Z) - Octopus v2: On-device language model for super agent [10.998608318944985]
Our research presents a new method that empowers an on-device model with 2 billion parameters to surpass the performance of GPT-4 in both accuracy and latency.
When compared to Llama-7B with a RAG-based function calling mechanism, our method enhances latency by 35-fold.
arXiv Detail & Related papers (2024-04-02T09:01:32Z) - MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT [87.4910758026772]
"Bigger the better" has been the predominant trend in recent Large Language Models (LLMs) development.
This paper explores the "less is more" paradigm by addressing the challenge of designing accurate yet efficient Small Language Models (SLMs) for resource constrained devices.
arXiv Detail & Related papers (2024-02-26T18:59:03Z) - When Large Language Model Agents Meet 6G Networks: Perception,
Grounding, and Alignment [100.58938424441027]
We propose a split learning system for AI agents in 6G networks leveraging the collaboration between mobile devices and edge servers.
We introduce a novel model caching algorithm for LLMs within the proposed system to improve model utilization in context.
arXiv Detail & Related papers (2024-01-15T15:20:59Z) - GenSim: Generating Robotic Simulation Tasks via Large Language Models [34.79613485106202]
GenSim aims to automatically generate rich simulation environments and expert demonstrations.
We use GPT4 to expand the existing benchmark by ten times to over 100 tasks.
With minimal sim-to-real adaptation, multitask policies pretrained on GPT4-generated simulation tasks exhibit stronger transfer to unseen long-horizon tasks in the real world.
arXiv Detail & Related papers (2023-10-02T17:23:48Z) - Revolutionizing Mobile Interaction: Enabling a 3 Billion Parameter GPT
LLM on Mobile [0.0]
This article presents an innovative approach to LLM inference, envisioning a future where LLMs with billions of parameters can be executed directly on mobile devices without network connectivity.
The article showcases a fine-tuned GPT LLM with 3 billion parameters that can operate smoothly on devices with as low as 4GB of memory.
Through the integration of native code and model quantization techniques, the application not only serves as a general-purpose assistant but also facilitates seamless mobile interactions with text-to-actions features.
arXiv Detail & Related papers (2023-09-29T16:30:49Z) - AutoDroid: LLM-powered Task Automation in Android [32.241570727243534]
We introduce AutoDroid, a mobile task automation system capable of handling arbitrary tasks on any Android application without manual efforts.
The main components include a functionality-aware UI representation method that bridges the UI with the LLM.
We evaluate its performance on a new benchmark for memory-augmented Android task automation with 158 common tasks.
arXiv Detail & Related papers (2023-08-29T13:02:30Z) - AutoML-GPT: Automatic Machine Learning with GPT [74.30699827690596]
We propose developing task-oriented prompts and automatically utilizing large language models (LLMs) to automate the training pipeline.
We present the AutoML-GPT, which employs GPT as the bridge to diverse AI models and dynamically trains models with optimized hyper parameters.
This approach achieves remarkable results in computer vision, natural language processing, and other challenging areas.
arXiv Detail & Related papers (2023-05-04T02:09:43Z) - Error-Aware Imitation Learning from Teleoperation Data for Mobile
Manipulation [54.31414116478024]
In mobile manipulation (MM), robots can both navigate within and interact with their environment.
In this work, we explore how to apply imitation learning (IL) to learn continuous visuo-motor policies for MM tasks.
arXiv Detail & Related papers (2021-12-09T23:54:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.