Explore, Select, Derive, and Recall: Augmenting LLM with Human-like Memory for Mobile Task Automation
- URL: http://arxiv.org/abs/2312.03003v2
- Date: Sat, 16 Mar 2024 06:17:52 GMT
- Title: Explore, Select, Derive, and Recall: Augmenting LLM with Human-like Memory for Mobile Task Automation
- Authors: Sunjae Lee, Junyoung Choi, Jungjae Lee, Munim Hasan Wasi, Hojun Choi, Steven Y. Ko, Sangeun Oh, Insik Shin
- Abstract summary: This paper introduces MobileGPT, an innovative mobile task automator equipped with a human-like app memory.
MobileGPT emulates the cognitive process of humans interacting with a mobile app -- explore, select, derive, and recall.
We implement MobileGPT using online LLM services (GPT-3.5 and GPT-4) and evaluate its performance on a dataset of 160 user instructions across 8 widely used mobile apps.
- Score: 8.158152532619576
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The advent of large language models (LLMs) has opened up new opportunities in the field of mobile task automation. Their superior language understanding and reasoning capabilities allow users to automate complex and repetitive tasks. However, due to the inherent unreliability and high operational cost of LLMs, their practical applicability is quite limited. To address these issues, this paper introduces MobileGPT, an innovative LLM-based mobile task automator equipped with a human-like app memory. MobileGPT emulates the cognitive process of humans interacting with a mobile app -- explore, select, derive, and recall. This approach allows for a more precise and efficient learning of a task's procedure by breaking it down into smaller, modular sub-tasks that can be re-used, re-arranged, and adapted for various objectives. We implement MobileGPT using online LLM services (GPT-3.5 and GPT-4) and evaluate its performance on a dataset of 160 user instructions across 8 widely used mobile apps. The results indicate that MobileGPT can automate and learn new tasks with 82.5% accuracy, and is able to adapt them to different contexts with near-perfect (98.75%) accuracy while reducing both latency and cost by 62.5% and 68.8%, respectively, compared to the GPT-4 powered baseline.
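To make the explore, select, derive, and recall loop concrete, below is a minimal Python sketch of how a hierarchical app memory of reusable sub-tasks might be organized. It is a sketch under stated assumptions: `AppMemory`, `SubTask`, and `run_task` are hypothetical names, and the string match in `recall` stands in for the LLM-based matching the paper implies; none of this is MobileGPT's actual API.

```python
# Minimal sketch of a human-like app memory of reusable sub-tasks.
# All names here are hypothetical illustrations, not MobileGPT's API.
from dataclasses import dataclass, field

@dataclass
class SubTask:
    name: str            # e.g. "search_for_item"
    actions: list        # recorded UI actions (taps, text inputs, ...)

@dataclass
class AppMemory:
    # screen identifier -> sub-tasks known to be available on that screen
    screens: dict[str, list[SubTask]] = field(default_factory=dict)

    def recall(self, screen: str, instruction: str) -> SubTask | None:
        """Recall: reuse a matching sub-task if this screen was seen before."""
        for sub in self.screens.get(screen, []):
            if sub.name in instruction:      # stand-in for LLM-based matching
                return sub
        return None

    def learn(self, screen: str, sub: SubTask) -> None:
        """Derive: store a newly learned sub-task for future recall."""
        self.screens.setdefault(screen, []).append(sub)

def run_task(memory: AppMemory, screen: str, instruction: str) -> SubTask:
    cached = memory.recall(screen, instruction)
    if cached is not None:
        return cached                        # cheap path: no LLM call needed
    # Explore/select/derive path: ask the LLM to break the screen into
    # sub-tasks and pick one (stubbed out here with a fixed result).
    derived = SubTask(name="search_for_item",
                      actions=["tap(search)", "type(query)"])
    memory.learn(screen, derived)
    return derived
```

The recall path is where the abstract's cost claim would come from: once a sub-task has been derived, replaying it requires no LLM call, which is consistent with the reported latency and cost reductions.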
Related papers
- Mobile-MMLU: A Mobile Intelligence Language Understanding Benchmark [45.28023118459497]
We introduce Mobile-MMLU, a large-scale benchmark dataset tailored for mobile intelligence.
It consists of 16,186 questions across 80 mobile-related fields, designed to evaluate LLM performance in realistic mobile scenarios.
A challenging subset, Mobile-MMLU-Pro, provides advanced evaluation similar in size to MMLU-Pro but significantly more difficult than our standard full set.
arXiv Detail & Related papers (2025-03-26T17:59:56Z)
- EMMOE: A Comprehensive Benchmark for Embodied Mobile Manipulation in Open Environments [11.97783742296183]
We introduce Embodied Mobile Manipulation in Open Environments (EMMOE), which requires agents to interpret user instructions and execute long-horizon everyday tasks in continuous space.
EMMOE seamlessly integrates high-level and low-level embodied tasks into a unified framework, along with three new metrics for more diverse assessment.
Furthermore, we design HomieBot, a sophisticated agent system consisting of an LLM tuned with Direct Preference Optimization (DPO), lightweight navigation and manipulation models, and multiple error detection mechanisms.
arXiv Detail & Related papers (2025-03-11T16:42:36Z)
- Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks.
However, they still struggle with problems requiring multi-step decision-making and environmental feedback.
We propose a framework that can automatically learn a reward model from the environment without human annotations.
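A minimal sketch of how annotation-free reward modeling of this kind can work, assuming an LLM judge labels rollouts and a simple classifier is fit on the labels; `judge`, `featurize`, and the toy rollouts below are illustrative stubs, not the paper's actual pipeline.

```python
# Sketch: roll out an agent, label each trajectory with an automated judge,
# then fit a reward model on the labeled examples -- no human annotations.
import numpy as np
from sklearn.linear_model import LogisticRegression

def judge(trajectory: list[str]) -> int:
    """Stand-in for an LLM judge labeling a trajectory 1 (success) / 0."""
    return int("goal_reached" in trajectory[-1])

def featurize(trajectory: list[str], dim: int = 32) -> np.ndarray:
    """Toy hash-based features; a real system would embed states/actions."""
    vec = np.zeros(dim)
    for step in trajectory:
        vec[hash(step) % dim] += 1.0
    return vec

# Collect rollouts (stubbed), label them, and fit the reward model.
rollouts = [["open_app", "tap_search", "goal_reached"],
            ["open_app", "scroll", "timeout"]]
X = np.stack([featurize(t) for t in rollouts])
y = np.array([judge(t) for t in rollouts])
reward_model = LogisticRegression().fit(X, y)
print(reward_model.predict_proba(X)[:, 1])  # learned reward per rollout
```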
arXiv Detail & Related papers (2025-02-17T18:49:25Z)
- Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks [85.48034185086169]
Mobile-Agent-E is a hierarchical multi-agent framework capable of self-evolution through past experience.
Mobile-Agent-E achieves a 22% absolute improvement over previous state-of-the-art approaches.
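A minimal sketch of one way self-evolution through past experience can be implemented: after each task, a reflector (an LLM call, stubbed here) distills reusable tips that are persisted and prepended to future prompts. `reflect` and `TIPS_FILE` are assumptions for illustration, not Mobile-Agent-E's actual components.

```python
# Sketch of self-evolution: distill lessons from each run into a persistent
# tips store that conditions future episodes. reflect() stubs an LLM call.
import json
from pathlib import Path

TIPS_FILE = Path("tips.json")   # hypothetical long-term memory location

def load_tips() -> list[str]:
    return json.loads(TIPS_FILE.read_text()) if TIPS_FILE.exists() else []

def reflect(task: str, transcript: list[str]) -> list[str]:
    """Stand-in for an LLM reflector extracting lessons from a run."""
    return [f"When doing '{task}', confirm the final screen before finishing."]

def finish_task(task: str, transcript: list[str]) -> None:
    tips = load_tips()
    for tip in reflect(task, transcript):
        if tip not in tips:              # avoid duplicate lessons
            tips.append(tip)
    TIPS_FILE.write_text(json.dumps(tips, indent=2))

# Future episodes prepend the accumulated tips to the agent's system prompt:
system_prompt = "You are a mobile agent.\nTips:\n" + "\n".join(load_tips())
```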
arXiv Detail & Related papers (2025-01-20T20:35:46Z)
- Continuously Improving Mobile Manipulation with Autonomous Real-World RL [33.085671103158866]
We present a fully autonomous real-world RL framework for mobile manipulation that can learn policies without extensive instrumentation or human supervision.
This is enabled by task-relevant autonomy, which guides exploration towards object interactions and prevents stagnation near goal states.
We demonstrate that our approach allows Spot robots to continually improve their performance on a set of four challenging mobile manipulation tasks.
arXiv Detail & Related papers (2024-09-30T17:59:50Z)
- AlignBot: Aligning VLM-powered Customized Task Planning with User Reminders Through Fine-Tuning for Household Robots [44.47999496605951]
AlignBot is a novel framework designed to optimize VLM-powered customized task planning for household robots.
In domestic settings, aligning task planning with user reminders poses significant challenges due to the limited quantity, diversity, and multimodal nature of the reminders.
arXiv Detail & Related papers (2024-09-18T12:05:30Z)
- GKT: A Novel Guidance-Based Knowledge Transfer Framework For Efficient Cloud-edge Collaboration LLM Deployment [74.40196814292426]
We introduce a novel and intuitive Guidance-based Knowledge Transfer (GKT) framework.
GKT uses a larger large language model as a "teacher" to create guidance prompts, paired with a smaller "student" model that finalizes the responses.
It achieves a maximum accuracy improvement of 14.18% along with a 10.72x speed-up on GSM8K, and a 14.00% accuracy improvement along with a 7.73x speed-up on CSQA.
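A hedged sketch of the guidance idea, assuming the teacher drafts only the first few tokens and the student completes the answer; `gpt2` and `distilgpt2` are stand-ins for the actual teacher/student pair, which is not specified here.

```python
# Guidance-based decoding sketch: a larger "teacher" drafts a short prefix,
# a smaller "student" finalizes the response. Models are illustrative stand-ins.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")                   # shared vocabulary
teacher = AutoModelForCausalLM.from_pretrained("gpt2")        # "teacher" stand-in
student = AutoModelForCausalLM.from_pretrained("distilgpt2")  # "student" stand-in

def guided_generate(prompt: str, guidance_tokens: int = 16, max_new: int = 64) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids
    # The teacher produces only a short guidance prefix...
    guided = teacher.generate(ids, max_new_tokens=guidance_tokens, do_sample=False)
    # ...and the student completes the answer conditioned on prompt + guidance.
    out = student.generate(guided, max_new_tokens=max_new - guidance_tokens,
                           do_sample=False)
    return tok.decode(out[0], skip_special_tokens=True)

print(guided_generate("Q: What is the capital of France?\nA:"))
```

The efficiency argument is that the expensive teacher decodes only `guidance_tokens` tokens while the cheap student produces the rest.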
arXiv Detail & Related papers (2024-05-30T02:37:35Z)
- Octopus v2: On-device language model for super agent [10.998608318944985]
Our research presents a new method that empowers an on-device model with 2 billion parameters to surpass the performance of GPT-4 in both accuracy and latency.
When compared to Llama-7B with a RAG-based function calling mechanism, our method improves latency by a factor of 35.
arXiv Detail & Related papers (2024-04-02T09:01:32Z)
- When Large Language Model Agents Meet 6G Networks: Perception, Grounding, and Alignment [100.58938424441027]
We propose a split learning system for AI agents in 6G networks leveraging the collaboration between mobile devices and edge servers.
We introduce a novel model caching algorithm for LLMs within the proposed system to improve model utilization in context.
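As a point of reference for what such a caching algorithm improves on, here is a plain LRU model cache in Python; the paper's algorithm is described as context-aware, so treat this only as the baseline idea, with `load_model` as a hypothetical loader.

```python
# Baseline LRU cache for edge-side model serving; the paper's algorithm is
# context-aware, so this shows only the simplest version of the idea.
from collections import OrderedDict

def load_model(name: str):
    return f"<weights of {name}>"            # stub for an expensive loader

class ModelCache:
    def __init__(self, capacity: int = 2):
        self.capacity = capacity
        self.cache: OrderedDict[str, object] = OrderedDict()

    def get(self, name: str):
        if name in self.cache:
            self.cache.move_to_end(name)      # mark as recently used
            return self.cache[name]
        model = load_model(name)              # cache miss: fetch/deserialize
        self.cache[name] = model
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict least-recently-used
        return model

cache = ModelCache(capacity=2)
for request in ["llm-7b", "llm-1b", "llm-7b", "asr"]:
    cache.get(request)
print(list(cache.cache))                      # ['llm-7b', 'asr']
```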
arXiv Detail & Related papers (2024-01-15T15:20:59Z)
- Exposing Limitations of Language Model Agents in Sequential-Task Compositions on the Web [74.76803612807949]
Language model agents (LMAs) have emerged as a promising paradigm for multi-step decision-making tasks.
Despite the promise, their performance on real-world applications is still underexplored.
We show that while existing LMAs achieve 94.0% average success rate on base tasks, their performance degrades to 24.9% success rate on compositional tasks.
arXiv Detail & Related papers (2023-11-30T17:50:47Z)
- Revolutionizing Mobile Interaction: Enabling a 3 Billion Parameter GPT LLM on Mobile [0.0]
This article presents an innovative approach to LLM inference, envisioning a future where LLMs with billions of parameters can be executed directly on mobile devices without network connectivity.
The article showcases a fine-tuned GPT LLM with 3 billion parameters that can operate smoothly on devices with as little as 4 GB of memory.
Through the integration of native code and model quantization techniques, the application not only serves as a general-purpose assistant but also facilitates seamless mobile interactions with text-to-actions features.
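As a back-of-envelope illustration of why quantization helps a 3B-parameter model fit in 4 GB, here is a symmetric per-tensor int8 quantizer in Python, cutting weight memory 4x versus float32. The article's exact scheme and native-code runtime are not specified here, so this is only a sketch of the general technique.

```python
# Symmetric per-tensor int8 quantization sketch: 4x smaller weights than fp32.
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    scale = np.abs(w).max() / 127.0          # map max |w| onto the int8 range
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4096, 4096).astype(np.float32)   # one transformer weight
q, scale = quantize_int8(w)
print(f"fp32: {w.nbytes / 2**20:.0f} MiB, int8: {q.nbytes / 2**20:.0f} MiB")
print(f"max abs error: {np.abs(w - dequantize(q, scale)).max():.4f}")
```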
arXiv Detail & Related papers (2023-09-29T16:30:49Z)
- AutoDroid: LLM-powered Task Automation in Android [32.241570727243534]
We introduce AutoDroid, a mobile task automation system capable of handling arbitrary tasks on any Android application without manual effort.
The main components include a functionality-aware UI representation method that bridges the UI with the LLM.
We evaluate its performance on a new benchmark for memory-augmented Android task automation with 158 common tasks.
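A hedged sketch of what a functionality-aware UI-to-text bridge can look like: flatten the view hierarchy into compact HTML-like lines an LLM can reason over, dropping purely decorative views. The element fields and output format below are assumptions for illustration, not AutoDroid's exact encoding.

```python
# Sketch: convert Android UI elements into a compact text representation
# for an LLM prompt. Field names and format are illustrative assumptions.
def ui_to_prompt(elements: list[dict]) -> str:
    lines = []
    for i, el in enumerate(elements):
        if not (el.get("clickable") or el.get("editable") or el.get("text")):
            continue                          # drop purely decorative views
        tag = ("input" if el.get("editable")
               else "button" if el.get("clickable") else "p")
        label = el.get("text") or el.get("content_desc") or el.get("resource_id", "")
        lines.append(f"<{tag} id={i}>{label}</{tag}>")
    return "\n".join(lines)

screen = [
    {"resource_id": "search_box", "editable": True, "clickable": True},
    {"text": "Sign in", "clickable": True},
    {"text": "Welcome back!"},                # static text, kept for context
    {"clickable": False},                     # decorative view, dropped
]
print(ui_to_prompt(screen))
# The LLM is then asked: given this screen and the task, which id to act on?
```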
arXiv Detail & Related papers (2023-08-29T13:02:30Z)
- AutoML-GPT: Automatic Machine Learning with GPT [74.30699827690596]
We propose developing task-oriented prompts and automatically utilizing large language models (LLMs) to automate the training pipeline.
We present AutoML-GPT, which employs GPT as the bridge to diverse AI models and dynamically trains models with optimized hyperparameters.
This approach achieves remarkable results in computer vision, natural language processing, and other challenging areas.
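As a rough illustration of a task-oriented prompt, here is a sketch that composes one from data and model card fields; the schema below is an assumption for illustration, not AutoML-GPT's actual prompt format.

```python
# Sketch: compose a task-oriented prompt from data/model card fields.
# The field names below are illustrative assumptions.
def build_prompt(data_card: dict, model_card: dict, task: str) -> str:
    return (
        f"Task: {task}\n"
        f"Dataset: {data_card['name']} ({data_card['size']} samples, "
        f"{data_card['modality']})\n"
        f"Model: {model_card['name']} ({model_card['params']} parameters)\n"
        "Propose a training pipeline and hyperparameters (optimizer, learning "
        "rate, batch size, epochs), then report expected metrics."
    )

prompt = build_prompt(
    data_card={"name": "CIFAR-10", "size": 60_000, "modality": "images"},
    model_card={"name": "ResNet-18", "params": "11M"},
    task="image classification",
)
print(prompt)   # sent to GPT, whose plan drives the automated training loop
```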
arXiv Detail & Related papers (2023-05-04T02:09:43Z)
- Error-Aware Imitation Learning from Teleoperation Data for Mobile Manipulation [54.31414116478024]
In mobile manipulation (MM), robots can both navigate within and interact with their environment.
In this work, we explore how to apply imitation learning (IL) to learn continuous visuo-motor policies for MM tasks.
arXiv Detail & Related papers (2021-12-09T23:54:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.