Intelligent Virtual Assistants with LLM-based Process Automation
- URL: http://arxiv.org/abs/2312.06677v1
- Date: Mon, 4 Dec 2023 07:51:58 GMT
- Title: Intelligent Virtual Assistants with LLM-based Process Automation
- Authors: Yanchu Guan, Dong Wang, Zhixuan Chu, Shiyu Wang, Feiyue Ni, Ruihua Song, Longfei Li, Jinjie Gu, Chenyi Zhuang
- Abstract summary: This paper proposes a novel LLM-based virtual assistant that can automatically perform multi-step operations within mobile apps based on high-level user requests.
The system represents an advance in assistants by providing an end-to-end solution for parsing instructions, reasoning about goals, and executing actions.
- Score: 31.275267197246595
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While intelligent virtual assistants like Siri, Alexa, and Google Assistant
have become ubiquitous in modern life, they still face limitations in their
ability to follow multi-step instructions and accomplish complex goals
articulated in natural language. However, recent breakthroughs in large
language models (LLMs) show promise for overcoming existing barriers by
enhancing natural language processing and reasoning capabilities. Though
promising, applying LLMs to create more advanced virtual assistants still faces
challenges like ensuring robust performance and handling variability in
real-world user commands. This paper proposes a novel LLM-based virtual
assistant that can automatically perform multi-step operations within mobile
apps based on high-level user requests. The system represents an advance in
assistants by providing an end-to-end solution for parsing instructions,
reasoning about goals, and executing actions. LLM-based Process Automation
(LLMPA) has modules for decomposing instructions, generating descriptions,
detecting interface elements, predicting next actions, and error checking.
Experiments demonstrate the system completing complex mobile operation tasks in
Alipay based on natural language instructions. This showcases how large
language models can enable automated assistants to accomplish real-world tasks.
The main contributions are the novel LLMPA architecture optimized for app
process automation, the methodology for applying LLMs to mobile apps, and
demonstrations of multi-step task completion in a real-world environment.
Notably, this work represents the first real-world deployment and extensive
evaluation of a large language model-based virtual assistant in a widely used
mobile application with an enormous user base numbering in the hundreds of
millions.
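The abstract names five LLMPA modules: instruction decomposition, description generation, interface element detection, next-action prediction, and error checking. One way they might chain together is sketched below as a toy Python loop. This is not the authors' implementation: the `TOY_APP` screen map, the `Action` type, and every function name and body are hypothetical stand-ins, and simple string matching replaces the LLM calls a real system would make.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Action:
    kind: str    # hypothetical action vocabulary; only "tap" is used here
    target: str  # label of the interface element to act on


# Hypothetical toy app: screen name -> {element label: screen reached by tapping it}.
TOY_APP: Dict[str, Dict[str, str]] = {
    "home": {"Transfer": "transfer", "Scan": "scan"},
    "transfer": {"Confirm": "done"},
    "scan": {},
    "done": {},
}


def decompose(instruction: str) -> List[str]:
    """Instruction decomposition: split a request into ordered sub-steps.
    Assumption: a real system would prompt an LLM; here we split on ' then '."""
    return [s.strip() for s in instruction.split(" then ") if s.strip()]


def detect_elements(screen: str) -> List[str]:
    """Interface element detection: list the actionable elements on a screen."""
    return list(TOY_APP[screen])


def describe(screen: str) -> str:
    """Description generation: summarize the current screen as text."""
    labels = ", ".join(detect_elements(screen)) or "none"
    return f"screen '{screen}' with elements: {labels}"


def predict_next_action(step: str, elements: List[str]) -> Optional[Action]:
    """Next-action prediction: pick the first element whose label the step mentions."""
    for label in elements:
        if label.lower() in step.lower():
            return Action("tap", label)
    return None


def check(action: Optional[Action], elements: List[str]) -> bool:
    """Error checking: reject a missing action or a target not on the screen."""
    return action is not None and action.target in elements


def run(instruction: str, screen: str = "home") -> List[Action]:
    """Run every sub-step in order, moving between toy screens after each tap."""
    executed: List[Action] = []
    for step in decompose(instruction):
        elements = detect_elements(screen)
        action = predict_next_action(step, elements)
        if not check(action, elements):
            raise ValueError(f"cannot perform {step!r} on {describe(screen)}")
        executed.append(action)
        screen = TOY_APP[screen][action.target]  # "execute" the tap
    return executed


if __name__ == "__main__":
    for performed in run("open transfer then confirm"):
        print(performed)
```

In this sketch the error check runs before each action is executed, so a step that matches no element on the current screen stops the run instead of tapping something arbitrary. Where the real system places its checks is not stated in the abstract.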
Related papers
- ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning [74.58666091522198]
  We present a framework for intuitive robot programming by non-experts.
  We leverage natural language prompts and contextual information from the Robot Operating System (ROS).
  Our system integrates large language models (LLMs), enabling non-experts to articulate task requirements to the system through a chat interface.
  (2024-06-28T08:28:38Z)
- Natural Language as Policies: Reasoning for Coordinate-Level Embodied Control with LLMs [7.746160514029531]
  We demonstrate experimental results with LLMs that address robotics task planning problems.
  Our approach acquires text descriptions of the task and scene objects, then formulates task planning through natural language reasoning.
  Our approach is evaluated on a multi-modal prompt simulation benchmark.
  (2024-03-20T17:58:12Z)
- Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models [76.99140362751787]
  We present NuInstruct, a novel dataset with 91K multi-view video-QA pairs across 17 subtasks.
  We also present BEV-InMLLM, an end-to-end method for efficiently deriving instruction-aware Bird's-Eye-View features.
  (2024-01-02T01:54:22Z)
- Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks [54.60571399091711]
  Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks.
  We present an interactive planning technique for partially observable tasks using LLMs.
  (2023-12-11T22:54:44Z)
- TaskBench: Benchmarking Large Language Models for Task Automation [82.2932794189585]
  We introduce TaskBench, a framework to evaluate the capability of large language models (LLMs) in task automation.
  Specifically, task decomposition, tool selection, and parameter prediction are assessed.
  Our approach combines automated construction with rigorous human verification, ensuring high consistency with human evaluation.
  (2023-11-30T18:02:44Z)
- AutoTAMP: Autoregressive Task and Motion Planning with LLMs as Translators and Checkers [20.857692296678632]
  For effective human-robot interaction, robots need to understand, plan, and execute complex, long-horizon tasks.
  Recent advances in large language models have shown promise for translating natural language into robot action sequences.
  We show that our approach outperforms several methods using LLMs as planners in complex task domains.
  (2023-06-10T21:58:29Z)
- Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model [63.66204449776262]
  Instruct2Act is a framework that maps multi-modal instructions to sequential actions for robotic manipulation tasks.
  Our approach is adjustable and flexible in accommodating various instruction modalities and input types.
  Our zero-shot method outperformed many state-of-the-art learning-based policies in several tasks.
  (2023-05-18T17:59:49Z)
- Chat with the Environment: Interactive Multimodal Perception Using Large Language Models [19.623070762485494]
  Large Language Models (LLMs) have shown remarkable reasoning ability in few-shot robotic planning.
  Our study demonstrates that LLMs can provide high-level planning and reasoning skills and control interactive robot behavior in a multimodal environment.
  (2023-03-14T23:01:27Z)
- Prompting Is Programming: A Query Language for Large Language Models [5.8010446129208155]
  We present the novel idea of Language Model Programming (LMP).
  LMP generalizes language model prompting from pure text prompts to an intuitive combination of text prompting and scripting.
  We show that LMQL can capture a wide range of state-of-the-art prompting methods in an intuitive way.
  (2022-12-12T18:09:09Z)
- ProgPrompt: Generating Situated Robot Task Plans using Large Language Models [68.57918965060787]
  Large language models (LLMs) can be used to score potential next actions during task planning.
  We present a programmatic LLM prompt structure that enables plan generation functional across situated environments.
  (2022-09-22T20:29:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.