Towards an LLM-Based Speech Interface for Robot-Assisted Feeding
- URL: http://arxiv.org/abs/2410.20624v1
- Date: Sun, 27 Oct 2024 22:56:51 GMT
- Title: Towards an LLM-Based Speech Interface for Robot-Assisted Feeding
- Authors: Jessie Yuan, Janavi Gupta, Akhil Padmanabha, Zulekha Karachiwalla, Carmel Majidi, Henny Admoni, Zackory Erickson
- Abstract summary: Speech interfaces that utilize Large Language Models (LLMs) can enable individuals to communicate high-level commands and nuanced preferences to robots.
In this work, we demonstrate an LLM-based speech interface for a commercially available assistive feeding robot.
- Score: 9.528060348251584
- License:
- Abstract: Physically assistive robots present an opportunity to significantly increase the well-being and independence of individuals with motor impairments or other forms of disability who are unable to complete activities of daily living (ADLs). Speech interfaces, especially ones that utilize Large Language Models (LLMs), can enable individuals to effectively and naturally communicate high-level commands and nuanced preferences to robots. In this work, we demonstrate an LLM-based speech interface for a commercially available assistive feeding robot. Our system is based on an iteratively designed framework, from the paper "VoicePilot: Harnessing LLMs as Speech Interfaces for Physically Assistive Robots," that incorporates human-centric elements for integrating LLMs as interfaces for robots. It has been evaluated through a user study with 11 older adults at an independent living facility. Videos are located on our project website: https://sites.google.com/andrew.cmu.edu/voicepilot/.
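The abstract describes a pipeline in which spoken input is transcribed, passed to an LLM together with the robot's available commands, and turned into an executable feeding action. The sketch below is only a minimal illustration of that kind of loop; the function names, prompt wording, and command set are assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a speech -> LLM -> feeding-robot command loop.
# All names (transcribe_speech, query_llm, FEEDING_COMMANDS) are
# hypothetical placeholders, not the interface used in the paper.

FEEDING_COMMANDS = ["acquire_bite", "move_to_mouth", "move_to_rest", "stop"]

SYSTEM_PROMPT = (
    "You control an assistive feeding robot. "
    f"Respond with one command from: {', '.join(FEEDING_COMMANDS)}."
)

def transcribe_speech(audio_path: str) -> str:
    """Placeholder for a speech-to-text (ASR) engine."""
    raise NotImplementedError

def query_llm(system_prompt: str, user_text: str) -> str:
    """Placeholder for a call to an LLM chat-completion API."""
    raise NotImplementedError

def handle_utterance(audio_path: str) -> str:
    """Transcribe a spoken request and map it to a robot command."""
    text = transcribe_speech(audio_path)      # e.g., "I'd like a bite of the noodles"
    command = query_llm(SYSTEM_PROMPT, text).strip()
    if command not in FEEDING_COMMANDS:
        command = "stop"                      # fall back to a safe no-op
    return command
```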
Related papers
- A Paragraph is All It Takes: Rich Robot Behaviors from Interacting, Trusted LLMs [2.4866349670733294]
Large Language Models (LLMs) are compact representations of all public knowledge of our physical environment and animal and human behaviors.
We show that rich robot behaviors and good performance could be achieved despite the robot's data fusion cycle running at only 1Hz.
The use of natural language for inter-LLM communication allowed the robot's reasoning and decision making to be directly observed by humans.
We suggest that by using natural language as the data bus among interacting AIs, and immutable public ledgers to store behavior constraints, it is possible to build robots that combine unexpectedly rich performance, upgradability, and
arXiv Detail & Related papers (2024-12-24T18:41:15Z)
- LLM Roleplay: Simulating Human-Chatbot Interaction [52.03241266241294]
We propose a goal-oriented, persona-based method to automatically generate diverse multi-turn dialogues simulating human-chatbot interaction.
Our method can simulate human-chatbot dialogues with a high indistinguishability rate.
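The summary describes conditioning one LLM on a persona and a goal and letting it converse with a chatbot over multiple turns. A rough sketch of such a simulation loop is below; the placeholder LLM call and the prompt strings are invented for illustration, not taken from the paper.

```python
# Sketch of persona-conditioned human-chatbot dialogue simulation.
# query_llm is a placeholder for any chat LLM call; persona and goal
# strings are invented examples.

def query_llm(system_prompt: str, history: list[str]) -> str:
    """Placeholder for an LLM chat-completion call."""
    raise NotImplementedError

def simulate_dialogue(persona: str, goal: str, turns: int = 5) -> list[str]:
    user_prompt = f"You are roleplaying this persona: {persona}. Your goal: {goal}."
    bot_prompt = "You are a helpful chatbot."
    history: list[str] = []
    for _ in range(turns):
        user_turn = query_llm(user_prompt, history)   # simulated human turn
        history.append(f"USER: {user_turn}")
        bot_turn = query_llm(bot_prompt, history)     # chatbot reply
        history.append(f"BOT: {bot_turn}")
    return history
```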
arXiv Detail & Related papers (2024-07-04T14:49:46Z)
- LLaRA: Supercharging Robot Learning Data for Vision-Language Policy [56.505551117094534]
We introduce LLaRA: Large Language and Robotics Assistant, a framework that formulates robot action policy as visuo-textual conversations.
First, we present an automated pipeline to generate conversation-style instruction tuning data for robots from existing behavior cloning datasets.
We show that a VLM finetuned on a limited amount of such data can produce meaningful action decisions for robotic control.
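The pipeline described turns behavior-cloning steps into conversation-style instruction-tuning samples pairing an image and instruction with a textual action. The toy conversion below illustrates the idea only; the record layout and action encoding are assumptions, not LLaRA's actual format.

```python
# Toy conversion of a behavior-cloning step into a visuo-textual
# "conversation" training example. Field names and the textual action
# encoding are assumptions for illustration.

def bc_step_to_conversation(image_path: str, instruction: str,
                            action_xyz: tuple[float, float, float]) -> dict:
    """Wrap one (image, instruction, action) step as an instruction-tuning sample."""
    x, y, z = action_xyz
    return {
        "image": image_path,
        "conversation": [
            {"role": "user",
             "content": f"Task: {instruction}. What should the robot do next?"},
            {"role": "assistant",
             # Action expressed as text so a VLM can be finetuned on it.
             "content": f"Move the end effector to ({x:.3f}, {y:.3f}, {z:.3f})."},
        ],
    }

example = bc_step_to_conversation("frame_0001.png", "pick up the red block",
                                  (0.42, -0.10, 0.25))
```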
arXiv Detail & Related papers (2024-06-28T17:59:12Z)
- Enhancing the LLM-Based Robot Manipulation Through Human-Robot Collaboration [4.2460673279562755]
Large Language Models (LLMs) are gaining popularity in the field of robotics.
This paper proposes a novel approach to enhance the performance of LLM-based autonomous manipulation through Human-Robot Collaboration (HRC).
The approach involves using a prompted GPT-4 language model to decompose high-level language commands into sequences of motions that can be executed by the robot.
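Since the approach prompts an LLM to break a high-level command into executable motions, a small sketch of that decomposition step is given below; the primitive set and prompt wording are illustrative assumptions, and query_llm stands in for the actual GPT-4 call.

```python
# Sketch of LLM-based decomposition of a high-level command into motion
# primitives. The primitive set and prompt are assumptions.
import json

PRIMITIVES = ["move_to(object)", "grasp(object)", "lift()", "place(location)"]

def query_llm(prompt: str) -> str:
    """Placeholder for a prompted GPT-4 (or similar) call returning JSON."""
    raise NotImplementedError

def decompose(command: str) -> list[str]:
    prompt = (
        f"Decompose the command '{command}' into a JSON list of steps "
        f"using only these primitives: {PRIMITIVES}."
    )
    steps = json.loads(query_llm(prompt))
    # A human collaborator could review or edit `steps` here before execution.
    return steps
```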
arXiv Detail & Related papers (2024-06-20T08:23:49Z)
- VoicePilot: Harnessing LLMs as Speech Interfaces for Physically Assistive Robots [9.528060348251584]
Speech interfaces that utilize Large Language Models (LLMs) can enable individuals to communicate high-level commands and nuanced preferences to robots.
Frameworks for integrating LLMs as interfaces to robots for high-level task planning and code generation have been proposed, but they fail to incorporate human-centric considerations.
We present a framework for incorporating LLMs as speech interfaces for physically assistive robots, constructed iteratively with 3 stages of testing involving a feeding robot, culminating in an evaluation with 11 older adults at an independent living facility.
arXiv Detail & Related papers (2024-04-05T12:45:10Z)
- Ain't Misbehavin' -- Using LLMs to Generate Expressive Robot Behavior in Conversations with the Tabletop Robot Haru [9.2526849536751]
We introduce a fully-automated conversation system that leverages large language models (LLMs) to generate robot responses with expressive behaviors.
We conduct a pilot study where volunteers chat with a social robot using our proposed system, and we analyze their feedback, conducting a rigorous error analysis of chat transcripts.
Most negative feedback was due to automatic speech recognition (ASR) errors, which had limited impact on conversations.
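To make the idea of pairing a spoken reply with an expressive behavior concrete, here is a small hypothetical sketch in which the LLM returns both a reply and a behavior tag; the tag set and JSON output format are assumptions, not the system's actual design.

```python
# Sketch of generating a reply plus an expressive-behavior tag for a
# social robot. The behavior tags and JSON format are invented here.
import json

BEHAVIOR_TAGS = ["nod", "happy_wiggle", "tilt_head", "neutral"]

def query_llm(prompt: str) -> str:
    """Placeholder for an LLM call that returns JSON."""
    raise NotImplementedError

def respond(user_utterance: str) -> tuple[str, str]:
    prompt = (
        f'The user said: "{user_utterance}". Reply as a friendly tabletop robot. '
        f'Return JSON {{"text": ..., "behavior": one of {BEHAVIOR_TAGS}}}.'
    )
    reply = json.loads(query_llm(prompt))
    behavior = reply["behavior"] if reply["behavior"] in BEHAVIOR_TAGS else "neutral"
    return reply["text"], behavior
```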
arXiv Detail & Related papers (2024-02-18T12:35:52Z)
- AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents [109.3804962220498]
AutoRT is a system to scale up the deployment of operational robots in completely unseen scenarios with minimal human supervision.
We demonstrate AutoRT proposing instructions to over 20 robots across multiple buildings and collecting 77k real robot episodes via both teleoperation and autonomous robot policies.
We experimentally show that such "in-the-wild" data collected by AutoRT is significantly more diverse, and that AutoRT's use of LLMs allows instruction-following data-collection robots to align with human preferences.
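The summary indicates that LLMs both propose instructions and keep collection aligned with human preferences; a simplified propose-then-filter loop in that spirit is sketched below, with invented rule text and placeholder LLM calls rather than the system's actual components.

```python
# Simplified propose-then-filter loop for robot data collection,
# loosely inspired by the description above. Rules and LLM calls are
# placeholders.

RULES = "Never propose tasks involving people, sharp objects, or liquids."

def query_llm(prompt: str) -> str:
    """Placeholder for an LLM call."""
    raise NotImplementedError

def propose_tasks(scene_description: str, n: int = 5) -> list[str]:
    raw = query_llm(f"Scene: {scene_description}. Propose {n} manipulation tasks, one per line.")
    return [line.strip() for line in raw.splitlines() if line.strip()]

def filter_tasks(tasks: list[str]) -> list[str]:
    verdicts = [query_llm(f"Rules: {RULES}\nTask: {t}\nAnswer yes/no: is it allowed?")
                for t in tasks]
    return [t for t, v in zip(tasks, verdicts) if v.strip().lower().startswith("yes")]
```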
arXiv Detail & Related papers (2024-01-23T18:45:54Z)
- Large Language Models for Robotics: A Survey [40.76581696885846]
Large language models (LLMs) possess the ability to process and generate natural language, facilitating efficient interaction and collaboration with robots.
This review aims to summarize the applications of LLMs in robotics, delving into their impact and contributions to key areas such as robot control, perception, decision-making, and path planning.
arXiv Detail & Related papers (2023-11-13T10:46:35Z)
- Vision-Language Foundation Models as Effective Robot Imitators [48.73027330407576]
We derive a vision-language manipulation framework, dubbed RoboFlamingo, built upon the open-source VLM OpenFlamingo.
By exceeding state-of-the-art performance by a large margin on the tested benchmark, we show that RoboFlamingo can be an effective and competitive alternative for adapting VLMs to robot control.
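A minimal interface for a vision-language policy of this kind might look like the sketch below: an image and a language instruction go in, a low-level action comes out. The class, method names, and action fields are illustrative assumptions and do not reflect RoboFlamingo's actual code.

```python
# Illustrative interface for a vision-language manipulation policy.
# Names and action fields are assumptions, not RoboFlamingo's real API.
from dataclasses import dataclass

@dataclass
class Action:
    delta_xyz: tuple[float, float, float]   # end-effector translation
    delta_rpy: tuple[float, float, float]   # end-effector rotation
    gripper_open: bool

class VisionLanguagePolicy:
    def __init__(self, checkpoint_path: str):
        self.checkpoint_path = checkpoint_path  # a finetuned VLM would be loaded here

    def act(self, image_rgb, instruction: str) -> Action:
        """Map one camera frame plus an instruction to a robot action."""
        raise NotImplementedError  # placeholder for the VLM forward pass
```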
arXiv Detail & Related papers (2023-11-02T16:34:33Z)
- WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model [92.90127398282209]
This paper investigates the potential of integrating the most recent Large Language Models (LLMs) with existing visual grounding and robotic grasping systems.
We introduce WALL-E (Embodied Robotic WAiter load lifting with Large Language model) as an example of this integration.
We deploy this LLM-empowered system on the physical robot to provide a more user-friendly interface for the instruction-guided grasping task.
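The described integration, an LLM interpreting the instruction, a visual-grounding module locating the object, and a grasping system executing the pick, can be pictured with the hypothetical glue code below; every component name here is a placeholder rather than part of the actual system.

```python
# Hypothetical glue code for instruction-guided grasping:
# LLM -> visual grounding -> grasp execution. All components are stubs.

def query_llm(prompt: str) -> str:
    """Placeholder LLM call that extracts the target object from an instruction."""
    raise NotImplementedError

def ground_object(image, object_name: str) -> tuple[int, int, int, int]:
    """Placeholder visual-grounding call returning a bounding box (x1, y1, x2, y2)."""
    raise NotImplementedError

def grasp(bounding_box: tuple[int, int, int, int]) -> bool:
    """Placeholder call to a grasp planner/executor."""
    raise NotImplementedError

def handle_instruction(image, instruction: str) -> bool:
    target = query_llm(f"Which single object should be picked up? Instruction: {instruction}")
    box = ground_object(image, target.strip())
    return grasp(box)
```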
arXiv Detail & Related papers (2023-08-30T11:35:21Z)
- Self-supervised reinforcement learning for speaker localisation with the iCub humanoid robot [58.2026611111328]
Looking at a person's face is one of the mechanisms humans rely on to filter speech in noisy environments.
Having a robot that can look toward a speaker could benefit ASR performance in challenging environments.
We propose a self-supervised reinforcement learning-based framework inspired by the early development of humans.
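One way to read the approach is as an RL agent rewarded for turning its gaze toward the active speaker, with the supervision signal derived from the robot's own audio-visual input rather than external labels. The toy reward below sketches that reading; the thresholds and reward shape are assumptions, not the paper's formulation.

```python
# Toy self-supervised reward for gaze-toward-speaker behavior: the
# reward comes from the agent's own observations (face detection and
# audio direction), not from labels. All values are invented.

def speaker_alignment_reward(face_offset_deg: float, audio_offset_deg: float,
                             speech_detected: bool) -> float:
    """Reward looking toward where speech appears to come from."""
    if not speech_detected:
        return 0.0
    # Smaller angular offsets between gaze and the estimated speaker
    # direction yield higher reward; 30 degrees is an arbitrary scale.
    visual_term = max(0.0, 1.0 - abs(face_offset_deg) / 30.0)
    audio_term = max(0.0, 1.0 - abs(audio_offset_deg) / 30.0)
    return 0.5 * (visual_term + audio_term)
```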
arXiv Detail & Related papers (2020-11-12T18:02:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.