GPT Models Meet Robotic Applications: Co-Speech Gesturing Chat System
- URL: http://arxiv.org/abs/2306.01741v1
- Date: Wed, 10 May 2023 10:14:16 GMT
- Title: GPT Models Meet Robotic Applications: Co-Speech Gesturing Chat System
- Authors: Naoki Wake, Atsushi Kanehira, Kazuhiro Sasabuchi, Jun Takamatsu,
Katsushi Ikeuchi
- Abstract summary: This technical paper introduces a chatting robot system that utilizes recent advancements in large-scale language models (LLMs).
The system is integrated with a co-speech gesture generation system, which selects appropriate gestures based on the conceptual meaning of speech.
- Score: 8.660929270060146
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This technical paper introduces a chatting robot system that utilizes recent
advancements in large-scale language models (LLMs) such as GPT-3 and ChatGPT.
The system is integrated with a co-speech gesture generation system, which
selects appropriate gestures based on the conceptual meaning of speech. Our
motivation is to explore ways of utilizing the recent progress in LLMs for
practical robotic applications, which benefits the development of both chatbots
and LLMs. Specifically, it enables the development of highly responsive chatbot
systems by leveraging LLMs and adds visual effects to the user interface of
LLMs as an additional value. The source code for the system is available on
GitHub for our in-house robot
(https://github.com/microsoft/LabanotationSuite/tree/master/MSRAbotChatSimulation)
and GitHub for Toyota HSR
(https://github.com/microsoft/GPT-Enabled-HSR-CoSpeechGestures).
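A minimal sketch of how such a chat-plus-gesture loop can be wired together, assuming the public `openai` Python SDK; the gesture vocabulary, the keyword-based `select_gesture` helper, and the printed robot interface are illustrative assumptions, not the implementation in the linked repositories:

```python
# Sketch of an LLM-driven co-speech gesturing chat loop.
# Assumptions: the gesture labels and keyword mapping below are
# illustrative; see the linked GitHub repos for the authors' system.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical mapping from conceptual keywords to gesture labels.
GESTURE_MAP = {
    "hello": "wave",
    "yes": "nod",
    "no": "shake_head",
    "think": "hand_on_chin",
}

def select_gesture(utterance: str) -> str:
    """Pick a gesture from the conceptual content of the utterance."""
    words = utterance.lower().split()
    for keyword, gesture in GESTURE_MAP.items():
        if keyword in words:
            return gesture
    return "idle"  # neutral beat gesture as a fallback

def chat_turn(history: list[dict], user_text: str) -> tuple[str, str]:
    """One chat turn: query the LLM, then choose a co-speech gesture."""
    history.append({"role": "user", "content": user_text})
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=history,
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply, select_gesture(reply)

if __name__ == "__main__":
    history = [{"role": "system", "content": "You are a friendly robot."}]
    reply, gesture = chat_turn(history, "Hello there!")
    print(f"robot says {reply!r} while performing gesture: {gesture}")
```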
Related papers
- Large Generative Model-assisted Talking-face Semantic Communication System [55.42631520122753]
This study introduces a Large Generative Model-assisted Talking-face Semantic Communication (LGM-TSC) system.
A Generative Semantic Extractor (GSE) at the transmitter converts semantically sparse talking-face videos into texts with high information density.
A private Knowledge Base (KB) based on a Large Language Model (LLM) performs semantic disambiguation and correction.
A Generative Semantic Reconstructor (GSR) uses the BERT-VITS2 and SadTalker models to transform the text back into a high-QoE talking-face video.
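As a structural sketch only, the three stages can be read as the following pipeline; every function body below is a placeholder (the real system uses a trained GSE, an LLM-backed private KB, and BERT-VITS2 plus SadTalker, none of which are invoked here):

```python
# Structural sketch of the LGM-TSC pipeline described above.

def semantic_extractor(video_frames: list) -> str:
    """GSE stand-in: compress a talking-face video into dense text."""
    # A real GSE would run speech recognition / captioning here.
    return "transcribed and summarized utterance"

def knowledge_base_correct(text: str) -> str:
    """KB stand-in: LLM-based semantic disambiguation and correction."""
    # A real KB would query a private LLM-backed store.
    return text

def semantic_reconstructor(text: str) -> bytes:
    """GSR stand-in: synthesize speech and a talking-face video."""
    # A real GSR would call BERT-VITS2 (TTS) and SadTalker (video).
    return b"<synthesized video bytes>"

def transmit(video_frames: list) -> bytes:
    """End-to-end: only the corrected text crosses the channel."""
    text = semantic_extractor(video_frames)  # transmitter side
    text = knowledge_base_correct(text)      # disambiguation step
    return semantic_reconstructor(text)      # receiver side
```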
arXiv Detail & Related papers (2024-11-06T12:45:46Z)
- Towards an LLM-Based Speech Interface for Robot-Assisted Feeding [9.528060348251584]
Speech interfaces that utilize Large Language Models (LLMs) can enable individuals to communicate high-level commands and nuanced preferences to robots.
In this work, we demonstrate an LLM-based speech interface for a commercially available assistive feeding robot.
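A hedged sketch of the underlying pattern, constraining free-form speech to a discrete command set; the command names, prompt, and model choice are assumptions, and the ASR step is stubbed with text input:

```python
# Sketch of an LLM speech interface for an assistive feeding robot.
from openai import OpenAI

client = OpenAI()

# Hypothetical high-level commands the feeding robot understands.
COMMANDS = ["acquire_bite", "move_to_mouth", "pause", "smaller_bite"]

SYSTEM_PROMPT = (
    "Map the user's spoken request to exactly one of these commands: "
    + ", ".join(COMMANDS)
    + ". Reply with the command name only."
)

def interpret(utterance: str) -> str:
    """Turn a free-form spoken request into one discrete robot command."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": utterance},
        ],
    )
    command = response.choices[0].message.content.strip()
    return command if command in COMMANDS else "pause"  # fail safe

if __name__ == "__main__":
    # ASR stub: in a real system this text would come from speech.
    print(interpret("Could I have a slightly smaller piece this time?"))
```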
arXiv Detail & Related papers (2024-10-27T22:56:51Z)
- ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning [74.58666091522198]
We present a framework for intuitive robot programming by non-experts.
We leverage natural language prompts and contextual information from the Robot Operating System (ROS)
Our system integrates large language models (LLMs), enabling non-experts to articulate task requirements to the system through a chat interface.
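One way such a chat-to-ROS bridge can look in outline, using `rospy` and the public OpenAI SDK; the topic name, task-spec format, and prompt are assumptions rather than the framework's actual interface:

```python
# Illustrative chat-to-ROS bridge: natural-language input goes to an
# LLM, and the structured result is published on a ROS topic.
import rospy
from std_msgs.msg import String
from openai import OpenAI

client = OpenAI()

def to_task_spec(request: str) -> str:
    """Ask the LLM to rewrite a chat message as a one-line task spec."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Rewrite the request as 'verb(object)' only."},
            {"role": "user", "content": request},
        ],
    )
    return response.choices[0].message.content.strip()

if __name__ == "__main__":
    rospy.init_node("llm_chat_interface")
    pub = rospy.Publisher("/llm_tasks", String, queue_size=10)
    rospy.sleep(1.0)  # let the publisher register with the ROS master
    pub.publish(String(data=to_task_spec("Please bring me the red cup")))
```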
arXiv Detail & Related papers (2024-06-28T08:28:38Z)
- VoicePilot: Harnessing LLMs as Speech Interfaces for Physically Assistive Robots [9.528060348251584]
Speech interfaces that utilize Large Language Models (LLMs) can enable individuals to communicate high-level commands and nuanced preferences to robots.
Frameworks for integrating LLMs as interfaces to robots for high level task planning and code generation have been proposed, but fail to incorporate human-centric considerations.
We present a framework for incorporating LLMs as speech interfaces for physically assistive robots, constructed iteratively with 3 stages of testing involving a feeding robot, culminating in an evaluation with 11 older adults at an independent living facility.
arXiv Detail & Related papers (2024-04-05T12:45:10Z)
- RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation [77.41969287400977]
This paper presents RobotScript, a platform for a deployable robot manipulation pipeline powered by code generation.
We also present a benchmark for code generation for robot manipulation tasks specified in free-form natural language.
We demonstrate the adaptability of our code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms.
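The core pattern, generating code against a fixed robot API and executing it, can be sketched as follows; the `MockArm` API and prompt are invented for illustration and differ from RobotScript's real interface:

```python
# Sketch of code generation for manipulation: an LLM writes a short
# program against a fixed arm API, which is then executed on a mock.
from openai import OpenAI

client = OpenAI()

class MockArm:
    """Stand-in for a Franka/UR5 driver exposing a tiny shared API."""
    def move_to(self, x, y, z): print(f"move_to({x}, {y}, {z})")
    def close_gripper(self): print("close_gripper()")
    def open_gripper(self): print("open_gripper()")

API_DOC = ("Available calls: arm.move_to(x, y, z), "
           "arm.close_gripper(), arm.open_gripper().")

def generate_program(task: str) -> str:
    """Ask the LLM for a short program against the fixed arm API."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Write plain Python using only this API, no prose: "
                        + API_DOC},
            {"role": "user", "content": task},
        ],
    )
    # Strip markdown fences the model may add before executing.
    code = response.choices[0].message.content
    return code.replace("```python", "").replace("```", "")

if __name__ == "__main__":
    code = generate_program("Pick up the block at (0.4, 0.0, 0.02).")
    exec(code, {"arm": MockArm()})  # real systems validate and simulate first
```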
arXiv Detail & Related papers (2024-02-22T15:12:00Z)
- Ain't Misbehavin' -- Using LLMs to Generate Expressive Robot Behavior in Conversations with the Tabletop Robot Haru [9.2526849536751]
We introduce a fully-automated conversation system that leverages large language models (LLMs) to generate robot responses with expressive behaviors.
We conduct a pilot study where volunteers chat with a social robot using our proposed system, and we analyze their feedback, conducting a rigorous error analysis of chat transcripts.
Most negative feedback was due to automatic speech recognition (ASR) errors, which had limited impact on the conversations.
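One simple way to obtain expressive behavior alongside text is to have the LLM tag its reply and parse the tag out, sketched below; the tag format and behavior set are assumptions, and Haru's actual pipeline is more elaborate:

```python
# Sketch: ask the LLM to append a behavior label, then parse it out.
import re
from openai import OpenAI

client = OpenAI()

BEHAVIORS = {"happy_wiggle", "curious_tilt", "sad_droop", "neutral"}

def expressive_reply(user_text: str) -> tuple[str, str]:
    """Return (speech text, behavior label) for one conversational turn."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Reply to the user, then on a new line write "
                        "BEHAVIOR: one of " + ", ".join(sorted(BEHAVIORS))},
            {"role": "user", "content": user_text},
        ],
    )
    text = response.choices[0].message.content
    match = re.search(r"BEHAVIOR:\s*(\w+)", text)
    behavior = (match.group(1) if match and match.group(1) in BEHAVIORS
                else "neutral")  # fall back to a neutral pose
    speech = re.sub(r"BEHAVIOR:.*", "", text).strip()
    return speech, behavior
```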
arXiv Detail & Related papers (2024-02-18T12:35:52Z)
- WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model [92.90127398282209]
This paper investigates the potential of integrating the most recent Large Language Models (LLMs) and existing visual grounding and robotic grasping system.
We introduce the WALL-E (Embodied Robotic WAiter load lifting with Large Language model) as an example of this integration.
We deploy this LLM-empowered system on the physical robot to provide a more user-friendly interface for the instruction-guided grasping task.
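In outline, the integration chains three components; the grounding and grasping calls below are stubs, and only the LLM step is real code against the public OpenAI SDK:

```python
# Structural sketch of LLM + visual grounding + grasping integration.
from openai import OpenAI

client = OpenAI()

def resolve_target(instruction: str) -> str:
    """LLM step: reduce a dialogue turn to a single object name."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Name the single object to grasp. One word or phrase."},
            {"role": "user", "content": instruction},
        ],
    )
    return response.choices[0].message.content.strip()

def ground_object(name: str) -> tuple[float, float, float]:
    """Stub for a visual grounding model returning an object position."""
    return (0.5, 0.1, 0.03)

def grasp_at(position: tuple[float, float, float]) -> None:
    """Stub for the robot's grasp planner and executor."""
    print(f"grasping at {position}")

if __name__ == "__main__":
    target = resolve_target("I'm thirsty, could you grab me that bottle?")
    grasp_at(ground_object(target))
```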
arXiv Detail & Related papers (2023-08-30T11:35:21Z)
- SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts [108.04306136086807]
We present research that explores the application of prompt tuning to stimulate speech LMs for various generation tasks, within a unified framework called SpeechGen.
The proposed unified framework holds great promise for efficiency and effectiveness, particularly with the imminent arrival of advanced speech LMs.
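The prompt-tuning idea can be sketched in a few lines of PyTorch: learnable prompt vectors are prepended to the frozen model's input embeddings and are the only trained parameters; the dimensions and the toy frozen model below are placeholders, not SpeechGen's architecture:

```python
# Minimal prompt-tuning sketch: train only a few prepended vectors.
import torch
import torch.nn as nn

class PromptTunedLM(nn.Module):
    def __init__(self, frozen_lm: nn.Module, embed_dim: int, n_prompts: int = 10):
        super().__init__()
        self.frozen_lm = frozen_lm
        for p in self.frozen_lm.parameters():
            p.requires_grad = False  # only the prompts learn
        self.prompts = nn.Parameter(torch.randn(n_prompts, embed_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, embed_dim) from a speech tokenizer
        batch = input_embeds.size(0)
        prompts = self.prompts.unsqueeze(0).expand(batch, -1, -1)
        return self.frozen_lm(torch.cat([prompts, input_embeds], dim=1))

# Usage sketch: any embedding-in / embedding-out module works here.
toy_lm = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
)
model = PromptTunedLM(toy_lm, embed_dim=64)
out = model(torch.randn(2, 50, 64))  # (2, 60, 64) after prepending prompts
```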
arXiv Detail & Related papers (2023-06-03T22:35:27Z)
- InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language [82.92236977726655]
We present an interactive visual framework named InternGPT, or iGPT for short; the name stands for interaction, nonverbal, and chatbots.
arXiv Detail & Related papers (2023-05-09T17:58:34Z)