A Sign Language Recognition System with Pepper, Lightweight-Transformer,
and LLM
- URL: http://arxiv.org/abs/2309.16898v1
- Date: Thu, 28 Sep 2023 23:54:41 GMT
- Title: A Sign Language Recognition System with Pepper, Lightweight-Transformer,
and LLM
- Authors: JongYoon Lim, Inkyu Sa, Bruce MacDonald, and Ho Seok Ahn
- Abstract summary: This research explores using lightweight deep neural network architectures to enable the humanoid robot Pepper to understand American Sign Language (ASL).
We introduce a lightweight and efficient model for ASL understanding optimized for embedded systems, ensuring rapid sign recognition while conserving computational resources.
We tailor interactions to allow the Pepper Robot to generate natural Co-Speech Gesture responses, laying the foundation for more organic and intuitive humanoid-robot dialogues.
- Score: 0.9775599530257609
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This research explores using lightweight deep neural network architectures to
enable the humanoid robot Pepper to understand American Sign Language (ASL) and
facilitate non-verbal human-robot interaction. First, we introduce a
lightweight and efficient model for ASL understanding optimized for embedded
systems, ensuring rapid sign recognition while conserving computational
resources. Building upon this, we employ large language models (LLMs) for
intelligent robot interactions. Through intricate prompt engineering, we tailor
interactions to allow the Pepper Robot to generate natural Co-Speech Gesture
responses, laying the foundation for more organic and intuitive humanoid-robot
dialogues. Finally, we present an integrated software pipeline, embodying
advancements in a socially aware AI interaction model. Leveraging the Pepper
Robot's capabilities, we demonstrate the practicality and effectiveness of our
approach in real-world scenarios. The results highlight a profound potential
for enhancing human-robot interaction through non-verbal interactions, bridging
communication gaps, and making technology more accessible and understandable.
Related papers
- $π_0$: A Vision-Language-Action Flow Model for General Robot Control [77.32743739202543]
We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge.
We evaluate our model in terms of its ability to perform tasks in zero shot after pre-training, follow language instructions from people, and its ability to acquire new skills via fine-tuning.
arXiv Detail & Related papers (2024-10-31T17:22:30Z)
- RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation [77.41969287400977]
This paper presents RobotScript, a platform for a deployable robot manipulation pipeline powered by code generation.
We also present a code generation benchmark for robot manipulation tasks specified in free-form natural language.
We demonstrate the adaptability of our code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms.
arXiv Detail & Related papers (2024-02-22T15:12:00Z)
- LPAC: Learnable Perception-Action-Communication Loops with Applications to Coverage Control [80.86089324742024]
We propose a learnable Perception-Action-Communication (LPAC) architecture for the problem.
A convolutional neural network (CNN) processes localized perception; a graph neural network (GNN) facilitates robot communications.
Evaluations show that the LPAC models outperform standard decentralized and centralized coverage control algorithms.
arXiv Detail & Related papers (2024-01-10T00:08:00Z)
- Exploring Large Language Models to Facilitate Variable Autonomy for Human-Robot Teaming [4.779196219827508]
We introduce a novel framework for a GPT-powered multi-robot testbed environment, based on a Unity Virtual Reality (VR) setting.
This system allows users to interact with robot agents through natural language, each powered by individual GPT cores.
A user study with 12 participants explores the effectiveness of GPT-4 and, more importantly, user strategies when being given the opportunity to converse in natural language within a multi-robot environment.
arXiv Detail & Related papers (2023-12-12T12:26:48Z)
- Large Language Models for Robotics: A Survey [40.76581696885846]
Large language models (LLMs) possess the ability to process and generate natural language, facilitating efficient interaction and collaboration with robots.
This review aims to summarize the applications of LLMs in robotics, delving into their impact and contributions to key areas such as robot control, perception, decision-making, and path planning.
arXiv Detail & Related papers (2023-11-13T10:46:35Z)
- A Human-Robot Mutual Learning System with Affect-Grounded Language Acquisition and Differential Outcomes Training [0.1812164955222814]
The paper presents a novel human-robot interaction setup for identifying robot homeostatic needs.
We adopted a differential outcomes training protocol whereby the robot provides feedback specific to its internal needs.
We found evidence that DOT can enhance the human's learning efficiency, which in turn enables more efficient robot language acquisition.
arXiv Detail & Related papers (2023-10-20T09:41:31Z)
- Incremental Learning of Humanoid Robot Behavior from Natural Interaction and Large Language Models [23.945922720555146]
We propose a system to achieve incremental learning of complex behavior from natural interaction.
We integrate the system in the robot cognitive architecture of the humanoid robot ARMAR-6.
arXiv Detail & Related papers (2023-09-08T13:29:05Z)
- WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model [92.90127398282209]
This paper investigates the potential of integrating the most recent Large Language Models (LLMs) with existing visual grounding and robotic grasping systems.
We introduce WALL-E (Embodied Robotic WAiter load lifting with Large Language model) as an example of this integration.
We deploy this LLM-empowered system on the physical robot to provide a more user-friendly interface for the instruction-guided grasping task.
arXiv Detail & Related papers (2023-08-30T11:35:21Z)
- "No, to the Right" -- Online Language Corrections for Robotic Manipulation via Shared Autonomy [70.45420918526926]
We present LILAC, a framework for incorporating and adapting to natural language corrections online during execution.
Instead of discrete turn-taking between a human and robot, LILAC splits agency between the human and robot.
We show that our corrections-aware approach obtains higher task completion rates, and is subjectively preferred by users.
arXiv Detail & Related papers (2023-01-06T15:03:27Z)
- A MultiModal Social Robot Toward Personalized Emotion Interaction [1.2183405753834562]
This study demonstrates a multimodal human-robot interaction (HRI) framework with reinforcement learning to enhance the robotic interaction policy.
The goal is to apply this framework in social scenarios so that robots can generate more natural and engaging interactions.
arXiv Detail & Related papers (2021-10-08T00:35:44Z)
- Self-supervised reinforcement learning for speaker localisation with the iCub humanoid robot [58.2026611111328]
Looking at a person's face is one of the mechanisms that humans rely on when it comes to filtering speech in noisy environments.
Having a robot that can look toward a speaker could benefit ASR performance in challenging environments.
We propose a self-supervised reinforcement learning-based framework inspired by the early development of humans.
arXiv Detail & Related papers (2020-11-12T18:02:15Z)