Related papers: LLM Granularity for On-the-Fly Robot Control

URL: http://arxiv.org/abs/2406.14653v1
Date: Thu, 20 Jun 2024 18:17:48 GMT
Title: LLM Granularity for On-the-Fly Robot Control
Authors: Peng Wang, Mattia Robbiani, Zhihao Guo
Abstract summary: In circumstances where visuals become unreliable or unavailable, can we rely solely on language to control robots? This work takes the initial steps to answer this question by: 1) evaluating the responses of assistive robots to language prompts of varying granularities; and 2) exploring the necessity and feasibility of controlling the robot on-the-fly.
Score: 3.5015824313818578
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Assistive robots have attracted significant attention due to their potential to enhance the quality of life for vulnerable individuals like the elderly. The convergence of computer vision, large language models, and robotics has introduced the 'visuolinguomotor' mode for assistive robots, where visuals and linguistics are incorporated into assistive robots to enable proactive and interactive assistance. This raises the question: In circumstances where visuals become unreliable or unavailable, can we rely solely on language to control robots, i.e., is the 'linguomotor' mode viable for assistive robots? This work takes the initial steps to answer this question by: 1) evaluating the responses of assistive robots to language prompts of varying granularities; and 2) exploring the necessity and feasibility of controlling the robot on-the-fly. We have designed and conducted experiments on a Sawyer cobot to support our arguments. A Turtlebot robot case is designed to demonstrate the adaptation of the solution to scenarios where assistive robots need to maneuver to assist. Code will be released on GitHub soon to benefit the community.

Related papers

$π_0$: A Vision-Language-Action Flow Model for General Robot Control [77.32743739202543]
We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge. We evaluate our model in terms of its ability to perform tasks in zero shot after pre-training, follow language instructions from people, and its ability to acquire new skills via fine-tuning.
arXiv Detail & Related papers (2024-10-31T17:22:30Z)
Know your limits! Optimize the robot's behavior through self-awareness [11.021217430606042]
Recent human-robot imitation algorithms focus on following a reference human motion with high precision. We introduce a deep-learning model that anticipates the robot's performance when imitating a given reference. Our Self-AWare model (SAW) ranks potential robot behaviors based on various criteria, such as fall likelihood, adherence to the reference motion, and smoothness.
arXiv Detail & Related papers (2024-09-16T14:14:58Z)
Controlling diverse robots by inferring Jacobian fields with deep networks [48.279199537720714]
Mirroring the complex structures and diverse functions of natural organisms is a long-standing challenge in robotics. We introduce a method that uses deep neural networks to map a video stream of a robot to its visuomotor Jacobian field. Our approach achieves accurate closed-loop control and recovers the causal dynamic structure of each robot.
arXiv Detail & Related papers (2024-07-11T17:55:49Z)
Commonsense Reasoning for Legged Robot Adaptation with Vision-Language Models [81.55156507635286]
Legged robots are physically capable of navigating a diverse variety of environments and overcoming a wide range of obstructions. Current learning methods often struggle with generalization to the long tail of unexpected situations without heavy human supervision. We propose a system, VLM-Predictive Control (VLM-PC), combining two key components that we find to be crucial for eliciting on-the-fly, adaptive behavior selection.
arXiv Detail & Related papers (2024-07-02T21:00:30Z)
HumanoidBench: Simulated Humanoid Benchmark for Whole-Body Locomotion and Manipulation [50.616995671367704]
We present a high-dimensional, simulated robot learning benchmark, HumanoidBench, featuring a humanoid robot equipped with dexterous hands. Our findings reveal that state-of-the-art reinforcement learning algorithms struggle with most tasks, whereas a hierarchical learning approach achieves superior performance when supported by robust low-level policies.
arXiv Detail & Related papers (2024-03-15T17:45:44Z)
HuBo-VLM: Unified Vision-Language Model designed for HUman roBOt interaction tasks [5.057755436092344]
Human-robot interaction is an exciting task that aims to guide robots to follow instructions from humans. HuBo-VLM is proposed to tackle perception tasks associated with human-robot interaction.
arXiv Detail & Related papers (2023-08-24T03:47:27Z)
Giving Robots a Hand: Learning Generalizable Manipulation with Eye-in-Hand Human Video Demonstrations [66.47064743686953]
Eye-in-hand cameras have shown promise in enabling greater sample efficiency and generalization in vision-based robotic manipulation. Videos of humans performing tasks, on the other hand, are much cheaper to collect since they eliminate the need for expertise in robotic teleoperation. In this work, we augment narrow robotic imitation datasets with broad unlabeled human video demonstrations to greatly enhance the generalization of eye-in-hand visuomotor policies.
arXiv Detail & Related papers (2023-07-12T07:04:53Z)
Exploring AI-enhanced Shared Control for an Assistive Robotic Arm [4.999814847776098]
We explore how Artificial Intelligence (AI) can be integrated into a shared control paradigm. In particular, we focus on the consequential requirements for the interface between human and robot.
arXiv Detail & Related papers (2023-06-23T14:19:56Z)
Open-World Object Manipulation using Pre-trained Vision-Language Models [72.87306011500084]
For robots to follow instructions from people, they must be able to connect the rich semantic information in human vocabulary to their sensory observations and actions. We develop a simple approach, MOO, which leverages a pre-trained vision-language model to extract object-identifying information. In a variety of experiments on a real mobile manipulator, we find that MOO generalizes zero-shot to a wide range of novel object categories and environments.
arXiv Detail & Related papers (2023-03-02T01:55:10Z)
Robots with Different Embodiments Can Express and Influence Carefulness in Object Manipulation [104.5440430194206]
This work investigates the perception of object manipulations performed with a communicative intent by two robots. We designed the robots' movements to either communicate carefulness or not while transporting objects.
arXiv Detail & Related papers (2022-08-03T13:26:52Z)
Know Thyself: Transferable Visuomotor Control Through Robot-Awareness [22.405839096833937]
Training visuomotor robot controllers from scratch on a new robot typically requires generating large amounts of robot-specific data. We propose a "robot-aware" solution paradigm that exploits readily available robot "self-knowledge". Our experiments on tabletop manipulation tasks in simulation and on real robots demonstrate that these plug-in improvements dramatically boost the transferability of visuomotor controllers.
arXiv Detail & Related papers (2021-07-19T17:56:04Z)
Natural Language Interaction to Facilitate Mental Models of Remote Robots [0.0]
High-stakes scenarios require robot operators to have clear mental models of what the robots can and can't do. We propose that interaction with a conversational assistant, who acts as a mediator, can help the user with understanding the functionality of remote robots.
arXiv Detail & Related papers (2020-03-12T16:03:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.