Learning Visually Guided Latent Actions for Assistive Teleoperation
- URL: http://arxiv.org/abs/2105.00580v1
- Date: Sun, 2 May 2021 23:58:28 GMT
- Title: Learning Visually Guided Latent Actions for Assistive Teleoperation
- Authors: Siddharth Karamcheti, Albert J. Zhai, Dylan P. Losey, Dorsa Sadigh
- Abstract summary: We develop assistive robots that condition their latent embeddings on visual inputs.
We show that incorporating object detectors pretrained on small amounts of cheap, easy-to-collect structured data enables i) accurately recognizing the current context and ii) generalizing control embeddings to new objects and tasks.
- Score: 9.75385535829762
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It is challenging for humans -- particularly those living with physical
disabilities -- to control high-dimensional, dexterous robots. Prior work
explores learning embedding functions that map a human's low-dimensional inputs
(e.g., via a joystick) to complex, high-dimensional robot actions for assistive
teleoperation; however, a central problem is that there are many more
high-dimensional actions than available low-dimensional inputs. To extract the
correct action and maximally assist their human controller, robots must reason
over their context: for example, pressing a joystick down when interacting with
a coffee cup indicates a different action than when interacting with a knife. In
this work, we develop assistive robots that condition their latent embeddings
on visual inputs. We explore a spectrum of visual encoders and show that
incorporating object detectors pretrained on small amounts of cheap,
easy-to-collect structured data enables i) accurately and robustly recognizing
the current context and ii) generalizing control embeddings to new objects and
tasks. In user studies with a high-dimensional physical robot arm, participants
leverage this approach to perform new tasks with unseen objects. Our results
indicate that structured visual representations improve few-shot performance
and are subjectively preferred by users.
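To make the approach concrete, here is a minimal sketch of a visually conditioned latent-action decoder in the spirit of the abstract: a low-dimensional joystick input is decoded into a high-dimensional arm action conditioned on detector-derived scene features. The layer sizes, the 7-DoF action space, and the 16-dimensional context vector are illustrative assumptions, not the paper's reported architecture.

```python
# Minimal sketch (illustrative only) of a visually conditioned latent-action decoder.
# Assumptions: a 2-DoF joystick latent z, a 7-DoF arm action space, and visual
# context summarized as per-object detector features (e.g., class one-hot + box).
import torch
import torch.nn as nn


class VisuallyConditionedDecoder(nn.Module):
    def __init__(self, latent_dim=2, context_dim=16, action_dim=7, hidden=64):
        super().__init__()
        # Encode detector-derived context features.
        self.context_net = nn.Sequential(
            nn.Linear(context_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Decode the low-dimensional input conditioned on the visual context.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, z, context):
        c = self.context_net(context)
        return self.decoder(torch.cat([z, c], dim=-1))


if __name__ == "__main__":
    decoder = VisuallyConditionedDecoder()
    z = torch.tensor([[0.0, -1.0]])   # joystick pressed "down"
    cup_ctx = torch.randn(1, 16)      # stand-in for detector features of a coffee cup
    knife_ctx = torch.randn(1, 16)    # stand-in for detector features of a knife
    # The same joystick input decodes to different high-dimensional actions
    # depending on the detected object in the scene.
    print(decoder(z, cup_ctx))
    print(decoder(z, knife_ctx))
```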
Related papers
- Zero-Cost Whole-Body Teleoperation for Mobile Manipulation [8.71539730969424]
MoMa-Teleop is a novel teleoperation method that delegates the base motions to a reinforcement learning agent.
We demonstrate that our approach results in a significant reduction in task completion time across a variety of robots and tasks.
arXiv Detail & Related papers (2024-09-23T15:09:45Z)
- Human-Agent Joint Learning for Efficient Robot Manipulation Skill Acquisition [48.65867987106428]
We introduce a novel system for joint learning between human operators and robots.
It enables human operators to share control of a robot end-effector with a learned assistive agent.
It reduces the need for human adaptation while ensuring the collected data is of sufficient quality for downstream tasks.
arXiv Detail & Related papers (2024-06-29T03:37:29Z)
- Self-Explainable Affordance Learning with Embodied Caption [63.88435741872204]
We introduce Self-Explainable Affordance learning (SEA) with embodied caption.
SEA enables robots to articulate their intentions and bridge the gap between explainable vision-language caption and visual affordance learning.
We propose a novel model to effectively combine affordance grounding with self-explanation in a simple but efficient manner.
arXiv Detail & Related papers (2024-04-08T15:22:38Z)
- Teaching Unknown Objects by Leveraging Human Gaze and Augmented Reality in Human-Robot Interaction [3.1473798197405953]
This dissertation aims to teach a robot unknown objects in the context of Human-Robot Interaction (HRI).
The combination of eye tracking and Augmented Reality created a powerful synergy that empowered the human teacher to communicate with the robot.
The robot's object detection capabilities exhibited comparable performance to state-of-the-art object detectors trained on extensive datasets.
arXiv Detail & Related papers (2023-12-12T11:34:43Z)
- Human-oriented Representation Learning for Robotic Manipulation [64.59499047836637]
Humans inherently possess generalizable visual representations that empower them to efficiently explore and interact with the environments in manipulation tasks.
We formalize this idea through the lens of human-oriented multi-task fine-tuning on top of pre-trained visual encoders.
Our Task Fusion Decoder consistently improves the representation of three state-of-the-art visual encoders for downstream manipulation policy-learning.
arXiv Detail & Related papers (2023-10-04T17:59:38Z)
- Learning Reward Functions for Robotic Manipulation by Observing Humans [92.30657414416527]
We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies.
The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective.
arXiv Detail & Related papers (2022-11-16T16:26:48Z)
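As a rough illustration of the reward described in the entry above, the sketch below defines the reward as the negative distance between the current observation and the goal in a learned embedding space. The embedding network here is a stand-in (the paper trains it with a time-contrastive objective on human videos), and all dimensions and names are assumptions.

```python
# Illustrative sketch: reward as negative distance to a goal in a learned embedding
# space. The embedding network would be trained separately (e.g., with a
# time-contrastive objective on human videos); here it is an untrained stand-in.
import torch
import torch.nn as nn


class EmbeddingNet(nn.Module):
    """Stand-in for an observation embedding trained with a time-contrastive loss."""
    def __init__(self, obs_dim=64, embed_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                 nn.Linear(128, embed_dim))

    def forward(self, obs):
        return self.net(obs)


def embedding_reward(phi, observation, goal):
    """Reward = -||phi(o_t) - phi(g)||; it increases as the agent nears the goal."""
    with torch.no_grad():
        return -torch.norm(phi(observation) - phi(goal), dim=-1)


if __name__ == "__main__":
    phi = EmbeddingNet()
    o_t = torch.randn(1, 64)   # current (flattened) observation
    g = torch.randn(1, 64)     # goal observation
    print(embedding_reward(phi, o_t, g))
```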
- Scene Editing as Teleoperation: A Case Study in 6DoF Kit Assembly [18.563562557565483]
We propose the framework "Scene Editing as Teleoperation" (SEaT).
Instead of controlling the robot, users focus on specifying the task's goal.
A user can perform teleoperation without any expert knowledge of the robot hardware.
arXiv Detail & Related papers (2021-10-09T04:22:21Z)
- Learning Generalizable Robotic Reward Functions from "In-The-Wild" Human Videos [59.58105314783289]
Domain-agnostic Video Discriminator (DVD) learns multitask reward functions by training a discriminator to classify whether two videos are performing the same task.
DVD can generalize by virtue of learning from a small amount of robot data with a broad dataset of human videos.
DVD can be combined with visual model predictive control to solve robotic manipulation tasks on a real WidowX200 robot in an unseen environment from a single human demo.
arXiv Detail & Related papers (2021-03-31T05:25:05Z)
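The sketch below illustrates the core idea behind the discriminator in the DVD entry above: a binary classifier over a pair of video representations that predicts whether they show the same task, whose output can then serve as a reward signal for planning. Summarizing each video as a fixed-length feature vector is a simplifying assumption; the actual method learns video encoders from data.

```python
# Simplified sketch of a domain-agnostic video discriminator: classify whether two
# video clips (summarized here as fixed-length feature vectors) show the same task.
import torch
import torch.nn as nn


class SameTaskDiscriminator(nn.Module):
    def __init__(self, video_dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * video_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, video_a, video_b):
        # Logit that the two clips depict the same task; its sigmoid can be used
        # as a reward when comparing a robot rollout against a human demo.
        return self.net(torch.cat([video_a, video_b], dim=-1))


if __name__ == "__main__":
    disc = SameTaskDiscriminator()
    human_demo = torch.randn(1, 128)   # features of a human video demonstration
    robot_clip = torch.randn(1, 128)   # features of a candidate robot rollout
    reward = torch.sigmoid(disc(human_demo, robot_clip))
    print(reward)
```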
- Careful with That! Observation of Human Movements to Estimate Objects Properties [106.925705883949]
We focus on the features of human motor actions that communicate insights on the weight of an object.
Our final goal is to enable a robot to autonomously infer the degree of care required in object handling.
arXiv Detail & Related papers (2021-03-02T08:14:56Z)