What Matters to You? Towards Visual Representation Alignment for Robot
Learning
- URL: http://arxiv.org/abs/2310.07932v2
- Date: Tue, 16 Jan 2024 04:39:59 GMT
- Title: What Matters to You? Towards Visual Representation Alignment for Robot
Learning
- Authors: Ran Tian, Chenfeng Xu, Masayoshi Tomizuka, Jitendra Malik, Andrea
Bajcsy
- Abstract summary: When operating in service of people, robots need to optimize rewards aligned with end-user preferences.
We propose Representation-Aligned Preference-based Learning (RAPL), a method for solving the visual representation alignment problem.
- Score: 81.30964736676103
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When operating in service of people, robots need to optimize rewards aligned
with end-user preferences. Since robots will rely on raw perceptual inputs like
RGB images, their rewards will inevitably use visual representations. Recently
there has been excitement in using representations from pre-trained visual
models, but key to making these work in robotics is fine-tuning, which is
typically done via proxy tasks like dynamics prediction or enforcing temporal
cycle-consistency. However, all these proxy tasks bypass the human's input on
what matters to them, exacerbating spurious correlations and ultimately leading
to robot behaviors that are misaligned with user preferences. In this work, we
propose that robots should leverage human feedback to align their visual
representations with the end-user and disentangle what matters for the task. We
propose Representation-Aligned Preference-based Learning (RAPL), a method for
solving the visual representation alignment problem and visual reward learning
problem through the lens of preference-based learning and optimal transport.
Across experiments in X-MAGICAL and in robotic manipulation, we find that
RAPL's reward consistently generates preferred robot behaviors with high sample
efficiency, and shows strong zero-shot generalization when the visual
representation is learned from a different embodiment than the robot's.
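As a rough illustration of the recipe the abstract sketches, preference comparisons over trajectories scored through optimal transport in a learned visual feature space, the PyTorch snippet below pairs a Bradley-Terry preference loss with an entropic Sinkhorn cost. The toy encoder, the cost normalization, and all hyperparameters are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch, not the authors' code: align a visual encoder with human
# preference comparisons, scoring each trajectory by an entropic
# optimal-transport cost between its frame embeddings and a reference demo.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VisualEncoder(nn.Module):
    """Toy frame encoder standing in for a pre-trained visual backbone."""
    def __init__(self, in_dim=3 * 64 * 64, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(in_dim, 256),
                                 nn.ReLU(), nn.Linear(256, feat_dim))

    def forward(self, frames):                         # frames: (T, C, H, W)
        return F.normalize(self.net(frames), dim=-1)   # unit-norm features (T, D)


def sinkhorn_cost(x, y, eps=0.1, iters=50):
    """Entropic OT cost between two sets of frame features (uniform weights)."""
    cost = torch.cdist(x, y) ** 2
    cost = cost / (cost.mean() + 1e-8)              # normalize scale for stability
    K = torch.exp(-cost / eps)
    a = torch.full((x.size(0),), 1.0 / x.size(0))
    b = torch.full((y.size(0),), 1.0 / y.size(0))
    u = torch.ones_like(a)
    for _ in range(iters):                          # Sinkhorn fixed-point updates
        v = b / (K.t() @ u + 1e-8)
        u = a / (K @ v + 1e-8)
    plan = torch.diag(u) @ K @ torch.diag(v)        # approximate transport plan
    return (plan * cost).sum()


def trajectory_score(encoder, traj, reference):
    """Higher score = robot frames better matched to the reference behavior."""
    return -sinkhorn_cost(encoder(traj), encoder(reference))


def preference_loss(encoder, preferred, rejected, reference):
    """Bradley-Terry style loss: the human-preferred clip should score higher."""
    gap = trajectory_score(encoder, preferred, reference) - \
          trajectory_score(encoder, rejected, reference)
    return -F.logsigmoid(gap)


if __name__ == "__main__":
    enc = VisualEncoder()
    opt = torch.optim.Adam(enc.parameters(), lr=1e-4)
    # Random stand-ins for three short video clips: (T frames, 3, 64, 64).
    preferred, rejected, reference = (torch.rand(8, 3, 64, 64) for _ in range(3))
    loss = preference_loss(enc, preferred, rejected, reference)
    loss.backward()
    opt.step()
    print(f"preference loss: {loss.item():.4f}")
```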
Related papers
- Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets [24.77850617214567]
We propose a foundation representation learning framework that captures both the visual features and the dynamics information of manipulation tasks, such as actions and proprioceptive states.
Specifically, we pre-train a visual encoder on the DROID robotic dataset and leverage motion-relevant data such as robot proprioceptive states and actions.
We introduce a novel contrastive loss that aligns visual observations with the robot's proprioceptive state-action dynamics, combined with a behavior cloning (BC)-like actor loss to predict actions during pre-training, along with a time contrastive loss.
arXiv Detail & Related papers (2024-10-29T17:58:13Z)
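The entry above pairs a contrastive alignment between visual observations and proprioceptive state-action dynamics with a BC-like actor loss; the sketch below covers only the contrastive piece. The toy encoders, the 7-DoF proprioception and action dimensions, and the CLIP-style symmetric InfoNCE objective are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch with assumed shapes, not the paper's code: a symmetric InfoNCE
# loss that pulls a frame's visual embedding toward the proprioceptive
# state-action embedding recorded at the same timestep and pushes apart
# mismatched pairs within the batch.
import torch
import torch.nn as nn
import torch.nn.functional as F

vision_enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128))
dynamics_enc = nn.Sequential(nn.Linear(7 + 7, 64), nn.ReLU(), nn.Linear(64, 128))

def dynamics_contrastive_loss(frames, proprio, actions, temperature=0.1):
    z_img = F.normalize(vision_enc(frames), dim=-1)                       # (B, 128)
    z_dyn = F.normalize(dynamics_enc(torch.cat([proprio, actions], -1)), dim=-1)
    logits = z_img @ z_dyn.t() / temperature                              # similarity matrix (B, B)
    targets = torch.arange(frames.size(0))                                # matched pairs sit on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +                      # image -> dynamics
                  F.cross_entropy(logits.t(), targets))                   # dynamics -> image

# Random stand-ins: a batch of 16 frames with matching proprio/action vectors.
loss = dynamics_contrastive_loss(torch.rand(16, 3, 64, 64),
                                 torch.rand(16, 7), torch.rand(16, 7))
```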
- HRP: Human Affordances for Robotic Pre-Training [15.92416819748365]
We present a framework for pre-training representations on hand, object, and contact.
We experimentally demonstrate (using 3000+ robot trials) that this affordance pre-training scheme boosts performance by a minimum of 15% on 5 real-world tasks.
arXiv Detail & Related papers (2024-07-26T17:59:52Z)
- Predicting Human Impressions of Robot Performance During Navigation Tasks [8.01980632893357]
We investigate the possibility of predicting people's impressions of robot behavior using non-verbal behavioral cues and machine learning techniques.
Results suggest that facial expressions alone provide useful information about human impressions of robot performance.
Supervised learning techniques showed promise, outperforming humans' predictions of robot performance in most cases.
arXiv Detail & Related papers (2023-10-17T21:12:32Z)
- Human-oriented Representation Learning for Robotic Manipulation [64.59499047836637]
Humans inherently possess generalizable visual representations that empower them to efficiently explore and interact with their environments in manipulation tasks.
We formalize this idea through the lens of human-oriented multi-task fine-tuning on top of pre-trained visual encoders.
Our Task Fusion Decoder consistently improves the representation of three state-of-the-art visual encoders for downstream manipulation policy-learning.
arXiv Detail & Related papers (2023-10-04T17:59:38Z)
- Robot Learning with Sensorimotor Pre-training [98.7755895548928]
We present a self-supervised sensorimotor pre-training approach for robotics.
Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens.
We find that sensorimotor pre-training consistently outperforms training from scratch, has favorable scaling properties, and enables transfer across different tasks, environments, and robots.
arXiv Detail & Related papers (2023-06-16T17:58:10Z)
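The sensorimotor pre-training entry above describes a Transformer that operates on sequences of sensorimotor tokens; the sketch below shows one plausible way to tokenize camera, proprioception, and action streams into such a sequence. The shapes, projection layers, and modality-major token ordering are assumptions, not RPT's actual architecture or pre-training objective.

```python
# Minimal sketch with assumed shapes and layer sizes, not RPT's architecture:
# per-timestep camera, proprioception, and action readings are each projected
# to a shared token width and concatenated into one sequence for a Transformer.
import torch
import torch.nn as nn

d_model, T = 128, 10                                   # token width, horizon length
to_token = nn.ModuleDict({
    "image":   nn.Sequential(nn.Flatten(start_dim=2), nn.Linear(3 * 64 * 64, d_model)),
    "proprio": nn.Linear(7, d_model),
    "action":  nn.Linear(7, d_model),
})
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)

def encode(obs):
    # One token per modality per timestep, grouped by modality along the sequence axis.
    tokens = torch.cat([to_token[k](obs[k]) for k in ("image", "proprio", "action")], dim=1)
    return encoder(tokens)                             # (B, 3*T, d_model)

batch = {"image": torch.rand(2, T, 3, 64, 64),
         "proprio": torch.rand(2, T, 7),
         "action": torch.rand(2, T, 7)}
features = encode(batch)                               # contextualized sensorimotor tokens
```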
- Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on.
In this work, we propose MEDAL++, a novel design for self-improving robotic systems.
The robot autonomously practices the task by learning both to do and to undo it, while simultaneously inferring the reward function from the demonstrations.
arXiv Detail & Related papers (2023-03-02T18:51:38Z)
- Aligning Robot and Human Representations [50.070982136315784]
We argue that current representation learning approaches in robotics should be studied from the perspective of how well they accomplish the objective of representation alignment.
We mathematically define the problem, identify its key desiderata, and situate current methods within this formalism.
arXiv Detail & Related papers (2023-02-03T18:59:55Z)
- Learning Reward Functions for Robotic Manipulation by Observing Humans [92.30657414416527]
We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies.
The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective.
arXiv Detail & Related papers (2022-11-16T16:26:48Z)
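The entry above learns an embedding with a time-contrastive objective and defines rewards as distances to a goal in that space; the sketch below shows both ingredients in minimal form. The toy encoder, the triplet formulation, and the frame sizes are placeholders rather than the paper's model.

```python
# Minimal sketch (toy encoder, assumed shapes; not the paper's implementation):
# an embedding trained with a time-contrastive triplet objective, and a dense
# reward defined as negative distance to a goal frame in that embedding space.
import torch
import torch.nn as nn
import torch.nn.functional as F

embed = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128))   # stand-in encoder

def time_contrastive_loss(anchor, positive, negative, margin=0.5):
    """Frames nearby in time should embed closer together than temporally distant ones."""
    za, zp, zn = (F.normalize(embed(x), dim=-1) for x in (anchor, positive, negative))
    return F.triplet_margin_loss(za, zp, zn, margin=margin)

def embedding_reward(frame, goal_frame):
    """Task-agnostic reward: negative embedding distance between observation and goal."""
    z, z_goal = (F.normalize(embed(x.unsqueeze(0)), dim=-1) for x in (frame, goal_frame))
    return -torch.norm(z - z_goal, dim=-1).item()

# Random stand-ins: batches of frames for training, single frames for the reward.
loss = time_contrastive_loss(*(torch.rand(4, 3, 64, 64) for _ in range(3)))
reward = embedding_reward(torch.rand(3, 64, 64), torch.rand(3, 64, 64))
```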
- Learning User-Preferred Mappings for Intuitive Robot Control [28.183430654834307]
We propose a method for learning the human's preferred or preconceived mapping from a few robot queries.
We make this approach data-efficient by recognizing that human mappings have strong priors.
Our simulated and experimental results suggest that learning the mapping between inputs and robot actions improves objective and subjective performance.
arXiv Detail & Related papers (2020-07-22T18:54:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.