Variational Meta Reinforcement Learning for Social Robotics
- URL: http://arxiv.org/abs/2206.03211v4
- Date: Thu, 3 Aug 2023 15:34:05 GMT
- Title: Variational Meta Reinforcement Learning for Social Robotics
- Authors: Anand Ballou, Xavier Alameda-Pineda, Chris Reinke
- Abstract summary: Social robotics still faces many challenges.
One bottleneck is that robotic behaviors often need to be adapted, as social norms depend strongly on the environment.
In this work, we investigate meta-reinforcement learning (meta-RL) as a potential solution.
- Score: 15.754961709819938
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the increasing presence of robots in our everyday environments,
improving their social skills is of utmost importance. Nonetheless, social
robotics still faces many challenges. One bottleneck is that robotic behaviors
often need to be adapted, as social norms depend strongly on the environment.
For example, a robot should navigate more carefully around patients in a
hospital compared to workers in an office. In this work, we investigate
meta-reinforcement learning (meta-RL) as a potential solution. Here, robot
behaviors are learned via reinforcement learning where a reward function needs
to be chosen so that the robot learns an appropriate behavior for a given
environment. We propose to use a variational meta-RL procedure that quickly
adapts the robots' behavior to new reward functions. As a result, given a new
environment, different reward functions can be quickly evaluated and an
appropriate one selected. The procedure learns a vectorized representation for
reward functions and a meta-policy that can be conditioned on such a
representation. Given observations from a new reward function, the procedure
identifies its representation and conditions the meta-policy on it. While
investigating the procedure's capabilities, we found that it suffers from
posterior collapse, where only a subset of the dimensions in the representation
encode useful information, resulting in reduced performance. Our second
contribution, a radial basis function (RBF) layer, partially mitigates this
negative effect. The RBF layer lifts the representation to a higher dimensional
space, which is more easily exploitable by the meta-policy. We demonstrate the
benefit of the RBF layer and the use of meta-RL for social robotics on four
robotic simulation tasks.
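The RBF layer described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration assuming Gaussian RBF units with fixed centers and a shared bandwidth; the dimensions, center placement, and bandwidth here are illustrative choices, not the paper's exact configuration.

```python
import numpy as np

def rbf_layer(z, centers, sigma=1.0):
    """Lift a latent vector z (shape [d]) to a higher-dimensional
    feature vector (shape [num_centers]) via Gaussian RBF units.

    Each output dimension measures the similarity of z to one center:
        phi_i(z) = exp(-||z - c_i||^2 / (2 * sigma^2))
    """
    diffs = centers - z                      # [num_centers, d]
    sq_dists = np.sum(diffs ** 2, axis=1)    # squared distance to each center
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

# Example: lift a 2-D reward-function representation to 16 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(size=(16, 2))   # fixed RBF centers (could also be learned)
z = np.array([0.5, -0.2])            # latent reward-function representation
features = rbf_layer(z, centers)     # shape (16,), fed to the meta-policy
```

The lifted features are bounded in (0, 1] and vary smoothly with z, which is one plausible reason such a representation is easier for the meta-policy to exploit than a partially collapsed latent vector.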
Related papers
- RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation [90.81956345363355]
RoBridge is a hierarchical intelligent architecture for general robotic manipulation. It consists of a high-level cognitive planner (HCP) based on a large-scale pre-trained vision-language model (VLM). It unleashes the procedural skill of reinforcement learning, effectively bridging the gap between cognition and execution.
arXiv Detail & Related papers (2025-05-03T06:17:18Z)
- REBEL: A Regularization-Based Solution for Reward Overoptimization in Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and user intentions, values, or social norms can be catastrophic in the real world.
Current methods to mitigate this misalignment work by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z)
- What Matters to You? Towards Visual Representation Alignment for Robot Learning [81.30964736676103]
When operating in service of people, robots need to optimize rewards aligned with end-user preferences.
We propose Representation-Aligned Preference-based Learning (RAPL), a method for solving the visual representation alignment problem.
arXiv Detail & Related papers (2023-10-11T23:04:07Z)
- Robot Learning with Sensorimotor Pre-training [98.7755895548928]
We present a self-supervised sensorimotor pre-training approach for robotics.
Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens.
We find that sensorimotor pre-training consistently outperforms training from scratch, has favorable scaling properties, and enables transfer across different tasks, environments, and robots.
arXiv Detail & Related papers (2023-06-16T17:58:10Z)
- Quality-Diversity Optimisation on a Physical Robot Through Dynamics-Aware and Reset-Free Learning [4.260312058817663]
We build upon the Reset-Free QD (RF-QD) algorithm to learn controllers directly on a physical robot.
This method uses a dynamics model, learned from interactions between the robot and the environment, to predict the robot's behaviour.
RF-QD also includes a recovery policy that returns the robot to a safe zone when it has walked outside of it, allowing continuous learning.
arXiv Detail & Related papers (2023-04-24T13:24:00Z)
- Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on.
In this work, we propose MEDAL++, a novel design for self-improving robotic systems.
The robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations.
arXiv Detail & Related papers (2023-03-02T18:51:38Z)
- Learning Latent Representations to Co-Adapt to Humans [12.71953776723672]
Non-stationary humans are challenging for robot learners.
In this paper we introduce an algorithmic formalism that enables robots to co-adapt alongside dynamic humans.
arXiv Detail & Related papers (2022-12-19T16:19:24Z)
- Learning Reward Functions for Robotic Manipulation by Observing Humans [92.30657414416527]
We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies.
The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective.
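Such an embedding-distance reward can be sketched as follows. This is a minimal illustration under stated assumptions: `phi` below is a hypothetical stand-in for the time-contrastively trained encoder, with an arbitrary fixed projection chosen purely for the example.

```python
import numpy as np

def phi(obs):
    """Hypothetical stand-in for the learned time-contrastive encoder:
    maps an observation to a point in the embedding space."""
    w = np.array([[1.0, 0.5], [-0.5, 1.0]])  # illustrative fixed projection
    return np.tanh(w @ obs)

def embedding_reward(obs, goal_obs):
    """Reward = negative distance to the goal in embedding space,
    so the reward increases as the agent approaches the goal."""
    return -np.linalg.norm(phi(obs) - phi(goal_obs))

goal = np.array([1.0, 0.0])
near = np.array([0.9, 0.1])   # observation close to the goal
far = np.array([-1.0, -1.0])  # observation far from the goal
```

With any reasonable encoder, an observation near the goal receives a higher (less negative) reward than a distant one, which is what makes the learned reward dense and usable for policy optimization.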
arXiv Detail & Related papers (2022-11-16T16:26:48Z)
- REvolveR: Continuous Evolutionary Models for Robot-to-robot Policy Transfer [57.045140028275036]
We consider the problem of transferring a policy across two different robots with significantly different parameters such as kinematics and morphology.
Existing approaches that train a new policy by matching the action or state transition distribution, including imitation learning methods, fail because the optimal action and/or state distributions are mismatched across the two robots.
We propose a novel method named $REvolveR$ of using continuous evolutionary models for robotic policy transfer implemented in a physics simulator.
arXiv Detail & Related papers (2022-02-10T18:50:25Z)
- A General, Evolution-Inspired Reward Function for Social Robotics [0.0]
We present the Social Reward Function as a mechanism to provide a real-time, dense reward function necessary for the deployment of reinforcement learning agents in social robotics.
The Social Reward Function is designed to closely mimic the genetically endowed social perception capabilities of humans, in an effort to provide a simple, stable, and culture-agnostic reward function.
arXiv Detail & Related papers (2022-02-01T18:05:31Z)
- Feature Expansive Reward Learning: Rethinking Human Input [31.413656752926208]
We introduce a new type of human input in which the person guides the robot from states where the feature being taught is highly expressed to states where it is not.
We propose an algorithm for learning the feature from the raw state space and integrating it into the reward function.
arXiv Detail & Related papers (2020-06-23T17:59:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.