Learning Multimodal Rewards from Rankings
- URL: http://arxiv.org/abs/2109.12750v1
- Date: Mon, 27 Sep 2021 01:22:01 GMT
- Title: Learning Multimodal Rewards from Rankings
- Authors: Vivek Myers, Erdem Bıyık, Nima Anari, Dorsa Sadigh
- Abstract summary: We go beyond learning a unimodal reward and focus on learning a multimodal reward function.
We formulate multimodal reward learning as a mixture learning problem.
We conduct experiments and user studies using a multi-task variant of OpenAI's LunarLander and a real Fetch robot.
- Score: 7.266985088439535
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning from human feedback has been shown to be a useful approach for
acquiring robot reward functions. However, expert feedback is often assumed to be drawn
from an underlying unimodal reward function. This assumption does not always
hold, for example in settings where multiple experts provide data or when a single
expert provides data for different tasks -- we thus go beyond learning a
unimodal reward and focus on learning a multimodal reward function. We
formulate multimodal reward learning as a mixture learning problem and
develop a novel ranking-based learning approach, where the experts are only
required to rank a given set of trajectories. Furthermore, as access to
interaction data is often expensive in robotics, we develop an active querying
approach to accelerate the learning process. We conduct experiments and user
studies using a multi-task variant of OpenAI's LunarLander and a real Fetch
robot, where we collect data from multiple users with different preferences.
The results suggest that our approach can efficiently learn multimodal reward
functions, and improve data-efficiency over benchmark methods that we adapt to
our learning problem.
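To make the mixture formulation concrete, the following is a minimal sketch of how a mixture of reward functions could be fit to trajectory rankings. It assumes linear reward features, a Plackett-Luce ranking likelihood, and an EM-style loop that soft-assigns rankings to components; the function names, dimensions, and optimizer settings are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def plackett_luce_log_lik(w, ranking_features):
    """Log-likelihood of one ranking under a linear reward w.

    ranking_features: (k, d) array of trajectory features, best-ranked first.
    """
    scores = ranking_features @ w
    ll = 0.0
    for i in range(len(scores)):
        # Probability the i-th ranked trajectory beats all lower-ranked ones.
        ll += scores[i] - np.log(np.sum(np.exp(scores[i:])))
    return ll

def fit_mixture(rankings, d, n_components=2, iters=200, lr=0.05, seed=0):
    """EM-style fit: the E-step soft-assigns each ranking to a reward mode,
    the M-step takes a finite-difference gradient step on each mode's weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(n_components, d))       # reward weights per mode
    log_pi = np.full(n_components, -np.log(n_components))   # log mixing proportions
    for _ in range(iters):
        # E-step: responsibility of each component for each ranking.
        log_resp = np.array([[log_pi[m] + plackett_luce_log_lik(W[m], r)
                              for m in range(n_components)] for r in rankings])
        log_resp -= log_resp.max(axis=1, keepdims=True)
        resp = np.exp(log_resp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: gradient ascent on each component's weighted log-likelihood.
        eps = 1e-4
        for m in range(n_components):
            grad = np.zeros(d)
            for j in range(d):
                w_pert = W[m].copy()
                w_pert[j] += eps
                grad[j] = sum(resp[i, m] *
                              (plackett_luce_log_lik(w_pert, r) -
                               plackett_luce_log_lik(W[m], r)) / eps
                              for i, r in enumerate(rankings))
            W[m] += lr * grad
        log_pi = np.log(resp.mean(axis=0) + 1e-12)
    return W, np.exp(log_pi)

# Toy usage: two user "modes" with different preferences over 3-D features.
true_w = np.array([[1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]])
rng = np.random.default_rng(1)
rankings = []
for _ in range(60):
    feats = rng.normal(size=(4, 3))              # 4 candidate trajectories
    mode = rng.integers(2)
    order = np.argsort(-(feats @ true_w[mode]))  # rank by that mode's true reward
    rankings.append(feats[order])
W_hat, pi_hat = fit_mixture(rankings, d=3)
```

An active querying scheme in the spirit of the abstract would then choose the next set of trajectories to be ranked so that the answer is maximally informative about the mixture parameters (e.g., by an expected information-gain criterion); the specific acquisition rule used in the paper is not reproduced here.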
Related papers
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal, and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Reinforcement Learning Based Multi-modal Feature Fusion Network for Novel Class Discovery [47.28191501836041]
In this paper, we employ a Reinforcement Learning framework to simulate the cognitive processes of humans.
We also deploy a Member-to-Leader Multi-Agent framework to extract and fuse features from multi-modal information.
We demonstrate the performance of our approach in both the 3D and 2D domains by employing the OS-MN40, OS-MN40-Miss, and Cifar10 datasets.
arXiv Detail & Related papers (2023-06-22T10:53:10Z)
- Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
arXiv Detail & Related papers (2023-08-26T07:55:32Z)
- Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning [7.51557557629519]
We present Learning from Guided Play (LfGP), a framework in which we leverage expert demonstrations of, in addition to a main task, multiple auxiliary tasks.
This affords many benefits: learning efficiency is improved for main tasks with challenging bottleneck transitions, expert data becomes reusable between tasks, and transfer learning through the reuse of learned auxiliary task models becomes possible.
arXiv Detail & Related papers (2021-12-16T14:58:08Z)
- Single-Modal Entropy based Active Learning for Visual Question Answering [75.1682163844354]
We address Active Learning in the multi-modal setting of Visual Question Answering (VQA).
In light of the multi-modal inputs, image and question, we propose a novel method for effective sample acquisition.
Our novel idea is simple to implement, cost-efficient, and readily adaptable to other multi-modal tasks.
arXiv Detail & Related papers (2021-10-21T05:38:45Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale [103.7609761511652]
We show how a large-scale collective robotic learning system can acquire a repertoire of behaviors simultaneously.
New tasks can be continuously instantiated from previously learned tasks.
We train and evaluate our system on a set of 12 real-world tasks with data collected from 7 robots.
arXiv Detail & Related papers (2021-04-16T16:38:02Z)
- Learning Reward Functions from Diverse Sources of Human Feedback: Optimally Integrating Demonstrations and Preferences [14.683631546064932]
We present a framework to integrate multiple sources of information, which are either passively or actively collected from human users.
In particular, we present an algorithm that first utilizes user demonstrations to initialize a belief about the reward function, and then actively probes the user with preference queries to zero-in on their true reward.
Our approach accounts for the human's ability to provide data, yielding user-friendly preference queries that are also theoretically optimal.
arXiv Detail & Related papers (2020-06-24T22:45:27Z)
- Active Preference-Based Gaussian Process Regression for Reward Learning [42.697198807877925]
One common approach is to learn reward functions from collected expert demonstrations.
We present a preference-based learning approach where, as an alternative, human feedback is given only in the form of comparisons between trajectories.
Our approach enables us to tackle both inflexibility and data-inefficiency problems within a preference-based learning framework.
arXiv Detail & Related papers (2020-05-06T03:29:27Z)
- Scalable Multi-Task Imitation Learning with Autonomous Improvement [159.9406205002599]
We build an imitation learning system that can continuously improve through autonomous data collection.
We leverage the robot's own trials as demonstrations for tasks other than the one that the robot actually attempted.
In contrast to prior imitation learning approaches, our method can autonomously collect data with sparse supervision for continuous improvement.
arXiv Detail & Related papers (2020-02-25T18:56:42Z)