Batch Active Learning of Reward Functions from Human Preferences
- URL: http://arxiv.org/abs/2402.15757v1
- Date: Sat, 24 Feb 2024 08:07:48 GMT
- Title: Batch Active Learning of Reward Functions from Human Preferences
- Authors: Erdem Bıyık, Nima Anari, Dorsa Sadigh
- Abstract summary: Preference-based learning enables reliable labeling by querying users with preference questions.
Active querying methods are commonly employed in preference-based learning to generate more informative data.
We develop a set of novel algorithms that enable efficient learning of reward functions using as few data samples as possible.
- Score: 33.39413552270375
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data generation and labeling are often expensive in robot learning.
Preference-based learning is a concept that enables reliable labeling by
querying users with preference questions. Active querying methods are commonly
employed in preference-based learning to generate more informative data at the
expense of parallelization and computation time. In this paper, we develop a
set of novel algorithms, batch active preference-based learning methods, that
enable efficient learning of reward functions using as few data samples as
possible while still having short query generation times and also retaining
parallelizability. We introduce a method based on determinantal point processes
(DPP) for active batch generation and several heuristic-based alternatives.
Finally, we present our experimental results for a variety of robotics tasks in
simulation. Our results suggest that our batch active learning algorithm
requires only a few queries that are computed in a short amount of time. We
showcase one of our algorithms in a study to learn human users' preferences.
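As a rough illustration of the DPP-based batch generation idea, the sketch below greedily approximates the mode of an L-ensemble DPP over a pool of candidate preference queries, so the selected batch trades off per-query informativeness against similarity between queries. The kernel construction and informativeness scores are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def greedy_dpp_batch(quality, similarity, batch_size):
    """Greedily approximate the MAP of a DPP with kernel
    L = diag(quality) @ similarity @ diag(quality).

    quality:    (n,) informativeness score of each candidate query
    similarity: (n, n) PSD similarity matrix between candidate queries
    batch_size: number of queries to return
    """
    n = len(quality)
    L = similarity * np.outer(quality, quality)   # L-ensemble kernel
    selected = []
    for _ in range(batch_size):
        best, best_gain = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            # log-det of the principal submatrix measures joint
            # informativeness + diversity of the chosen batch
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_gain:
                best, best_gain = i, logdet
        selected.append(best)
    return selected

# toy usage: 100 candidate queries with random feature vectors
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 8))
dists = np.linalg.norm(feats[:, None] - feats[None, :], axis=-1)
sim = np.exp(-dists ** 2)                        # RBF similarity between queries
info = rng.uniform(0.1, 1.0, size=100)           # stand-in informativeness scores
print(greedy_dpp_batch(info, sim, batch_size=5))
```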
Related papers
- Reinforcement Learning with Action Sequence for Data-Efficient Robot Learning [62.3886343725955]
We introduce a novel RL algorithm that learns a critic network that outputs Q-values over a sequence of actions.
By explicitly training the value functions to learn the consequence of executing a series of current and future actions, our algorithm allows for learning useful value functions from noisy trajectories.
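A minimal sketch of the core idea, a critic that scores an H-step action sequence rather than a single action; the architecture and dimensions below are illustrative assumptions, not the paper's network.

```python
import torch
import torch.nn as nn

class SequenceCritic(nn.Module):
    """Illustrative critic: Q(s, a_{t:t+H}) over an H-step action sequence."""
    def __init__(self, state_dim, action_dim, horizon, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + horizon * action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),   # scalar Q-value for the whole sequence
        )

    def forward(self, state, action_seq):
        # state: (B, state_dim), action_seq: (B, horizon, action_dim)
        x = torch.cat([state, action_seq.flatten(start_dim=1)], dim=-1)
        return self.net(x).squeeze(-1)

critic = SequenceCritic(state_dim=17, action_dim=6, horizon=4)
q = critic(torch.randn(32, 17), torch.randn(32, 4, 6))   # (32,) Q-values
```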
arXiv Detail & Related papers (2024-11-19T01:23:52Z)
- Cache & Distil: Optimising API Calls to Large Language Models [82.32065572907125]
Large-scale deployment of generative AI tools often depends on costly API calls to a Large Language Model (LLM) to fulfil user queries.
To curtail the frequency of these calls, one can employ a smaller language model -- a student.
This student gradually gains proficiency in independently handling an increasing number of user requests.
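A hedged sketch of the routing idea: the student answers when it is confident, otherwise the query goes to the LLM API and the teacher's answer is kept for later distillation. The confidence threshold and all names below are assumptions; the paper's active-learning criteria for deciding which queries to forward are not reproduced here.

```python
def answer(query, student, call_llm_api, distill_buffer, confidence_threshold=0.9):
    """Route a query to the student model or the costly LLM API.

    student         -- hypothetical object with predict(query) -> (answer, confidence)
    call_llm_api    -- hypothetical function query -> answer (the expensive teacher call)
    distill_buffer  -- list collecting (query, teacher_answer) pairs used to
                       periodically fine-tune the student
    """
    pred, confidence = student.predict(query)
    if confidence >= confidence_threshold:
        return pred                       # cheap path: student handles it
    teacher_answer = call_llm_api(query)  # expensive path: call the LLM
    distill_buffer.append((query, teacher_answer))
    return teacher_answer
```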
arXiv Detail & Related papers (2023-10-20T15:01:55Z)
- Novel Batch Active Learning Approach and Its Application to Synthetic Aperture Radar Datasets [7.381841249558068]
Recent gains have been made using sequential active learning for synthetic aperture radar (SAR) data (arXiv:2204.00005).
We developed a novel, two-part approach for batch active learning: Dijkstra's Annulus Core-Set (DAC) for core-set generation and LocalMax for batch sampling.
The batch active learning process that combines DAC and LocalMax achieves nearly identical accuracy as sequential active learning but is more efficient, proportional to the batch size.
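As a rough sketch of the batch-sampling side only (not DAC), the snippet below picks unlabeled points whose acquisition value is a local maximum within their neighborhood on a similarity graph, so the batch stays both informative and spread apart. The graph and acquisition function are assumed inputs, and this is only a reading of the abstract rather than the authors' exact LocalMax procedure.

```python
import numpy as np

def local_max_batch(acquisition, neighbors, batch_size):
    """Pick points whose acquisition value is maximal within their graph
    neighborhood, so batch members are informative yet spread apart.

    acquisition -- (n,) acquisition score per unlabeled point
    neighbors   -- list of index arrays, neighbors[i] = graph neighbors of i
    batch_size  -- maximum number of points to return
    """
    local_maxima = [
        i for i in range(len(acquisition))
        if all(acquisition[i] >= acquisition[j] for j in neighbors[i])
    ]
    # keep the highest-scoring local maxima, up to the batch size
    local_maxima.sort(key=lambda i: acquisition[i], reverse=True)
    return local_maxima[:batch_size]
```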
arXiv Detail & Related papers (2023-07-19T23:25:21Z)
- Algorithm Selection for Deep Active Learning with Imbalanced Datasets [11.902019233549474]
Active learning aims to reduce the number of labeled examples needed to train deep networks.
It is difficult to know in advance which active learning strategy will perform well or best in a given application.
We propose the first adaptive algorithm selection strategy for deep active learning.
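One common way to frame adaptive algorithm selection is as a multi-armed bandit over candidate active-learning strategies; the UCB-style sketch below illustrates that framing only and is not the paper's specific selection rule.

```python
import math

def select_strategy_ucb(counts, rewards, c=1.0):
    """UCB-style choice among active-learning strategies.

    counts[i]  -- how many rounds strategy i has been used
    rewards[i] -- cumulative observed reward (e.g. accuracy gain) of strategy i
    """
    total = sum(counts)
    if 0 in counts:                      # try every strategy at least once
        return counts.index(0)
    scores = [
        rewards[i] / counts[i] + c * math.sqrt(math.log(total) / counts[i])
        for i in range(len(counts))
    ]
    return scores.index(max(scores))
```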
arXiv Detail & Related papers (2023-02-14T19:59:49Z)
- Benchmarking Learning Efficiency in Deep Reservoir Computing [23.753943709362794]
We introduce a benchmark of increasingly difficult tasks together with a data efficiency metric to measure how quickly machine learning models learn from training data.
We compare the learning speed of established sequential supervised models, such as RNNs, LSTMs, and Transformers, with less widely known alternative models based on reservoir computing.
arXiv Detail & Related papers (2022-09-29T08:16:52Z)
- ALBench: A Framework for Evaluating Active Learning in Object Detection [102.81795062493536]
This paper contributes an active learning benchmark framework named ALBench for evaluating active learning in object detection.
Built on an automatic deep model training system, the ALBench framework is easy to use, compatible with different active learning algorithms, and ensures the same training and testing protocols.
arXiv Detail & Related papers (2022-07-27T07:46:23Z)
- Boosting the Learning for Ranking Patterns [6.142272540492935]
This paper formulates the problem of learning pattern ranking functions as a multi-criteria decision making problem.
Our approach aggregates different interestingness measures into a single weighted linear ranking function, using an interactive learning procedure.
Experiments conducted on well-known datasets show that our approach significantly reduces the running time and returns precise pattern ranking.
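A minimal sketch of the aggregation step: several interestingness measures are combined into a single weighted linear score used to rank patterns. The measures and weights below are placeholders; in the paper, the weights are learned interactively from user feedback.

```python
import numpy as np

def rank_patterns(measure_matrix, weights):
    """Rank patterns by a weighted linear combination of interestingness measures.

    measure_matrix -- (n_patterns, n_measures) value of each measure per pattern
    weights        -- (n_measures,) weights, e.g. learned interactively from
                      user feedback on pairs of patterns
    """
    scores = measure_matrix @ np.asarray(weights)
    return np.argsort(-scores)            # indices of patterns, best first

# toy usage: 4 patterns scored on support, confidence, lift (placeholder values)
M = np.array([[0.8, 0.60, 1.2],
              [0.4, 0.90, 2.0],
              [0.7, 0.70, 1.5],
              [0.2, 0.95, 3.0]])
print(rank_patterns(M, weights=[0.2, 0.3, 0.5]))
```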
arXiv Detail & Related papers (2022-03-05T10:22:44Z)
- Probabilistic Active Meta-Learning [15.432006404678981]
We introduce task selection based on prior experience into a meta-learning algorithm.
We provide empirical evidence that our approach improves data-efficiency when compared to strong baselines on simulated robotic experiments.
arXiv Detail & Related papers (2020-07-17T12:51:42Z)
- Discovering Reinforcement Learning Algorithms [53.72358280495428]
Reinforcement learning algorithms update an agent's parameters according to one of several possible rules.
This paper introduces a new meta-learning approach that discovers an entire update rule.
It includes both 'what to predict' (e.g. value functions) and 'how to learn from it' by interacting with a set of environments.
arXiv Detail & Related papers (2020-07-17T07:38:39Z)
- Bayesian active learning for production, a systematic study and a reusable library [85.32971950095742]
In this paper, we analyse the main drawbacks of current active learning techniques.
We do a systematic study on the effects of the most common issues of real-world datasets on the deep active learning process.
We derive two techniques that can speed up the active learning loop: partial uncertainty sampling and a larger query size.
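For context, the sketch below shows plain entropy-based uncertainty sampling with a configurable query size, the standard acquisition these techniques build on; it does not reproduce the paper's partial uncertainty sampling.

```python
import numpy as np

def entropy_query(probs, query_size):
    """Select the `query_size` most uncertain unlabeled examples.

    probs -- (n_examples, n_classes) predicted class probabilities
    """
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.argsort(-entropy)[:query_size]   # highest-entropy examples first
```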
arXiv Detail & Related papers (2020-06-17T14:51:11Z)
- Meta-learning with Stochastic Linear Bandits [120.43000970418939]
We consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularization is a squared Euclidean distance to a bias vector.
We show both theoretically and experimentally, that when the number of tasks grows and the variance of the task-distribution is small, our strategies have a significant advantage over learning the tasks in isolation.
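At the level of the least-squares estimate inside OFUL, regularizing toward a bias vector b means solving min_w ||Xw - y||^2 + lam * ||w - b||^2, which has the closed form w = (X^T X + lam*I)^{-1}(X^T y + lam*b). The sketch below computes only that biased ridge estimate and omits OFUL's confidence sets and arm selection.

```python
import numpy as np

def biased_ridge_estimate(X, y, bias, lam=1.0):
    """Solve min_w ||X w - y||^2 + lam * ||w - bias||^2.

    With bias = 0 this reduces to ordinary ridge regression; a well-chosen
    bias vector (e.g. an average of previously learned task parameters)
    transfers knowledge across tasks, the meta-learning effect described above.
    """
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    return np.linalg.solve(A, X.T @ y + lam * bias)
```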
arXiv Detail & Related papers (2020-05-18T08:41:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.