Amortized nonmyopic active search via deep imitation learning
- URL: http://arxiv.org/abs/2405.15031v1
- Date: Thu, 23 May 2024 20:10:29 GMT
- Title: Amortized nonmyopic active search via deep imitation learning
- Authors: Quan Nguyen, Anindya Sarkar, Roman Garnett
- Abstract summary: Active search formalizes a specialized active learning setting where the goal is to collect members of a rare, valuable class.
We study the amortization of the state-of-the-art policy by training a neural network to learn to search.
Our network, trained on synthetic data, learns a beneficial search strategy that yields nonmyopic decisions.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Active search formalizes a specialized active learning setting where the goal is to collect members of a rare, valuable class. The state-of-the-art algorithm approximates the optimal Bayesian policy in a budget-aware manner and has been shown to achieve impressive empirical performance in previous work. However, even this approximate policy has superlinear computational complexity with respect to the size of the search problem, rendering its application impractical in large spaces or in real-time systems where decisions must be made quickly. We study the amortization of this policy by training a neural network to learn to search. To circumvent the difficulty of learning from scratch, we appeal to imitation learning techniques to mimic the behavior of the expert, expensive-to-compute policy. Our policy network, trained on synthetic data, learns a beneficial search strategy that yields nonmyopic decisions carefully balancing exploration and exploitation. Extensive experiments demonstrate that on real-world tasks our policy achieves performance closely approximating the expert's at a fraction of the cost, while outperforming cheaper baselines.
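One standard way to realize this imitation step is behavior cloning: roll the expensive expert out on cheap synthetic tasks, record which candidate it queries in each state, and fit a network to reproduce those choices. Below is a minimal sketch of that idea; the toy utility-scoring expert and `PolicyNet` architecture are illustrative stand-ins, not the paper's actual expert or network.
```python
# A minimal behavior-cloning sketch of the amortization idea, not the
# paper's actual method: train a cheap scoring network to reproduce the
# choices of an expensive expert policy on synthetic search states.
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Scores every unlabeled candidate; the argmax is queried next."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x):                      # (n, dim) -> (n,)
        return self.net(x).squeeze(-1)

dim = 8
w_hidden = torch.randn(dim)                    # utility only the expert can afford to evaluate

def toy_expert(candidates):
    """Stand-in for the expensive nonmyopic policy: exhaustively scores
    every candidate and picks the best one."""
    return (candidates @ w_hidden).argmax().item()

policy = PolicyNet(dim)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
for step in range(1000):                       # synthetic imitation data
    candidates = torch.randn(50, dim)          # one synthetic search state
    expert_idx = toy_expert(candidates)
    logits = policy(candidates).unsqueeze(0)   # (1, n): one "class" per candidate
    loss = nn.functional.cross_entropy(logits, torch.tensor([expert_idx]))
    opt.zero_grad(); loss.backward(); opt.step()
```
At deployment, ranking all candidates is a single forward pass through the trained network, which is where the amortization saving over the superlinear expert comes from.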
Related papers
- RLIF: Interactive Imitation Learning as Reinforcement Learning (arXiv, 2023-11-21)
We show how off-policy reinforcement learning can enable improved performance under assumptions similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal, and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
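A hedged sketch of that intervention-as-reward relabeling; the trajectory format is an assumption for illustration, not RLIF's actual interface:
```python
# Instead of a task reward, the expert's decision to intervene becomes a
# negative reward that any off-policy RL algorithm can optimize.
def relabel_with_interventions(trajectory):
    """trajectory: list of (state, action, next_state, intervened) tuples.
    Returns transitions whose only reward is the intervention signal."""
    transitions = []
    for state, action, next_state, intervened in trajectory:
        reward = -1.0 if intervened else 0.0  # intervening flags a mistake
        done = intervened                     # control passes to the expert here
        transitions.append((state, action, reward, next_state, done))
    return transitions
```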
- Compute-Efficient Deep Learning: Algorithmic Trends and Opportunities (arXiv, 2022-10-13)
The economic and environmental costs of training neural networks are becoming unsustainable.
Research on *algorithmically-efficient deep learning* seeks to reduce training costs through changes in the semantics of the training program.
We formalize the *algorithmic speedup* problem, then use fundamental building blocks of algorithmically efficient training to develop a taxonomy.
- Jump-Start Reinforcement Learning (arXiv, 2022-04-05)
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies, a guide policy and an exploration policy, to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
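The rollout logic can be sketched in a few lines; the gym-like environment interface and the fixed horizon h are assumptions for illustration, not JSRL's code:
```python
# A pre-existing guide policy "rolls in" for the first h steps, then the
# learned exploration policy takes over; shrinking h over training forms a
# curriculum of easier-to-harder starting states.
def jsrl_rollout(env, guide_policy, explore_policy, h, max_steps=200):
    state, trajectory = env.reset(), []
    for t in range(max_steps):
        acting = guide_policy if t < h else explore_policy
        action = acting(state)
        state_next, reward, done = env.step(action)   # simplified step signature
        trajectory.append((state, action, reward, state_next, done))
        state = state_next
        if done:
            break
    return trajectory   # consumed by any off-the-shelf RL update
```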
- Nonmyopic Multifidelity Active Search (arXiv, 2021-06-11)
We propose a model of multifidelity active search, as well as a novel, computationally efficient policy for this setting.
We evaluate the performance of our solution on real-world datasets and demonstrate significantly better performance than natural benchmarks.
- Reinforcement Learning with Efficient Active Feature Acquisition (arXiv, 2020-11-02)
In real life, information acquisition might correspond to performing a medical test on a patient.
We propose a model-based reinforcement learning framework that learns an active feature acquisition policy.
Key to its success is a novel sequential variational auto-encoder that learns high-quality representations from partially observed states.
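As a generic illustration of the acquisition loop (not the paper's model-based framework or its sequential VAE), each step can be framed as "pay to reveal one more feature, or stop and act"; all names here are hypothetical:
```python
import numpy as np

def acquisition_episode(x_full, acquire_policy, predict, cost=0.1):
    """x_full: the complete feature vector, hidden from the agent."""
    observed = np.full(len(x_full), np.nan)   # nothing measured yet
    total_cost = 0.0
    while True:
        i = acquire_policy(observed)          # feature index, or None to stop
        if i is None:
            break
        observed[i] = x_full[i]               # e.g., run one more medical test
        total_cost += cost
    return predict(observed), total_cost      # decision quality vs. cost trade-off
```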
- Reannealing of Decaying Exploration Based On Heuristic Measure in Deep Q-Network (arXiv, 2020-09-29)
We propose an algorithm based on the idea of reannealing that aims to encourage exploration only when it is needed.
We perform an illustrative case study showing that it has the potential to both accelerate training and obtain a better policy.
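A minimal sketch of the reannealing schedule, with a placeholder for the paper's heuristic measure:
```python
# The exploration rate decays as usual but is reset ("reannealed") whenever
# a heuristic signals that more exploration is needed.
class ReannealedEpsilon:
    def __init__(self, eps0=1.0, decay=0.995, eps_min=0.05):
        self.eps0, self.decay, self.eps_min = eps0, decay, eps_min
        self.eps = eps0

    def step(self, needs_exploration):
        if needs_exploration:           # e.g., returns have stagnated
            self.eps = self.eps0        # reanneal: jump back to heavy exploration
        else:
            self.eps = max(self.eps_min, self.eps * self.decay)
        return self.eps
```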
- Adversarial Imitation Learning via Random Search (arXiv, 2020-08-21)
We propose an imitation learning method that takes advantage of derivative-free optimization with simple linear policies.
Experiments in this paper show that the proposed model, without a direct reward signal from the environment, obtains competitive performance on the MuJoCo locomotion tasks.
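The derivative-free update can be sketched as basic random search over the weights of a linear policy; `evaluate` (the return of the policy a = W @ obs, standing in for the paper's imitation reward) is an assumed callable:
```python
import numpy as np

def random_search(evaluate, dim_obs, dim_act, iters=100, n_dirs=8,
                  step_size=0.02, noise=0.03):
    W = np.zeros((dim_act, dim_obs))
    for _ in range(iters):
        deltas = [np.random.randn(dim_act, dim_obs) for _ in range(n_dirs)]
        # finite-difference estimate from symmetric random perturbations
        grad = sum((evaluate(W + noise * d) - evaluate(W - noise * d)) * d
                   for d in deltas)
        W += step_size / (n_dirs * noise) * grad
    return W
```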
- Learning Adaptive Exploration Strategies in Dynamic Environments Through Informed Policy Regularization (arXiv, 2020-05-06)
We study the problem of learning exploration-exploitation strategies that effectively adapt to dynamic environments.
We propose a novel algorithm that regularizes the training of an RNN-based policy using informed policies trained to maximize the reward in each task.
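A hedged sketch of what such a regularizer can look like, using a KL term toward the informed policy; the Categorical distributions and the weight beta are illustrative choices, not the paper's exact formulation:
```python
import torch
from torch.distributions import Categorical, kl_divergence

def regularized_loss(rl_loss, rnn_logits, informed_logits, beta=0.1):
    pi_rnn = Categorical(logits=rnn_logits)            # exploration policy
    pi_informed = Categorical(logits=informed_logits)  # task-specific teacher
    kl = kl_divergence(pi_informed, pi_rnn).mean()     # stay close to the teacher
    return rl_loss + beta * kl
```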
- Reward-Conditioned Policies (arXiv, 2019-12-31)
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
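A minimal sketch of the reward-conditioned supervised objective, with illustrative shapes and names:
```python
# Fit a policy on the agent's own trajectories, conditioning each state on
# the return that trajectory actually achieved; at test time, conditioning
# on a high target return should elicit good actions.
import torch
import torch.nn as nn

class RewardConditionedPolicy(nn.Module):
    def __init__(self, dim_state, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim_state + 1, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, states, target_returns):            # (b, d), (b,)
        z = torch.cat([states, target_returns[:, None]], dim=-1)
        return self.net(z)                                 # action logits

def supervised_update(policy, opt, states, actions, returns):
    """No demonstrations needed: `returns` are what the agent itself
    achieved, so every trajectory becomes usable supervision."""
    loss = nn.functional.cross_entropy(policy(states, returns), actions)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```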