Maximum entropy exploration in contextual bandits with neural networks and energy based models
- URL: http://arxiv.org/abs/2210.06302v1
- Date: Wed, 12 Oct 2022 15:09:45 GMT
- Title: Maximum entropy exploration in contextual bandits with neural networks and energy based models
- Authors: Adam Elwood, Marco Leonardi, Ashraf Mohamed, Alessandro Rozza
- Abstract summary: We present two classes of models, one with neural networks as reward estimators, and the other with energy based models.
We show that both techniques outperform well-known standard algorithms, with energy based models achieving the best overall performance.
This provides practitioners with new techniques that perform well in static and dynamic settings, and are particularly well suited to non-linear scenarios with continuous action spaces.
- Score: 63.872634680339644
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Contextual bandits can solve a huge range of real-world problems. However,
current popular algorithms to solve them either rely on linear models, or
unreliable uncertainty estimation in non-linear models, which are required to
deal with the exploration-exploitation trade-off. Inspired by theories of human
cognition, we introduce novel techniques that use maximum entropy exploration,
relying on neural networks to find optimal policies in settings with both
continuous and discrete action spaces. We present two classes of models, one
with neural networks as reward estimators, and the other with energy based
models, which model the probability of obtaining an optimal reward given an
action. We evaluate the performance of these models in static and dynamic
contextual bandit simulation environments. We show that both techniques
outperform well-known standard algorithms, where energy based models have the
best overall performance. This provides practitioners with new techniques that
perform well in static and dynamic settings, and are particularly well suited
to non-linear scenarios with continuous action spaces.
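Below is a minimal sketch, in PyTorch, of the two exploration strategies the abstract describes: a neural-network reward estimator with softmax (maximum-entropy) action sampling over a discrete action set, and an energy-based model over (context, action) pairs sampled over a continuous action range. The architectures, temperatures, and candidate-based sampling scheme are illustrative assumptions, not the authors' exact models.

    # A minimal sketch (not the authors' exact models) of the two exploration
    # strategies described in the abstract:
    #   1. a neural-network reward estimator with softmax (maximum-entropy)
    #      action sampling over a discrete action set, and
    #   2. an energy-based model over (context, action) pairs, sampled here by
    #      reweighting uniformly drawn candidate actions.
    # Architectures, temperatures, and the candidate-sampling scheme are
    # illustrative assumptions.
    import torch
    import torch.nn as nn

    class RewardNet(nn.Module):
        """Estimates the expected reward of each discrete action given a context."""
        def __init__(self, context_dim: int, n_actions: int, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(context_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, n_actions),
            )

        def forward(self, context: torch.Tensor) -> torch.Tensor:
            return self.net(context)  # (batch, n_actions) reward estimates

    def softmax_action(reward_net: RewardNet, context: torch.Tensor,
                       temperature: float = 0.5) -> int:
        """Maximum-entropy exploration: P(a) proportional to exp(r_hat(x, a) / T)."""
        with torch.no_grad():
            r_hat = reward_net(context.unsqueeze(0)).squeeze(0)
        probs = torch.softmax(r_hat / temperature, dim=0)
        return int(torch.multinomial(probs, num_samples=1).item())

    class EnergyNet(nn.Module):
        """Assigns a scalar energy to a (context, action) pair; low energy
        corresponds to actions believed to yield high reward."""
        def __init__(self, context_dim: int, action_dim: int, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(context_dim + action_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, context: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
            return self.net(torch.cat([context, action], dim=-1)).squeeze(-1)

    def ebm_action(energy_net: EnergyNet, context: torch.Tensor,
                   action_low: float, action_high: float,
                   n_candidates: int = 256, temperature: float = 0.5) -> torch.Tensor:
        """Sample a continuous action with P(a) proportional to exp(-E(x, a) / T),
        approximated by drawing uniform candidates and resampling by their weights."""
        action_dim = 1  # illustrative: one-dimensional continuous action
        candidates = torch.empty(n_candidates, action_dim).uniform_(action_low, action_high)
        contexts = context.unsqueeze(0).expand(n_candidates, -1)
        with torch.no_grad():
            energies = energy_net(contexts, candidates)
        weights = torch.softmax(-energies / temperature, dim=0)
        idx = int(torch.multinomial(weights, num_samples=1).item())
        return candidates[idx]

In a full agent both estimators would be updated online from observed (context, action, reward) triples; the training loop and the static/dynamic simulation environments used for evaluation in the paper are omitted here.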
Related papers
- Bridging Model-Based Optimization and Generative Modeling via Conservative Fine-Tuning of Diffusion Models [54.132297393662654]
We introduce a hybrid method that fine-tunes cutting-edge diffusion models by optimizing reward models through RL.
We demonstrate the capability of our approach to outperform the best designs in offline data, leveraging the extrapolation capabilities of reward models.
arXiv Detail & Related papers (2024-05-30T03:57:29Z) - CoDBench: A Critical Evaluation of Data-driven Models for Continuous
Dynamical Systems [8.410938527671341]
We introduce CoDBench, an exhaustive benchmarking suite comprising 11 state-of-the-art data-driven models for solving differential equations.
Specifically, we evaluate 4 distinct categories of models, viz., feed forward neural networks, deep operator regression models, frequency-based neural operators, and transformer architectures.
We conduct extensive experiments, assessing the operators' capabilities in learning, zero-shot super-resolution, data efficiency, robustness to noise, and computational efficiency.
arXiv Detail & Related papers (2023-10-02T21:27:54Z) - A Bayesian Approach to Robust Inverse Reinforcement Learning [54.24816623644148]
We consider a Bayesian approach to offline model-based inverse reinforcement learning (IRL).
The proposed framework differs from existing offline model-based IRL approaches by performing simultaneous estimation of the expert's reward function and subjective model of environment dynamics.
Our analysis reveals a novel insight that the estimated policy exhibits robust performance when the expert is believed to have a highly accurate model of the environment.
arXiv Detail & Related papers (2023-09-15T17:37:09Z) - Optimistic Active Exploration of Dynamical Systems [52.91573056896633]
We develop an algorithm for active exploration called OPAX.
We show how OPAX can be reduced to an optimal control problem that can be solved at each episode.
Our experiments show that OPAX is not only theoretically sound but also performs well for zero-shot planning on novel downstream tasks.
arXiv Detail & Related papers (2023-06-21T16:26:59Z) - Neural Abstractions [72.42530499990028]
We present a novel method for the safety verification of nonlinear dynamical models that uses neural networks to represent abstractions of their dynamics.
We demonstrate that our approach performs comparably to the mature tool Flow* on existing benchmark nonlinear models.
arXiv Detail & Related papers (2023-01-27T12:38:09Z) - Model Generation with Provable Coverability for Offline Reinforcement
Learning [14.333861814143718]
Offline optimization with a dynamics-aware policy provides a new perspective on policy learning and out-of-distribution generalization.
However, due to the limitations of the offline setting, the learned model cannot mimic the real dynamics well enough to support reliable out-of-distribution exploration.
We propose an algorithm that generates models optimized for their coverage of the real dynamics.
arXiv Detail & Related papers (2022-06-01T08:34:09Z) - Dream to Explore: Adaptive Simulations for Autonomous Systems [3.0664963196464448]
We tackle the problem of learning to control dynamical systems by applying Bayesian nonparametric methods.
By employing Gaussian processes to discover latent world dynamics, we mitigate common data efficiency issues observed in reinforcement learning.
Our algorithm jointly learns a world model and policy by optimizing a variational lower bound of a log-likelihood.
arXiv Detail & Related papers (2021-10-27T04:27:28Z) - Sparse Flows: Pruning Continuous-depth Models [107.98191032466544]
We show that pruning improves generalization for neural ODEs in generative modeling.
We also show that pruning finds minimal and efficient neural ODE representations with up to 98% fewer parameters than the original network, without loss of accuracy.
arXiv Detail & Related papers (2021-06-24T01:40:17Z)