STEERING: Stein Information Directed Exploration for Model-Based
Reinforcement Learning
- URL: http://arxiv.org/abs/2301.12038v2
- Date: Tue, 19 Sep 2023 03:21:17 GMT
- Title: STEERING: Stein Information Directed Exploration for Model-Based
Reinforcement Learning
- Authors: Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Mengdi Wang,
Furong Huang, Dinesh Manocha
- Abstract summary: We propose an exploration incentive in terms of the integral probability metric (IPM) between a current estimate of the transition model and the unknown optimal one.
Based on KSD, we develop a novel algorithm, STEERING: STEin information dirEcted exploration for model-based Reinforcement LearnING.
- Score: 111.75423966239092
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Directed Exploration is a crucial challenge in reinforcement learning (RL),
especially when rewards are sparse. Information-directed sampling (IDS), which
optimizes the information ratio, seeks to do so by augmenting regret with
information gain. However, estimating information gain is computationally
intractable or relies on restrictive assumptions which prohibit its use in many
practical instances. In this work, we posit an alternative exploration
incentive in terms of the integral probability metric (IPM) between a current
estimate of the transition model and the unknown optimal, which, under
suitable conditions, can be computed in closed form with the kernelized Stein
discrepancy (KSD). Based on KSD, we develop a novel algorithm, STEERING:
STEin information dirEcted exploration for model-based Reinforcement
LearnING. To enable its derivation, we develop
fundamentally new variants of KSD for discrete conditional distributions. We
further establish that STEERING achieves sublinear Bayesian regret, improving
upon prior learning rates of information-augmented MBRL. Experimentally, we
show that the proposed algorithm is computationally affordable and outperforms
several prior approaches.
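For intuition about the exploration incentive above: the kernelized Stein discrepancy between a model p and a set of samples can be estimated in closed form given only the model's score function (the gradient of log p), with no normalizing constant. The sketch below is a generic RBF-kernel KSD estimator in numpy, not the paper's discrete conditional variant; the score function and bandwidth h are illustrative placeholders.

```python
import numpy as np

def ksd_rbf(samples, score_fn, h=1.0):
    """V-statistic estimate of the squared kernelized Stein discrepancy
    between a model with score function score_fn (x -> grad log p(x))
    and the empirical distribution of `samples` (shape n x d), using
    the RBF kernel k(x, y) = exp(-||x - y||^2 / (2 h^2))."""
    X = np.asarray(samples, dtype=float)
    n, d = X.shape
    S = score_fn(X)                        # (n, d) model score at each sample
    diff = X[:, None, :] - X[None, :, :]   # (n, n, d) pairwise x_i - x_j
    sq = (diff ** 2).sum(-1)               # (n, n) squared distances
    K = np.exp(-sq / (2 * h ** 2))         # kernel matrix

    # Stein kernel u_p(x_i, x_j), assembled term by term.
    t1 = (S @ S.T) * K                                   # s(x)^T k(x, y) s(y)
    t2 = np.einsum('id,ijd->ij', S, diff) * K / h ** 2   # s(x)^T grad_y k(x, y)
    t3 = -np.einsum('jd,ijd->ij', S, diff) * K / h ** 2  # grad_x k(x, y)^T s(y)
    t4 = K * (d / h ** 2 - sq / h ** 4)                  # trace(grad_x grad_y k)
    return (t1 + t2 + t3 + t4).mean()
```

For a unit-variance Gaussian model centered at mu, score_fn would be `lambda X: -(X - mu)`; the estimate approaches zero as the sample distribution matches the model, which is what makes KSD usable as a computable surrogate for an IPM-based exploration bonus.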
Related papers
- Informed Spectral Normalized Gaussian Processes for Trajectory Prediction [0.0]
We propose a novel regularization-based continual learning method for SNGPs.
Our proposal builds upon well-established methods and requires no rehearsal memory or parameter expansion.
We apply our informed SNGP model to the trajectory prediction problem in autonomous driving by integrating prior drivability knowledge.
arXiv Detail & Related papers (2024-03-18T17:05:24Z)
- REMEDI: Corrective Transformations for Improved Neural Entropy Estimation [0.7488108981865708]
We introduce REMEDI for efficient and accurate estimation of differential entropy.
Our approach demonstrates improvement across a broad spectrum of estimation tasks.
It can be naturally extended to information theoretic supervised learning models.
arXiv Detail & Related papers (2024-02-08T14:47:37Z)
- Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS).
We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noise.
We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
arXiv Detail & Related papers (2023-10-09T03:55:09Z)
- Exploiting Temporal Structures of Cyclostationary Signals for Data-Driven Single-Channel Source Separation [98.95383921866096]
We study the problem of single-channel source separation (SCSS).
We focus on cyclostationary signals, which are particularly suitable in a variety of application domains.
We propose a deep learning approach using a U-Net architecture, which is competitive with the minimum MSE estimator.
arXiv Detail & Related papers (2022-08-22T14:04:56Z)
- On the Generalization for Transfer Learning: An Information-Theoretic Analysis [8.102199960821165]
We give an information-theoretic analysis of the generalization error and excess risk of transfer learning algorithms.
Our results suggest, perhaps as expected, that the Kullback-Leibler divergence $D(\mu\|\mu')$ plays an important role in the characterizations.
We then generalize the mutual information bound with other divergences such as the $\phi$-divergence and Wasserstein distance.
arXiv Detail & Related papers (2022-07-12T08:20:41Z)
- Regret Bounds for Information-Directed Reinforcement Learning [40.783225558237746]
Information-directed sampling (IDS) has demonstrated its potential as a data-efficient algorithm for reinforcement learning (RL).
We develop novel information-theoretic tools to bound the information ratio and cumulative information gain about the learning target.
arXiv Detail & Related papers (2022-06-09T17:36:17Z)
- Incorporating Causal Graphical Prior Knowledge into Predictive Modeling via Simple Data Augmentation [92.96204497841032]
Causal graphs (CGs) are compact representations of knowledge about the data-generating processes behind data distributions.
We propose a model-agnostic data augmentation method that allows us to exploit the prior knowledge of the conditional independence (CI) relations.
We experimentally show that the proposed method is effective in improving the prediction accuracy, especially in the small-data regime.
arXiv Detail & Related papers (2021-02-27T06:13:59Z)
- Scalable Approximate Inference and Some Applications [2.6541211006790983]
In this thesis, we propose a new framework for approximate inference.
Our four proposed algorithms are motivated by recent computational progress on Stein's method; a generic sketch of one such Stein-method algorithm (SVGD) is shown after this list for context.
Results on simulated and real datasets indicate the statistical efficiency and wide applicability of our algorithms.
arXiv Detail & Related papers (2020-03-07T04:33:27Z)
- Uncertainty Estimation Using a Single Deep Deterministic Neural Network [66.26231423824089]
We propose a method for training a deterministic deep model that can find and reject out-of-distribution data points at test time with a single forward pass.
We scale training of these models with a novel loss function and centroid updating scheme, and match the accuracy of softmax models.
arXiv Detail & Related papers (2020-03-04T12:27:36Z)
- Nested-Wasserstein Self-Imitation Learning for Sequence Generation [158.19606942252284]
We propose the concept of nested-Wasserstein distance for distributional semantic matching.
A novel nested-Wasserstein self-imitation learning framework is developed, encouraging the model to exploit historical high-reward sequences.
arXiv Detail & Related papers (2020-01-20T02:19:13Z)
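For context on the Stein's-method algorithms mentioned in "Scalable Approximate Inference and Some Applications" above, here is a minimal numpy sketch of Stein variational gradient descent (SVGD), a standard algorithm from that literature. It is a generic textbook version, not necessarily one of the thesis's four algorithms; the bandwidth h and step size eps are illustrative placeholders.

```python
import numpy as np

def svgd_step(X, score_fn, h=1.0, eps=0.1):
    """One SVGD update on particles X (shape n x d), moving them toward
    the target density whose score function (x -> grad log p(x)) is
    score_fn, using an RBF kernel."""
    n, d = X.shape
    S = score_fn(X)                        # (n, d) target score at particles
    diff = X[:, None, :] - X[None, :, :]   # (n, n, d) pairwise x_i - x_j
    K = np.exp(-(diff ** 2).sum(-1) / (2 * h ** 2))  # kernel matrix
    # The first term drives particles toward high density; the
    # kernel-gradient term repels nearby particles, preserving diversity.
    phi = (K @ S + np.einsum('ij,ijd->id', K, diff) / h ** 2) / n
    return X + eps * phi

# Illustrative usage: particles converge toward a standard Gaussian,
# whose score function is simply -x.
particles = np.random.default_rng(0).normal(3.0, 1.0, size=(50, 2))
for _ in range(200):
    particles = svgd_step(particles, lambda X: -X)
```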
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.