STEERING: Stein Information Directed Exploration for Model-Based
Reinforcement Learning
- URL: http://arxiv.org/abs/2301.12038v2
- Date: Tue, 19 Sep 2023 03:21:17 GMT
- Title: STEERING: Stein Information Directed Exploration for Model-Based
Reinforcement Learning
- Authors: Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Mengdi Wang,
Furong Huang, Dinesh Manocha
- Abstract summary: We propose an exploration incentive in terms of the integral probability metric (IPM) between a current estimate of the transition model and the unknown optimal one.
Based on KSD, we develop a novel algorithm, STEERING: STEin information dirEcted exploration for model-based Reinforcement LearnING.
- Score: 111.75423966239092
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Directed Exploration is a crucial challenge in reinforcement learning (RL),
especially when rewards are sparse. Information-directed sampling (IDS), which
optimizes the information ratio, seeks to do so by augmenting regret with
information gain. However, estimating information gain is computationally
intractable or relies on restrictive assumptions which prohibit its use in many
practical instances. In this work, we posit an alternative exploration
incentive in terms of the integral probability metric (IPM) between a current
estimate of the transition model and the unknown optimal, which, under
suitable conditions, can be computed in closed form with the kernelized Stein
discrepancy (KSD). Based on KSD, we develop a novel algorithm, STEERING:
STEin information dirEcted exploration for model-based Reinforcement
LearnING. To enable its derivation, we develop
fundamentally new variants of KSD for discrete conditional distributions. We
further establish that STEERING achieves sublinear Bayesian regret, improving
upon prior learning rates of information-augmented MBRL. Experimentally, we
show that the proposed algorithm is computationally affordable and outperforms
several prior approaches.
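For intuition about the exploration incentive above: the kernelized Stein discrepancy between a model p and a set of samples can be estimated in closed form given only the model's score function (the gradient of log p), with no normalizing constant. The sketch below is a generic RBF-kernel KSD estimator in numpy, not the paper's discrete conditional variant; the score function and bandwidth h are illustrative placeholders.

```python
import numpy as np

def ksd_rbf(samples, score_fn, h=1.0):
    """V-statistic estimate of the squared kernelized Stein discrepancy
    between a model with score function score_fn (x -> grad log p(x))
    and the empirical distribution of `samples` (shape n x d), using
    the RBF kernel k(x, y) = exp(-||x - y||^2 / (2 h^2))."""
    X = np.asarray(samples, dtype=float)
    n, d = X.shape
    S = score_fn(X)                        # (n, d) model score at each sample
    diff = X[:, None, :] - X[None, :, :]   # (n, n, d) pairwise x_i - x_j
    sq = (diff ** 2).sum(-1)               # (n, n) squared distances
    K = np.exp(-sq / (2 * h ** 2))         # kernel matrix

    # Stein kernel u_p(x_i, x_j), assembled term by term.
    t1 = (S @ S.T) * K                                   # s(x)^T k(x, y) s(y)
    t2 = np.einsum('id,ijd->ij', S, diff) * K / h ** 2   # s(x)^T grad_y k(x, y)
    t3 = -np.einsum('jd,ijd->ij', S, diff) * K / h ** 2  # grad_x k(x, y)^T s(y)
    t4 = K * (d / h ** 2 - sq / h ** 4)                  # trace(grad_x grad_y k)
    return (t1 + t2 + t3 + t4).mean()
```

For a unit-variance Gaussian model centered at mu, score_fn would be `lambda X: -(X - mu)`; the estimate approaches zero as the sample distribution matches the model, which is what makes KSD usable as a computable surrogate for an IPM-based exploration bonus.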
Related papers
- Informed Spectral Normalized Gaussian Processes for Trajectory Prediction [0.0]
We propose a novel regularization-based continual learning method for SNGPs.
Our proposal builds upon well-established methods and requires no rehearsal memory or parameter expansion.
We apply our informed SNGP model to the trajectory prediction problem in autonomous driving by integrating prior drivability knowledge.
arXiv Detail & Related papers (2024-03-18T17:05:24Z)
- REMEDI: Corrective Transformations for Improved Neural Entropy Estimation [0.7488108981865708]
We introduce REMEDI for efficient and accurate estimation of differential entropy.
Our approach demonstrates improvement across a broad spectrum of estimation tasks.
It can be naturally extended to information theoretic supervised learning models.
arXiv Detail & Related papers (2024-02-08T14:47:37Z)
- Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS).
We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noise.
We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
arXiv Detail & Related papers (2023-10-09T03:55:09Z)
- Exploiting Temporal Structures of Cyclostationary Signals for Data-Driven Single-Channel Source Separation [98.95383921866096]
We study the problem of single-channel source separation (SCSS).
We focus on cyclostationary signals, which are particularly suitable in a variety of application domains.
We propose a deep learning approach using a U-Net architecture, which is competitive with the minimum MSE estimator.
arXiv Detail & Related papers (2022-08-22T14:04:56Z)
- On the Generalization for Transfer Learning: An Information-Theoretic Analysis [8.102199960821165]
We give an information-theoretic analysis of the generalization error and excess risk of transfer learning algorithms.
Our results suggest, perhaps as expected, that the Kullback-Leibler divergence $D(\mu\|\mu')$ plays an important role in the characterizations.
We then generalize the mutual information bound with other divergences such as the $\phi$-divergence and Wasserstein distance.
arXiv Detail & Related papers (2022-07-12T08:20:41Z)
- Regret Bounds for Information-Directed Reinforcement Learning [40.783225558237746]
Information-directed sampling (IDS) has demonstrated its potential as a data-efficient algorithm for reinforcement learning (RL).
We develop novel information-theoretic tools to bound the information ratio and cumulative information gain about the learning target.
arXiv Detail & Related papers (2022-06-09T17:36:17Z)
- Incorporating Causal Graphical Prior Knowledge into Predictive Modeling via Simple Data Augmentation [92.96204497841032]
Causal graphs (CGs) are compact representations of knowledge about the data-generating processes behind data distributions.
We propose a model-agnostic data augmentation method that allows us to exploit the prior knowledge of the conditional independence (CI) relations.
We experimentally show that the proposed method is effective in improving the prediction accuracy, especially in the small-data regime.
arXiv Detail & Related papers (2021-02-27T06:13:59Z)
- Scalable Approximate Inference and Some Applications [2.6541211006790983]
In this thesis, we propose a new framework for approximate inference.
Our four proposed algorithms are motivated by recent computational progress on Stein's method; a generic sketch of one such Stein-method algorithm (SVGD) is shown after this list for context.
Results on simulated and real datasets indicate the statistical efficiency and wide applicability of our algorithms.
arXiv Detail & Related papers (2020-03-07T04:33:27Z)
- Uncertainty Estimation Using a Single Deep Deterministic Neural Network [66.26231423824089]
We propose a method for training a deterministic deep model that can find and reject out-of-distribution data points at test time with a single forward pass.
We scale training of these models with a novel loss function and centroid updating scheme, and match the accuracy of softmax models.
arXiv Detail & Related papers (2020-03-04T12:27:36Z)
- Nested-Wasserstein Self-Imitation Learning for Sequence Generation [158.19606942252284]
We propose the concept of nested-Wasserstein distance for distributional semantic matching.
A novel nested-Wasserstein self-imitation learning framework is developed, encouraging the model to exploit historical high-reward sequences.
arXiv Detail & Related papers (2020-01-20T02:19:13Z)
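For context on the Stein's-method algorithms mentioned in "Scalable Approximate Inference and Some Applications" above, here is a minimal numpy sketch of Stein variational gradient descent (SVGD), a standard algorithm from that literature. It is a generic textbook version, not necessarily one of the thesis's four algorithms; the bandwidth h and step size eps are illustrative placeholders.

```python
import numpy as np

def svgd_step(X, score_fn, h=1.0, eps=0.1):
    """One SVGD update on particles X (shape n x d), moving them toward
    the target density whose score function (x -> grad log p(x)) is
    score_fn, using an RBF kernel."""
    n, d = X.shape
    S = score_fn(X)                        # (n, d) target score at particles
    diff = X[:, None, :] - X[None, :, :]   # (n, n, d) pairwise x_i - x_j
    K = np.exp(-(diff ** 2).sum(-1) / (2 * h ** 2))  # kernel matrix
    # The first term drives particles toward high density; the
    # kernel-gradient term repels nearby particles, preserving diversity.
    phi = (K @ S + np.einsum('ij,ijd->id', K, diff) / h ** 2) / n
    return X + eps * phi

# Illustrative usage: particles converge toward a standard Gaussian,
# whose score function is simply -x.
particles = np.random.default_rng(0).normal(3.0, 1.0, size=(50, 2))
for _ in range(200):
    particles = svgd_step(particles, lambda X: -X)
```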
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.