Residual Overfit Method of Exploration
- URL: http://arxiv.org/abs/2110.02919v1
- Date: Wed, 6 Oct 2021 17:05:33 GMT
- Title: Residual Overfit Method of Exploration
- Authors: James McInerney, Nathan Kallus
- Abstract summary: We propose an approximate exploration methodology based on fitting only two point estimates, one tuned and one overfit.
The approach drives exploration towards actions where the overfit model exhibits the most overfitting compared to the tuned model.
We compare ROME against a set of established contextual bandit methods on three datasets and find it to be one of the best performing.
- Score: 78.07532520582313
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Exploration is a crucial aspect of bandit and reinforcement learning
algorithms. The uncertainty quantification necessary for exploration often
comes from either closed-form expressions based on simple models or resampling
and posterior approximations that are computationally intensive. We propose
instead an approximate exploration methodology based on fitting only two point
estimates, one tuned and one overfit. The approach, which we term the residual
overfit method of exploration (ROME), drives exploration towards actions where
the overfit model exhibits the most overfitting compared to the tuned model.
The intuition is that overfitting occurs the most at actions and contexts with
insufficient data to form accurate predictions of the reward. We justify this
intuition formally from both a frequentist and a Bayesian information theoretic
perspective. The result is a method that generalizes to a wide variety of
models and avoids the computational overhead of resampling or posterior
approximations. We compare ROME against a set of established contextual bandit
methods on three datasets and find it to be one of the best performing.
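The abstract's recipe can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes linear reward models, uses ridge regression for the "tuned" estimate and a near-unregularized fit for the "overfit" estimate, and treats the prediction gap as an optimistic exploration bonus. The names `fit_ridge` and `rome_bonus` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy contextual-bandit data: contexts X and noisy linear rewards y.
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + rng.normal(scale=0.5, size=n)

def fit_ridge(X, y, alpha):
    """Closed-form ridge regression: solve (X^T X + alpha*I) w = X^T y."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# Two point estimates: one tuned (regularized), one (nearly) overfit.
w_tuned = fit_ridge(X, y, alpha=10.0)
w_overfit = fit_ridge(X, y, alpha=1e-8)

def rome_bonus(contexts):
    """Exploration signal: discrepancy between overfit and tuned predictions.
    Where the two models disagree most, the data support is weakest."""
    return np.abs(contexts @ w_overfit - contexts @ w_tuned)

# Score candidate actions optimistically: tuned prediction plus the bonus.
candidates = rng.normal(size=(10, d))
scores = candidates @ w_tuned + rome_bonus(candidates)
action = int(np.argmax(scores))
```

The key property is that the bonus requires only two fitted models, with no resampling or posterior approximation, matching the computational claim in the abstract.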
Related papers
- In-Context Parametric Inference: Point or Distribution Estimators? [66.22308335324239]
We show that amortized point estimators generally outperform posterior inference, though the latter remain competitive in some low-dimensional problems.
arXiv Detail & Related papers (2025-02-17T10:00:24Z)
- Predictive Coresets [0.0]
Conventional coreset approaches determine weights by minimizing the Kullback-Leibler divergence between the likelihood functions of the full and weighted datasets.
We propose an alternative variational method which employs randomized posteriors and finds weights to match the unknown posterior predictive distributions conditioned on the full and reduced datasets.
We evaluate the performance of the proposed coreset construction on diverse problems, including random partitions and density estimation.
arXiv Detail & Related papers (2025-02-08T23:57:43Z)
- Likelihood approximations via Gaussian approximate inference [3.4991031406102238]
We propose efficient schemes to approximate the effects of non-Gaussian likelihoods by Gaussian densities.
Our results attain good approximation quality for binary and multiclass classification in large-scale point-estimate and distributional inferential settings.
As a by-product, we show that the proposed approximate log-likelihoods are a superior alternative to least-squares on raw labels for neural network classification.
arXiv Detail & Related papers (2024-10-28T05:39:26Z)
- Model-Free Active Exploration in Reinforcement Learning [53.786439742572995]
We study the problem of exploration in Reinforcement Learning and present a novel model-free solution.
Our strategy is able to identify efficient policies faster than state-of-the-art exploration approaches.
arXiv Detail & Related papers (2024-06-30T19:00:49Z)
- Towards Model-Agnostic Posterior Approximation for Fast and Accurate Variational Autoencoders [22.77397537980102]
We show that we can compute a deterministic, model-agnostic posterior approximation (MAPA) of the true model's posterior.
We present preliminary results on low-dimensional synthetic data that (1) MAPA captures the trend of the true posterior, and (2) our MAPA-based inference performs better density estimation with less computation than baselines.
arXiv Detail & Related papers (2024-03-13T20:16:21Z)
- STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning [111.75423966239092]
We propose an exploration incentive in terms of the integral probability metric (IPM) between a current estimate of the transition model and the unknown optimal.
Based on KSD, we develop a novel algorithm, STEERING: STEin information dirEcted exploration for model-based Reinforcement LearnING.
arXiv Detail & Related papers (2023-01-28T00:49:28Z)
- Composed Image Retrieval with Text Feedback via Multi-grained Uncertainty Regularization [73.04187954213471]
We introduce a unified learning approach that simultaneously models coarse- and fine-grained retrieval.
The proposed method has achieved +4.03%, +3.38%, and +2.40% Recall@50 accuracy over a strong baseline.
arXiv Detail & Related papers (2022-11-14T14:25:40Z)
- Deep Learning Methods for Proximal Inference via Maximum Moment Restriction [0.0]
We introduce a flexible and scalable method based on a deep neural network to estimate causal effects in the presence of unmeasured confounding.
Our method achieves state-of-the-art performance on two well-established proximal inference benchmarks.
arXiv Detail & Related papers (2022-05-19T19:51:42Z) - Efficiently Sampling Functions from Gaussian Process Posteriors [76.94808614373609]
We propose an easy-to-use and general-purpose approach for fast posterior sampling.
We demonstrate how decoupled sample paths accurately represent Gaussian process posteriors at a fraction of the usual cost.
arXiv Detail & Related papers (2020-02-21T14:03:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.