Model-Value Inconsistency as a Signal for Epistemic Uncertainty
- URL: http://arxiv.org/abs/2112.04153v1
- Date: Wed, 8 Dec 2021 07:53:41 GMT
- Title: Model-Value Inconsistency as a Signal for Epistemic Uncertainty
- Authors: Angelos Filos, Eszter Vértes, Zita Marinho, Gregory Farquhar, Diana Borsa, Abram Friesen, Feryal Behbahani, Tom Schaul, André Barreto, Simon Osindero
- Abstract summary: Self-inconsistency is useful (i) as a signal for exploration, (ii) for acting safely under distribution shifts, and (iii) for robustifying value-based planning with a model.
We show that, unlike prior work, this approach requires only the single model and value function which are already being learned in most model-based reinforcement learning algorithms.
- Score: 22.492926703232015
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Using a model of the environment and a value function, an agent can construct
many estimates of a state's value, by unrolling the model for different lengths
and bootstrapping with its value function. Our key insight is that one can
treat this set of value estimates as a type of ensemble, which we call an
\emph{implicit value ensemble} (IVE). Consequently, the discrepancy between
these estimates can be used as a proxy for the agent's epistemic uncertainty;
we term this signal \emph{model-value inconsistency} or
\emph{self-inconsistency} for short. Unlike prior work which estimates
uncertainty by training an ensemble of many models and/or value functions, this
approach requires only the single model and value function which are already
being learned in most model-based reinforcement learning algorithms. We provide
empirical evidence in both tabular and function approximation settings from
pixels that self-inconsistency is useful (i) as a signal for exploration, (ii)
for acting safely under distribution shifts, and (iii) for robustifying
value-based planning with a model.
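To make the idea concrete, here is a minimal Python sketch of the implicit value ensemble and the self-inconsistency signal. It assumes a deterministic learned model with a `model.step(state) -> (next_state, reward)` interface and a scalar `value_fn`; these names are illustrative, not the paper's API.

```python
import numpy as np

def k_step_value_estimate(model, value_fn, state, k, gamma=0.99):
    """Unroll the learned model for k steps, accumulating predicted
    rewards, then bootstrap with the value function at the horizon.
    For k = 0 this reduces to the plain value estimate value_fn(state)."""
    ret, discount = 0.0, 1.0
    for _ in range(k):
        state, reward = model.step(state)  # model-predicted transition
        ret += discount * reward
        discount *= gamma
    return ret + discount * value_fn(state)

def self_inconsistency(model, value_fn, state, max_k=5, gamma=0.99):
    """Model-value inconsistency: the spread of the implicit value
    ensemble {v_0, ..., v_K}, a proxy for epistemic uncertainty."""
    estimates = [k_step_value_estimate(model, value_fn, state, k, gamma)
                 for k in range(max_k + 1)]
    return np.std(estimates)
```

When the model and value function agree (e.g., in well-explored states), the k-step estimates coincide and the spread is near zero; disagreement inflates the spread, which can then drive exploration bonuses or trigger cautious behavior.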
Related papers
- Model-free Methods for Event History Analysis and Efficient Adjustment (PhD Thesis) [55.2480439325792]
This thesis is a series of independent contributions to statistics unified by a model-free perspective.
The first chapter elaborates on how a model-free perspective can be used to formulate flexible methods that leverage prediction techniques from machine learning.
The second chapter studies the concept of local independence, which describes whether the evolution of one process is directly influenced by another.
arXiv Detail & Related papers (2025-02-11T19:24:09Z)
- Latent Semantic Consensus For Deterministic Geometric Model Fitting [109.44565542031384]
We propose an effective method called Latent Semantic Consensus (LSC).
LSC formulates the model fitting problem into two latent semantic spaces based on data points and model hypotheses.
LSC is able to provide consistent and reliable solutions within only a few milliseconds for general multi-structural model fitting.
arXiv Detail & Related papers (2024-03-11T05:35:38Z)
- Scope Compliance Uncertainty Estimate [0.4262974002462632]
SafeML is a model-agnostic approach for runtime monitoring of whether a deployed model is operating within its validated scope.
This work addresses the limitations of SafeML's binary in/out-of-scope decision by replacing it with a continuous metric.
arXiv Detail & Related papers (2023-12-17T19:44:20Z)
- Value-Distributional Model-Based Reinforcement Learning [59.758009422067]
Quantifying uncertainty about a policy's long-term performance is important for solving sequential decision-making tasks.
We study the problem from a model-based Bayesian reinforcement learning perspective.
We propose Epistemic Quantile-Regression (EQR), a model-based algorithm that learns a value distribution function.
arXiv Detail & Related papers (2023-08-12T14:59:19Z)
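EQR's value-distribution learning rests on quantile regression; as a brief refresher (our notation, not the paper's), the pinball loss below is minimized exactly at the $\tau$-quantile of the target distribution:

```latex
% Pinball (quantile-regression) loss for quantile level \tau \in (0, 1):
% overestimates are penalized with weight (1 - \tau), underestimates with
% weight \tau, so the minimizer q_\theta is the \tau-quantile of z.
\mathcal{L}_\tau(\theta) = \mathbb{E}_{z}\left[ \rho_\tau\big(z - q_\theta\big) \right],
\qquad
\rho_\tau(u) = u\,\big(\tau - \mathbb{1}[u < 0]\big).
```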
- Spectral Representation Learning for Conditional Moment Models [33.34244475589745]
We propose a procedure that automatically learns representations with controlled measures of ill-posedness.
Our method approximates a linear representation defined by the spectral decomposition of a conditional expectation operator.
We show this representation can be efficiently estimated from data, and establish L2 consistency for the resulting estimator.
arXiv Detail & Related papers (2022-10-29T07:48:29Z)
- Deciding What to Model: Value-Equivalent Sampling for Reinforcement Learning [21.931580762349096]
We introduce an algorithm that computes an approximately-value-equivalent, lossy compression of the environment which an agent may feasibly target in lieu of the true model.
We prove an information-theoretic, Bayesian regret bound for our algorithm that holds for any finite-horizon, episodic sequential decision-making problem.
arXiv Detail & Related papers (2022-06-04T23:36:38Z)
- Sample-Efficient Reinforcement Learning via Conservative Model-Based Actor-Critic [67.00475077281212]
Model-based reinforcement learning algorithms are more sample efficient than their model-free counterparts.
We propose Conservative Model-Based Actor-Critic (CMBAC), a novel approach that achieves high sample efficiency without strong reliance on accurate learned models.
We show that CMBAC significantly outperforms state-of-the-art approaches in terms of sample efficiency on several challenging tasks.
arXiv Detail & Related papers (2021-12-16T15:33:11Z)
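The "conservative" part refers to acting on a pessimistic statistic of an ensemble of Q-value estimates; as an illustration of that general idea (a sketch of the technique, not necessarily CMBAC's exact procedure), one can average the bottom-k estimates:

```python
import numpy as np

def conservative_q_estimate(q_values: np.ndarray, k: int) -> float:
    """Mean of the bottom-k ensemble Q-value estimates. Acting on this
    pessimistic statistic discourages the policy from exploiting model
    errors that spuriously inflate some of the estimates."""
    bottom_k = np.sort(q_values)[:k]
    return float(bottom_k.mean())

# Example: five ensemble estimates for a single (state, action) pair.
q_ensemble = np.array([1.2, 0.9, 1.5, 0.8, 1.1])
print(conservative_q_estimate(q_ensemble, k=2))  # mean of 0.8 and 0.9 -> 0.85
```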
- A bandit-learning approach to multifidelity approximation [7.960229223744695]
Multifidelity approximation is an important technique in scientific computation and simulation.
We introduce a bandit-learning approach for leveraging data of varying fidelities to achieve precise estimates.
arXiv Detail & Related papers (2021-03-29T05:29:35Z)
- Trust but Verify: Assigning Prediction Credibility by Counterfactual Constrained Learning [123.3472310767721]
Prediction credibility measures are fundamental in statistics and machine learning.
These measures should account for the wide variety of models used in practice.
The framework developed in this work expresses the credibility as a risk-fit trade-off.
arXiv Detail & Related papers (2020-11-24T19:52:38Z)
- The Value Equivalence Principle for Model-Based Reinforcement Learning [29.368870568214007]
We argue that the limited representational resources of model-based RL agents are better used to build models that are directly useful for value-based planning.
We show that, as we augment the set of policies and functions considered, the class of value equivalent models shrinks.
We argue that the principle of value equivalence underlies a number of recent empirical successes in RL.
arXiv Detail & Related papers (2020-11-06T18:25:54Z)
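For reference, value equivalence can be stated compactly (notation lightly adapted from the paper): a learned model $\tilde{m}$ is value-equivalent to the true environment $m^*$ with respect to a set of policies $\Pi$ and a set of functions $\mathcal{V}$ if both induce the same Bellman backups on those functions:

```latex
% Value equivalence: \tilde{m} and m^* agree on the Bellman operator
% applied to every function v in V, under every policy pi in Pi.
\mathcal{T}^{\tilde{m}}_{\pi} v = \mathcal{T}^{m^*}_{\pi} v
\quad \text{for all } \pi \in \Pi,\ v \in \mathcal{V},
\qquad \text{where } \big(\mathcal{T}^{m}_{\pi} v\big)(s) =
\mathbb{E}_{a \sim \pi(\cdot \mid s),\, (r, s') \sim m}\!\left[ r + \gamma\, v(s') \right].
```

This makes the shrinkage claim above concrete: enlarging $\Pi$ or $\mathcal{V}$ adds constraints, so fewer models satisfy them.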
- Generative Temporal Difference Learning for Infinite-Horizon Prediction [101.59882753763888]
We introduce the $\gamma$-model, a predictive model of environment dynamics with an infinite probabilistic horizon.
We discuss how its training reflects an inescapable tradeoff between training-time and testing-time compounding errors.
arXiv Detail & Related papers (2020-10-27T17:54:12Z)
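The infinite probabilistic horizon has a clean characterization (our rendering of the idea; notation is ours): the $\gamma$-model $\mu$ is a geometric mixture of the single-step dynamics and its own bootstrapped predictions, giving a TD-style fixed-point equation:

```latex
% With probability (1 - gamma) the gamma-model reproduces the one-step
% dynamics p; with probability gamma it bootstraps its own prediction
% from the next state, yielding a geometrically distributed horizon.
\mu(s_e \mid s, a) = (1 - \gamma)\, p(s_e \mid s, a)
+ \gamma\, \mathbb{E}_{s' \sim p(\cdot \mid s, a)}
\big[ \mu\big(s_e \mid s', \pi(s')\big) \big].
```

Bootstrapping from the model's own predictions is exactly where the training-time versus testing-time compounding-error tradeoff mentioned above enters.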
- On the model-based stochastic value gradient for continuous reinforcement learning [50.085645237597056]
We show that simple model-based agents can outperform state-of-the-art model-free agents in terms of both sample-efficiency and final reward.
Our findings suggest that model-based policy evaluation deserves closer attention.
arXiv Detail & Related papers (2020-08-28T17:58:29Z)