Between Rate-Distortion Theory & Value Equivalence in Model-Based
Reinforcement Learning
- URL: http://arxiv.org/abs/2206.02025v1
- Date: Sat, 4 Jun 2022 17:09:46 GMT
- Title: Between Rate-Distortion Theory & Value Equivalence in Model-Based
Reinforcement Learning
- Authors: Dilip Arumugam and Benjamin Van Roy
- Abstract summary: We introduce an algorithm for synthesizing simple and useful approximations of the environment from which an agent might still recover near-optimal behavior.
We recognize the information-theoretic nature of this lossy environment compression problem and use the appropriate tools of rate-distortion theory to make mathematically precise how value equivalence can lend tractability to otherwise intractable sequential decision-making problems.
- Score: 21.931580762349096
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The quintessential model-based reinforcement-learning agent iteratively
refines its estimates or prior beliefs about the true underlying model of the
environment. Recent empirical successes in model-based reinforcement learning
with function approximation, however, eschew the true model in favor of a
surrogate that, while ignoring various facets of the environment, still
facilitates effective planning over behaviors. Recently formalized as the value
equivalence principle, this algorithmic technique is perhaps unavoidable as
real-world reinforcement learning demands consideration of a simple,
computationally-bounded agent interacting with an overwhelmingly complex
environment. In this work, we entertain an extreme scenario wherein some
combination of immense environment complexity and limited agent capacity
entirely precludes identifying an exactly value-equivalent model. In light of
this, we embrace a notion of approximate value equivalence and introduce an
algorithm for incrementally synthesizing simple and useful approximations of
the environment from which an agent might still recover near-optimal behavior.
Crucially, we recognize the information-theoretic nature of this lossy
environment compression problem and use the appropriate tools of
rate-distortion theory to make mathematically precise how value equivalence can
lend tractability to otherwise intractable sequential decision-making problems.
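For concreteness, the lossy environment-compression problem described above can be framed as a standard Shannon rate-distortion problem. The following is a minimal LaTeX sketch, assuming a distortion based on Bellman-backup disagreement over a reference class of policies \Pi and value functions \mathcal{V}; this particular distortion is an illustrative choice, not necessarily the paper's exact definition. Here M denotes the (random) true environment model and \hat{M} a compressed surrogate.

% Sketch of a rate-distortion view of approximate value equivalence (assumes amsmath).
% d(M, \hat{M}) measures worst-case Bellman-backup disagreement between the true
% model M and a surrogate \hat{M} over the reference classes \Pi and \mathcal{V};
% R(D) is the minimal rate (mutual information) needed to describe a surrogate
% whose expected distortion is at most D.
\[
  d\bigl(M, \hat{M}\bigr) \;=\; \max_{\pi \in \Pi}\, \max_{v \in \mathcal{V}}
    \bigl\lVert \mathcal{T}^{\pi}_{M} v \;-\; \mathcal{T}^{\pi}_{\hat{M}} v \bigr\rVert_{\infty},
  \qquad
  \mathcal{R}(D) \;=\; \min_{\substack{p(\hat{M} \mid M):\\ \mathbb{E}[d(M, \hat{M})] \le D}}
    I\bigl(M; \hat{M}\bigr).
\]

Under this reading, a small distortion budget D demands a more faithful (and typically more complex) surrogate, while a larger D admits simpler models that are only approximately value equivalent; the algorithm in the paper can be understood as navigating this trade-off.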
Related papers
- Towards Characterizing Domain Counterfactuals For Invertible Latent Causal Models [15.817239008727789]
In this work, we analyze a specific type of causal query called domain counterfactuals, which hypothesizes what a sample would have looked like if it had been generated in a different domain.
We show that recovering the latent Structural Causal Model (SCM) is unnecessary for estimating domain counterfactuals.
We also develop a theoretically grounded practical algorithm that simplifies the modeling process to generative model estimation.
arXiv Detail & Related papers (2023-06-20T04:19:06Z)
- Annealing Optimization for Progressive Learning with Stochastic Approximation [0.0]
We introduce a learning model designed to meet the needs of applications in which computational resources are limited.
We develop an online prototype-based learning algorithm formulated as a gradient-free stochastic approximation algorithm.
The learning model can be viewed as an interpretable, progressively growing competitive neural network to be used for supervised, unsupervised, and reinforcement learning.
arXiv Detail & Related papers (2022-09-06T21:31:01Z)
- Deciding What to Model: Value-Equivalent Sampling for Reinforcement Learning [21.931580762349096]
We introduce an algorithm that computes an approximately-value-equivalent, lossy compression of the environment which an agent may feasibly target in lieu of the true model.
We prove an information-theoretic, Bayesian regret bound for our algorithm that holds for any finite-horizon, episodic sequential decision-making problem.
arXiv Detail & Related papers (2022-06-04T23:36:38Z)
- Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z)
- Pessimistic Q-Learning for Offline Reinforcement Learning: Towards Optimal Sample Complexity [51.476337785345436]
We study a pessimistic variant of Q-learning in the context of finite-horizon Markov decision processes.
A variance-reduced pessimistic Q-learning algorithm is proposed to achieve near-optimal sample complexity.
arXiv Detail & Related papers (2022-02-28T15:39:36Z)
- Scalable Personalised Item Ranking through Parametric Density Estimation [53.44830012414444]
Learning from implicit feedback is challenging because of the one-class nature of the problem.
Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem.
We propose a learning-to-rank approach, which achieves convergence speed comparable to the pointwise counterpart.
arXiv Detail & Related papers (2021-05-11T03:38:16Z)
- A bandit-learning approach to multifidelity approximation [7.960229223744695]
Multifidelity approximation is an important technique in scientific computation and simulation.
We introduce a bandit-learning approach for leveraging data of varying fidelities to achieve precise estimates.
arXiv Detail & Related papers (2021-03-29T05:29:35Z)
- Leveraging Unlabeled Data for Entity-Relation Extraction through Probabilistic Constraint Satisfaction [54.06292969184476]
We study the problem of entity-relation extraction in the presence of symbolic domain knowledge.
Our approach employs a semantic loss that captures the precise meaning of a logical sentence.
With a focus on low-data regimes, we show that semantic loss outperforms the baselines by a wide margin.
arXiv Detail & Related papers (2021-03-20T00:16:29Z)
- The Value Equivalence Principle for Model-Based Reinforcement Learning [29.368870568214007]
We argue that the limited representational resources of model-based RL agents are better used to build models that are directly useful for value-based planning.
We show that, as we augment the set of policies and functions considered, the class of value equivalent models shrinks.
We argue that the principle of value equivalence underlies a number of recent empirical successes in RL.
arXiv Detail & Related papers (2020-11-06T18:25:54Z)
- Strictly Batch Imitation Learning by Energy-based Distribution Matching [104.33286163090179]
Consider learning a policy purely on the basis of demonstrated behavior -- that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment.
One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting.
But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient.
We argue that a good solution should be able to explicitly parameterize a policy, implicitly learn from rollout dynamics, and operate in an entirely offline fashion.
arXiv Detail & Related papers (2020-06-25T03:27:59Z)
- Learning Diverse Representations for Fast Adaptation to Distribution Shift [78.83747601814669]
We present a method for learning multiple models, incorporating an objective that pressures each to learn a distinct way to solve the task.
We demonstrate our framework's ability to facilitate rapid adaptation to distribution shift.
arXiv Detail & Related papers (2020-06-12T12:23:50Z)