Policy Evaluation Networks
- URL: http://arxiv.org/abs/2002.11833v1
- Date: Wed, 26 Feb 2020 23:00:27 GMT
- Title: Policy Evaluation Networks
- Authors: Jean Harb, Tom Schaul, Doina Precup and Pierre-Luc Bacon
- Abstract summary: We introduce a scalable, differentiable fingerprinting mechanism that retains essential policy information in a concise embedding.
Our empirical results demonstrate that combining this fingerprinting mechanism with a learned Policy Evaluation Network and gradient ascent can produce policies that outperform those that generated the training data.
- Score: 50.53250641051648
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many reinforcement learning algorithms use value functions to guide the
search for better policies. These methods estimate the value of a single policy
while generalizing across many states. The core idea of this paper is to flip
this convention and estimate the value of many policies, for a single set of
states. This approach opens up the possibility of performing direct gradient
ascent in policy space without seeing any new data. The main challenge for this
approach is finding a way to represent complex policies that facilitates
learning and generalization. To address this problem, we introduce a scalable,
differentiable fingerprinting mechanism that retains essential policy
information in a concise embedding. Our empirical results demonstrate that
combining these three elements (learned Policy Evaluation Network, policy
fingerprints, gradient ascent) can produce policies that outperform those that
generated the training data, in a zero-shot manner.
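As an illustration of the idea, here is a minimal PyTorch sketch: a policy's fingerprint is its action distribution on a set of probe states (fixed at random here; the paper also learns them), a small network maps fingerprints to predicted returns, and gradient ascent on that prediction improves the policy without collecting new data. All sizes and training details are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, N_PROBES = 4, 2, 16

# A small MLP policy mapping states to action probabilities.
policy = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.Tanh(),
                       nn.Linear(32, N_ACTIONS), nn.Softmax(dim=-1))

# Fingerprint: the policy's action distributions on fixed probe states,
# flattened into one vector. Differentiable w.r.t. the policy parameters.
probe_states = torch.randn(N_PROBES, STATE_DIM)

def fingerprint(pi):
    return pi(probe_states).flatten()  # shape: (N_PROBES * N_ACTIONS,)

# Policy Evaluation Network: fingerprint -> predicted return. In the paper it
# is first trained by regressing fingerprints of many training policies onto
# their observed returns; that phase is omitted here.
pen = nn.Sequential(nn.Linear(N_PROBES * N_ACTIONS, 64), nn.ReLU(),
                    nn.Linear(64, 1))
pen.requires_grad_(False)  # frozen during policy improvement

# Zero-shot improvement: gradient ascent on the predicted return.
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)
for _ in range(100):
    predicted_return = pen(fingerprint(policy)).squeeze()
    opt.zero_grad()
    (-predicted_return).backward()  # ascend the PEN's value estimate
    opt.step()
```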
Related papers
- Statistical Analysis of Policy Space Compression Problem [54.1754937830779]
Policy search methods are central to reinforcement learning, offering a framework for continuous state-action spaces and partially observable problems.
Reducing the policy space through policy compression emerges as a powerful, reward-free approach to accelerate the learning process.
This technique condenses the policy space into a smaller, representative set while maintaining most of the original effectiveness.
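The summary does not specify how the representative set is selected; as a purely hypothetical illustration, the sketch below condenses a candidate policy set by greedy farthest-point selection over pairwise policy distances.

```python
import numpy as np

def select_representatives(dist, k):
    # dist: (n, n) matrix of pairwise distances between candidate policies.
    chosen = [int(dist.sum(axis=1).argmax())]    # start from an outlier
    while len(chosen) < k:
        d_to_set = dist[:, chosen].min(axis=1)   # distance to the chosen set
        chosen.append(int(d_to_set.argmax()))    # add the farthest policy
    return chosen

# Toy example: 20 "policies" embedded as 2-D points.
rng = np.random.default_rng(1)
pts = rng.normal(size=(20, 2))
dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
print(select_representatives(dist, k=4))
```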
arXiv Detail & Related papers (2024-11-15T02:46:55Z)
- Conformal Off-Policy Evaluation in Markov Decision Processes [53.786439742572995]
Reinforcement Learning aims at identifying and evaluating efficient control policies from data.
Most methods for this learning task, referred to as Off-Policy Evaluation (OPE), do not come with accuracy and certainty guarantees.
We present a novel OPE method based on Conformal Prediction that outputs an interval containing the true reward of the target policy with a prescribed level of certainty.
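A minimal sketch of such an interval under split conformal prediction (the paper's method handles the full OPE setting; the residual source here is hypothetical):

```python
import numpy as np

def conformal_interval(point_estimate, calibration_residuals, alpha=0.1):
    # Split conformal: the interval is the point estimate plus/minus a
    # finite-sample-corrected quantile of held-out absolute residuals.
    n = len(calibration_residuals)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = np.sort(np.abs(calibration_residuals))[min(k, n) - 1]
    return point_estimate - q, point_estimate + q

# Hypothetical usage: residuals from a calibration set where both a model
# estimate and an observed (e.g., importance-weighted) return are available.
rng = np.random.default_rng(0)
residuals = rng.normal(scale=2.0, size=200)
lo, hi = conformal_interval(point_estimate=10.0,
                            calibration_residuals=residuals, alpha=0.1)
print(f"90% conformal interval: [{lo:.2f}, {hi:.2f}]")
```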
arXiv Detail & Related papers (2023-04-05T16:45:11Z)
- Policy Dispersion in Non-Markovian Environment [53.05904889617441]
This paper aims to learn diverse policies from histories of state-action pairs in a non-Markovian environment.
We first adopt a transformer-based method to learn policy embeddings.
Then, we stack the policy embeddings to construct a dispersion matrix to induce a set of diverse policies.
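The abstract leaves the dispersion objective unspecified; one concrete reading, sketched below, scores a set of policy embeddings with a DPP-style log-determinant over an RBF kernel, which grows as the embeddings spread apart.

```python
import torch

def dispersion_score(embeddings, bandwidth=1.0):
    # embeddings: (n_policies, d), e.g., transformer encodings of each
    # policy's state-action history.
    sq_dists = (embeddings[:, None] - embeddings[None, :]).pow(2).sum(-1)
    kernel = torch.exp(-sq_dists / (2 * bandwidth ** 2))
    # Larger log-determinant <=> more spread-out embeddings (DPP-style).
    return torch.logdet(kernel + 1e-4 * torch.eye(len(embeddings)))

embeddings = torch.randn(5, 8, requires_grad=True)
score = dispersion_score(embeddings)
score.backward()  # gradients push policies apart in embedding space
```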
arXiv Detail & Related papers (2023-02-28T11:58:39Z)
- Value Enhancement of Reinforcement Learning via Efficient and Robust Trust Region Optimization [14.028916306297928]
Reinforcement learning (RL) is a powerful machine learning technique that enables an intelligent agent to learn an optimal policy.
We propose a novel value enhancement method to improve the performance of a given initial policy computed by existing state-of-the-art RL algorithms.
arXiv Detail & Related papers (2023-01-05T18:43:40Z)
- Towards A Unified Policy Abstraction Theory and Representation Learning Approach in Markov Decision Processes [39.94472154078338]
We propose a unified policy abstraction theory containing three types of policy abstraction, associated with policy features at different levels.
We then generalize them to three policy metrics that quantify the distance (i.e., similarity) between policies.
In our empirical study, we investigate the efficacy of the proposed policy metrics and representations in characterizing policy difference and conveying policy generalization, respectively.
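As one plausible example of such a metric (not necessarily one of the paper's three), the sketch below averages the total-variation distance between two policies' action distributions over sampled states.

```python
import numpy as np

def policy_distance(pi1, pi2, states):
    # pi1, pi2: callables mapping a batch of states to action probabilities.
    p, q = pi1(states), pi2(states)
    # Total variation per state is half the L1 distance; average over states.
    return 0.5 * np.abs(p - q).sum(axis=-1).mean()

# Hypothetical state-independent policies over 3 actions, for illustration.
states = np.arange(10)
pi_a = lambda s: np.tile([0.6, 0.3, 0.1], (len(s), 1))
pi_b = lambda s: np.tile([0.2, 0.5, 0.3], (len(s), 1))
print(policy_distance(pi_a, pi_b, states))  # ≈ 0.4
```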
arXiv Detail & Related papers (2022-09-16T03:41:50Z)
- General Policy Evaluation and Improvement by Learning to Identify Few But Crucial States [12.059140532198064]
Learning to evaluate and improve policies is a core problem of Reinforcement Learning.
A recently explored competitive alternative is to learn a single value function for many policies.
We show that our value function, trained to evaluate NN policies, is also invariant to changes in the policy architecture.
arXiv Detail & Related papers (2022-07-04T16:34:53Z)
- Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
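The "combined together" step follows generalized policy improvement (GPI): act greedily with respect to the maximum of the base policies' value estimates on the new task. A minimal sketch with illustrative Q-tables:

```python
import numpy as np

def gpi_action(q_tables, state):
    # q_tables: (n_policies, n_states, n_actions) array of Q-values, one
    # table per base policy (e.g., obtained via successor features).
    best_over_policies = q_tables[:, state, :].max(axis=0)
    return int(best_over_policies.argmax())

rng = np.random.default_rng(0)
q_tables = rng.normal(size=(4, 6, 3))  # 4 policies, 6 states, 3 actions
print(gpi_action(q_tables, state=2))
```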
arXiv Detail & Related papers (2021-12-30T12:20:46Z)
- Supervised Off-Policy Ranking [145.3039527243585]
Off-policy evaluation (OPE) leverages data generated by other policies to evaluate a target policy.
We propose supervised off-policy ranking that learns a policy scoring model by correctly ranking training policies with known performance.
Our method outperforms strong baseline OPE methods in terms of both rank correlation and performance gap between the truly best and the best of the ranked top three policies.
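One way to train a model to "correctly rank training policies" is a pairwise ranking loss on a learned scoring model; the sketch below is a generic formulation, not the paper's architecture. `scores` are the model's outputs for a batch of training policies with known performance `returns`.

```python
import torch

def pairwise_ranking_loss(scores, returns):
    # For every pair (i, j) with returns[i] > returns[j], encourage
    # scores[i] > scores[j] via a logistic loss on the score difference.
    diff = scores[:, None] - scores[None, :]               # (n, n)
    label = (returns[:, None] > returns[None, :]).float()  # 1 if i beats j
    return (label * torch.nn.functional.softplus(-diff)).sum() / label.sum()

scores = torch.randn(8, requires_grad=True)  # stand-in for model outputs
returns = torch.randn(8)                     # known policy performance
loss = pairwise_ranking_loss(scores, returns)
loss.backward()
```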
arXiv Detail & Related papers (2021-07-03T07:01:23Z)
- Distributionally Robust Batch Contextual Bandits [20.667213458836734]
Policy learning using historical observational data is an important problem that has found widespread applications.
Existing literature rests on the crucial assumption that the future environment where the learned policy will be deployed is the same as the past environment.
In this paper, we lift this assumption and aim to learn a distributionally robust policy with incomplete observational data.
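A standard way to make a value estimate distributionally robust is the dual of the KL-constrained worst case; the sketch below applies it to per-sample reward estimates (the paper's estimator for the bandit setting is more involved).

```python
import numpy as np
from scipy.optimize import minimize_scalar

def robust_value(rewards, delta):
    # Dual of the KL-constrained worst case:
    # inf_{KL(Q||P)<=delta} E_Q[r] = sup_{a>0} -a*log E_P[exp(-r/a)] - a*delta
    def neg_dual(a):
        return a * np.log(np.mean(np.exp(-rewards / a))) + a * delta
    # Restrict a to a bracket where the optimum is interior for this example.
    res = minimize_scalar(neg_dual, bounds=(0.05, 100.0), method="bounded")
    return -res.fun

rng = np.random.default_rng(0)
rewards = rng.normal(loc=1.0, size=500)  # hypothetical per-sample rewards
print(robust_value(rewards, delta=0.1))  # strictly below the empirical mean
```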
arXiv Detail & Related papers (2020-06-10T03:11:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.