Value-aware Importance Weighting for Off-policy Reinforcement Learning
- URL: http://arxiv.org/abs/2306.15625v1
- Date: Tue, 27 Jun 2023 17:05:22 GMT
- Title: Value-aware Importance Weighting for Off-policy Reinforcement Learning
- Authors: Kristopher De Asis, Eric Graves, Richard S. Sutton
- Abstract summary: Importance sampling is a central idea underlying off-policy prediction in reinforcement learning.
In this work, we consider a broader class of importance weights to correct samples in off-policy learning.
We derive how such weights can be computed, and detail key properties of the resulting importance weights.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Importance sampling is a central idea underlying off-policy prediction in
reinforcement learning. It provides a strategy for re-weighting samples from a
distribution to obtain unbiased estimates under another distribution. However,
importance sampling weights tend to exhibit extreme variance, often leading to
stability issues in practice. In this work, we consider a broader class of
importance weights to correct samples in off-policy learning. We propose the
use of $\textit{value-aware importance weights}$ which take into account the
sample space to provide lower variance, but still unbiased, estimates under a
target distribution. We derive how such weights can be computed, and detail key
properties of the resulting importance weights. We then extend several
reinforcement learning prediction algorithms to the off-policy setting with
these weights, and evaluate them empirically.
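The ordinary (value-agnostic) importance weighting that the paper generalizes can be sketched as follows: each logged transition is reweighted by the per-step ratio $\rho = \pi(a|s)/b(a|s)$ inside a TD(0) update. The tiny MDP, policies, and step size here are illustrative assumptions, not the paper's experimental setup:

```python
def importance_ratio(pi, b, s, a):
    """Ordinary importance weight: target probability over behavior probability."""
    return pi[s][a] / b[s][a]

def off_policy_td0(transitions, pi, b, alpha=0.1, gamma=0.9):
    """One pass of importance-weighted TD(0) over logged (s, a, r, s') tuples.

    pi and b map state -> list of action probabilities (hypothetical format).
    Multiplying the TD error by rho corrects samples drawn under b
    so the expected update matches on-policy learning under pi.
    """
    V = {}
    for (s, a, r, s_next) in transitions:
        rho = importance_ratio(pi, b, s, a)
        v_s = V.get(s, 0.0)
        v_next = V.get(s_next, 0.0)
        td_error = r + gamma * v_next - v_s
        V[s] = v_s + alpha * rho * td_error  # rho corrects for b -> pi
    return V
```

Because rho multiplies the whole update, rare target actions under b produce large ratios; the high variance this causes is the motivation for the value-aware weights the paper proposes.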
Related papers
- A Short Survey on Importance Weighting for Machine Learning [3.27651593877935]
It is known that supervised learning under a mismatch between the training and test distributions, called distribution shift, can retain statistically desirable guarantees through importance weighting by the density ratio of the two distributions.
This survey summarizes the broad applications of importance weighting in machine learning and related research.
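The density-ratio weighting the survey refers to can be sketched as follows: training losses are reweighted by $p_{\text{test}}(x)/p_{\text{train}}(x)$ so the empirical risk is unbiased for the test distribution. The Gaussian densities and loss values below are illustrative assumptions:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a univariate Gaussian; stands in for a known data distribution."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def weighted_risk(xs, losses, mu_train, mu_test, sigma=1.0):
    """Importance-weighted empirical risk: mean of w(x) * loss(x),
    with w(x) = p_test(x) / p_train(x) (the density ratio)."""
    weights = [gaussian_pdf(x, mu_test, sigma) / gaussian_pdf(x, mu_train, sigma)
               for x in xs]
    return sum(w * l for w, l in zip(weights, losses)) / len(xs)
```

When the two distributions coincide, every weight is 1 and this reduces to the usual empirical risk; in practice the density ratio is usually estimated rather than known.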
arXiv Detail & Related papers (2024-03-15T10:31:46Z)
- Adaptive Distribution Calibration for Few-Shot Learning with Hierarchical Optimal Transport [78.9167477093745]
We propose a novel distribution calibration method by learning the adaptive weight matrix between novel samples and base classes.
Experimental results on standard benchmarks demonstrate that our proposed plug-and-play model outperforms competing approaches.
arXiv Detail & Related papers (2022-10-09T02:32:57Z)
- Learning to Re-weight Examples with Optimal Transport for Imbalanced Classification [74.62203971625173]
Imbalanced data pose challenges for deep learning based classification models.
One of the most widely-used approaches for tackling imbalanced data is re-weighting.
We propose a novel re-weighting method based on optimal transport (OT) from a distributional point of view.
arXiv Detail & Related papers (2022-08-05T01:23:54Z)
- Rethinking Importance Weighting for Transfer Learning [71.81262398144946]
A key assumption in supervised learning is that training and test data follow the same probability distribution.
As real-world machine learning tasks are becoming increasingly complex, novel approaches are explored to cope with such challenges.
arXiv Detail & Related papers (2021-12-19T14:35:25Z)
- Multicalibrated Partitions for Importance Weights [17.1726078570842]
Importance weights play a fundamental role in many different fields, most notably statistics and machine learning.
We show that the MaxEntropy approach may fail to assign high average scores to sets $C \in \mathcal{C}$, even when the average of the ground truth weights for the set is evidently large.
We present an efficient algorithm that under standard learnability assumptions computes weights which satisfy these bounds.
arXiv Detail & Related papers (2021-03-10T03:32:36Z)
- Counterfactual Representation Learning with Balancing Weights [74.67296491574318]
Key to causal inference with observational data is achieving balance in predictive features associated with each treatment type.
Recent literature has explored representation learning to achieve this goal.
We develop an algorithm for flexible, scalable and accurate estimation of causal effects.
arXiv Detail & Related papers (2020-10-23T19:06:03Z)
- Optimal Off-Policy Evaluation from Multiple Logging Policies [77.62012545592233]
We study off-policy evaluation from multiple logging policies, each generating a dataset of fixed size, i.e., stratified sampling.
We find the OPE estimator for multiple loggers with minimum variance for any instance, i.e., the efficient one.
arXiv Detail & Related papers (2020-10-21T13:43:48Z)
- Learning a Unified Sample Weighting Network for Object Detection [113.98404690619982]
Region sampling or weighting is critically important to the success of modern region-based object detectors.
We argue that sample weighting should be data-dependent and task-dependent.
We propose a unified sample weighting network to predict a sample's task weights.
arXiv Detail & Related papers (2020-06-11T16:19:16Z)
- Towards an Intrinsic Definition of Robustness for a Classifier [4.205692673448206]
We show that averaging the radius of robustness of samples in a validation set is a statistically weak measure.
We propose instead to weight the importance of samples depending on their difficulty.
We empirically demonstrate the ability of the proposed score to measure robustness of classifiers with little dependence on the choice of samples.
arXiv Detail & Related papers (2020-06-09T07:47:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site makes no guarantee of the quality of the information presented and is not responsible for any consequences of its use.