AVG-DICE: Stationary Distribution Correction by Regression
- URL: http://arxiv.org/abs/2503.02125v1
- Date: Mon, 03 Mar 2025 23:14:02 GMT
- Title: AVG-DICE: Stationary Distribution Correction by Regression
- Authors: Fengdi Che, Bryan Chan, Chen Ma, A. Rupam Mahmood
- Abstract summary: Off-policy policy evaluation (OPE) has long suffered from stationary state distribution mismatch. We introduce AVG-DICE, a computationally simple Monte Carlo estimator for the density ratio. In our experiments, AVG-DICE is at least as accurate as state-of-the-art estimators and sometimes offers orders-of-magnitude improvements.
- Score: 7.193870502672509
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Off-policy policy evaluation (OPE), an essential component of reinforcement learning, has long suffered from stationary state distribution mismatch, undermining both stability and accuracy of OPE estimates. While existing methods correct distribution shifts by estimating density ratios, they often rely on expensive optimization or backward Bellman-based updates and struggle to outperform simpler baselines. We introduce AVG-DICE, a computationally simple Monte Carlo estimator for the density ratio that averages discounted importance sampling ratios, providing an unbiased and consistent correction. AVG-DICE extends naturally to nonlinear function approximation using regression, which we roughly tune and test on OPE tasks based on Mujoco Gym environments and compare with state-of-the-art density-ratio estimators using their reported hyperparameters. In our experiments, AVG-DICE is at least as accurate as state-of-the-art estimators and sometimes offers orders-of-magnitude improvements. However, a sensitivity analysis shows that best-performing hyperparameters may vary substantially across different discount factors, so a re-tuning is suggested.
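The abstract's core recipe — form discounted cumulative importance sampling ratios along a trajectory, then regress them on state features — can be sketched in a few lines. Everything below (function names, the linear model, the toy data) is an illustrative assumption, not the paper's implementation:

```python
import numpy as np

def discounted_ratio_targets(per_step_ratios, gamma):
    """Monte Carlo targets: gamma^t * prod_{k<=t} pi(a_k|s_k)/b(a_k|s_k)
    for each step t of a single trajectory."""
    cum = np.cumprod(per_step_ratios)            # cumulative IS ratio rho_{0:t}
    t = np.arange(len(per_step_ratios))
    return (gamma ** t) * cum

def fit_ratio_regression(state_features, targets):
    """Least-squares fit of a linear density-ratio model tau(s) = w @ phi(s)."""
    w, *_ = np.linalg.lstsq(state_features, targets, rcond=None)
    return w

rng = np.random.default_rng(0)
T = 50
rho = np.exp(0.05 * rng.standard_normal(T))      # per-step pi/b ratios near 1
phi = rng.standard_normal((T, 4))                # toy state features
y = discounted_ratio_targets(rho, gamma=0.9)
w = fit_ratio_regression(phi, y)
```

A nonlinear version would simply swap the least-squares fit for any regression model trained on the same targets.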
Related papers
- Revisiting Essential and Nonessential Settings of Evidential Deep Learning [70.82728812001807]
Evidential Deep Learning (EDL) is an emerging method for uncertainty estimation.
We propose Re-EDL, a simplified yet more effective variant of EDL.
arXiv Detail & Related papers (2024-10-01T04:27:07Z) - Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate. We stress that our estimators do not involve nonparametric function estimators and in particular do not rely on sample-size-dependent smoothing parameters.
arXiv Detail & Related papers (2024-07-11T13:28:34Z) - Nearest Neighbor Sampling for Covariate Shift Adaptation [7.940293148084844]
We propose a new covariate shift adaptation method which avoids estimating the weights.
The basic idea is to directly work on unlabeled target data, labeled according to the $k$-nearest neighbors in the source dataset.
Our experiments show that it achieves drastic reduction in the running time with remarkable accuracy.
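The weight-free idea can be sketched directly in NumPy: label each unlabeled target point by a majority vote over its $k$ nearest neighbors in the labeled source data (a hypothetical minimal sketch, not the paper's code):

```python
import numpy as np

def knn_label_target(source_X, source_y, target_X, k=3):
    """Label each target point by majority vote of its k nearest source
    neighbors (Euclidean distance); avoids estimating importance weights."""
    d2 = ((target_X[:, None, :] - source_X[None, :, :]) ** 2).sum(-1)
    nn = np.argsort(d2, axis=1)[:, :k]           # indices of k nearest sources
    votes = source_y[nn]                         # (n_target, k) label votes
    return np.array([np.bincount(v).argmax() for v in votes])

# toy usage: two well-separated clusters
src_X = np.vstack([np.zeros((5, 2)), np.ones((5, 2)) * 5])
src_y = np.array([0] * 5 + [1] * 5)
tgt_X = np.array([[0.1, 0.0], [5.2, 4.9]])
print(knn_label_target(src_X, src_y, tgt_X))     # [0 1]
```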
arXiv Detail & Related papers (2023-12-15T17:28:09Z) - Optimal Training of Mean Variance Estimation Neural Networks [1.4610038284393165]
This paper focuses on the optimal implementation of a Mean Variance Estimation network (MVE network) (Nix and Weigend, 1994).
An MVE network assumes that the data is produced from a normal distribution with a mean function and variance function.
We introduce a novel improvement of the MVE network: separate regularization of the mean and the variance estimate.
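The objective behind such a network — a Gaussian negative log-likelihood plus the separate mean/variance regularization the summary mentions — might look like the following sketch (names and penalty strengths are illustrative assumptions):

```python
import numpy as np

def gaussian_nll(y, mu, log_var):
    """Per-sample negative log-likelihood under N(mu, exp(log_var))."""
    return 0.5 * (log_var + (y - mu) ** 2 / np.exp(log_var) + np.log(2 * np.pi))

def mve_loss(y, mu, log_var, w_mean, w_var, lam_mean=1e-4, lam_var=1e-2):
    """MVE-style objective with *separate* L2 penalties on the weights of
    the mean head and the variance head (the proposed improvement)."""
    nll = gaussian_nll(y, mu, log_var).mean()
    return nll + lam_mean * np.sum(w_mean ** 2) + lam_var * np.sum(w_var ** 2)
```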
arXiv Detail & Related papers (2023-02-17T13:44:47Z) - Proximal Policy Optimization with Adaptive Threshold for Symmetric
Relative Density Ratio [8.071506311915396]
A popular method, so-called proximal policy optimization (PPO), and its variants constrain the density ratio of the latest and baseline policies when it exceeds a given threshold.
This paper proposes a new PPO derived using the relative Pearson (RPE) divergence, hence called PPO-RPE, to design the threshold adaptively.
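For reference, the fixed-threshold mechanism being adapted is the standard PPO clipped surrogate; PPO-RPE's contribution is choosing the clipping width from a relative Pearson divergence rather than fixing it. A minimal sketch of the clipped objective (not the adaptive rule itself):

```python
import numpy as np

def ppo_clipped_objective(ratio, advantage, eps):
    """Standard PPO surrogate: the policy density ratio is clipped to
    [1 - eps, 1 + eps]; PPO-RPE would set eps adaptively (not shown)."""
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return np.minimum(ratio * advantage, clipped * advantage).mean()
```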
arXiv Detail & Related papers (2022-03-18T09:13:13Z) - Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is a linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to non linear settings via deep learning with bias constraints.
A second motivation for BCE is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z) - Variational Refinement for Importance Sampling Using the Forward
Kullback-Leibler Divergence [77.06203118175335]
Variational Inference (VI) is a popular alternative to exact sampling in Bayesian inference.
Importance sampling (IS) is often used to fine-tune and de-bias the estimates of approximate Bayesian inference procedures.
We propose a novel combination of optimization and sampling techniques for approximate Bayesian inference.
arXiv Detail & Related papers (2021-06-30T11:00:24Z) - Learning Calibrated Uncertainties for Domain Shift: A Distributionally
Robust Learning Approach [150.8920602230832]
We propose a framework for learning calibrated uncertainties under domain shifts.
In particular, the density ratio estimation reflects the closeness of a target (test) sample to the source (training) distribution.
We show that our proposed method generates calibrated uncertainties that benefit downstream tasks.
arXiv Detail & Related papers (2020-10-08T02:10:54Z) - Variational Variance: Simple, Reliable, Calibrated Heteroscedastic Noise
Variance Parameterization [3.553493344868413]
We propose critiques to test predictive mean and variance calibration and the predictive distribution's ability to generate sensible data.
We find that our solution, to treat heteroscedastic variance variationally, sufficiently regularizes variance to pass these PPCs.
arXiv Detail & Related papers (2020-06-08T19:58:35Z) - Comment: Entropy Learning for Dynamic Treatment Regimes [58.442274475425144]
JSLZ's approach leverages a rejection-sampling estimate of the value of a given decision rule based on inverse probability weighting (IPW) and its interpretation as a weighted (or cost-sensitive) classification.
Their use of smooth classification surrogates enables their careful approach to analyzing distributions.
The IPW estimate is problematic as it leads to weights that discard most of the data and are extremely variable on whatever remains.
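The critiqued estimator is easy to state concretely: only logged samples whose action agrees with the decision rule contribute, each inflated by an inverse propensity. A hypothetical minimal sketch illustrating both the data discarding and the weight variability:

```python
import numpy as np

def ipw_value(actions, rewards, propensities, rule_actions):
    """IPW value estimate of a decision rule: mean of
    1{a_i = rule(x_i)} / p(a_i | x_i) * r_i over the logged data.
    Also returns the fraction of data that actually contributes."""
    match = (actions == rule_actions).astype(float)
    weights = match / propensities
    return np.mean(weights * rewards), np.mean(match)
```

With many actions or small propensities, the kept fraction shrinks and the surviving weights blow up, which is exactly the instability the comment points out.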
arXiv Detail & Related papers (2020-04-06T16:11:05Z) - Pareto Smoothed Importance Sampling [8.705872384531318]
Importance weighting is a general way to adjust Monte Carlo integration to account for draws from the wrong distribution.
This routinely occurs when there are aspects of the target distribution that are not well captured by the approximating distribution.
We present a new method for stabilizing importance weights using a generalized Pareto distribution fit to the upper tail of the distribution of the simulated importance ratios.
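The smoothing step can be sketched as: sort the importance ratios, fit a generalized Pareto distribution to the largest ones, and replace them by GPD quantiles. The sketch below uses a crude method-of-moments GPD fit and returns the weights in sorted order — both simplifications relative to the paper's more robust procedure:

```python
import numpy as np

def pareto_smooth(weights, tail_frac=0.2):
    """Replace the largest importance ratios with quantiles of a generalized
    Pareto distribution fitted to the tail exceedances. Method-of-moments
    fit; returns (sorted smoothed weights, estimated shape xi)."""
    w = np.sort(np.asarray(weights, dtype=float))
    n = len(w)
    m = max(int(np.ceil(tail_frac * n)), 5)          # number of tail samples
    u = w[n - m - 1]                                 # threshold below the tail
    exc = w[n - m:] - u                              # exceedances over threshold
    mean, var = exc.mean(), exc.var()
    xi = 0.5 * (1.0 - mean ** 2 / var)               # method-of-moments shape
    sigma = mean * (1.0 - xi)                        # method-of-moments scale
    p = (np.arange(1, m + 1) - 0.5) / m              # plotting positions
    if abs(xi) < 1e-8:
        q = -sigma * np.log1p(-p)                    # exponential limit
    else:
        q = sigma / xi * ((1.0 - p) ** (-xi) - 1.0)  # GPD quantile function
    smoothed = w.copy()
    smoothed[n - m:] = u + q                         # smoothed tail weights
    return smoothed, xi
```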
arXiv Detail & Related papers (2015-07-09T18:43:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.