Comment: Entropy Learning for Dynamic Treatment Regimes
- URL: http://arxiv.org/abs/2004.02778v1
- Date: Mon, 6 Apr 2020 16:11:05 GMT
- Title: Comment: Entropy Learning for Dynamic Treatment Regimes
- Authors: Nathan Kallus
- Abstract summary: JSLZ's approach leverages a rejection-and-importance-sampling estimate of the value of a given decision rule based on inverse probability weighting (IPW) and its interpretation as a weighted (or cost-sensitive) classification.
Their use of smooth classification surrogates enables their careful approach to analyzing asymptotic distributions.
The IPW estimate is problematic as it leads to weights that discard most of the data and are extremely variable on whatever remains.
- Score: 58.442274475425144
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: I congratulate Profs. Binyan Jiang, Rui Song, Jialiang Li, and Donglin Zeng
(JSLZ) for an exciting development in conducting inferences on optimal dynamic
treatment regimes (DTRs) learned via empirical risk minimization using the
entropy loss as a surrogate. JSLZ's approach leverages a
rejection-and-importance-sampling estimate of the value of a given decision
rule based on inverse probability weighting (IPW) and its interpretation as a
weighted (or cost-sensitive) classification. Their use of smooth classification
surrogates enables their careful approach to analyzing asymptotic
distributions. However, even for evaluation purposes, the IPW estimate is
problematic as it leads to weights that discard most of the data and are
extremely variable on whatever remains. In this comment, I discuss an
optimization-based alternative to evaluating DTRs, review several connections,
and suggest directions forward. This extends the balanced policy evaluation
approach of Kallus (2018a) to the longitudinal setting.
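For concreteness, here is a minimal numpy sketch of the IPW value estimate for a multi-stage regime, using a simulated data-generating process and a hypothetical decision rule (not JSLZ's estimator or data). It makes the two problems noted above visible: only trajectories that agree with the rule at every stage get nonzero weight, and the surviving weights, being products of inverse propensities, are highly variable.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 5000, 3                              # trajectories and decision stages

# Simulated observational data: stage covariates, binary actions drawn from a
# behavior policy with known propensities, and a final outcome Y.
X = rng.normal(size=(n, T))
prop = 1 / (1 + np.exp(-X))                 # behavior propensities P(A_t = 1 | X_t)
A = rng.binomial(1, prop)                   # observed actions
Y = X.sum(axis=1) + A.sum(axis=1) + rng.normal(size=n)

# A candidate decision rule (hypothetical): treat at stage t iff X_t > 0.
pi = (X > 0).astype(int)

# IPW value estimate: only trajectories that follow the rule at every stage
# get nonzero weight, each weighted by the product of inverse propensities.
obs_prob = np.where(A == 1, prop, 1 - prop).prod(axis=1)
agree = (A == pi).all(axis=1)
w = agree / obs_prob
v_ipw = np.mean(w * Y)

print(f"trajectories kept: {agree.mean():.1%}")
print(f"weight coefficient of variation (kept): {w[agree].std() / w[agree].mean():.2f}")
print(f"IPW value estimate: {v_ipw:.3f}")
```

With more stages the kept fraction shrinks geometrically, which is the degeneracy the balanced policy evaluation approach is designed to avoid.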
Related papers
- Uncertainty-Penalized Direct Preference Optimization [52.387088396044206]
We develop a pessimistic framework for DPO by introducing preference uncertainty penalization schemes.
The penalization serves as a correction to the loss which attenuates the loss gradient for uncertain samples.
We show improved overall performance compared to vanilla DPO, as well as better completions on prompts from high-uncertainty chosen/rejected responses.
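As a rough sketch of the idea only (the penalty form, its scaling, and the source of the uncertainty estimate below are my assumptions, not the paper's scheme), a DPO loss can be made pessimistic by subtracting a per-pair uncertainty from the implicit reward margin, which shrinks the margin and hence attenuates the gradient for uncertain pairs:

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected,
             uncertainty=None, beta=0.1):
    """Standard DPO loss, optionally penalized by a per-pair uncertainty.

    logp_* / ref_*: summed log-probabilities of the chosen and rejected
    completions under the policy and the frozen reference model.
    uncertainty: a nonnegative per-pair estimate (e.g. from an ensemble);
    this interface and the penalty form are assumptions, not the paper's.
    """
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    if uncertainty is not None:
        margin = margin - beta * uncertainty   # pessimistic correction
    return -F.logsigmoid(margin).mean()

# Toy usage with fake log-probabilities for a batch of 4 preference pairs.
lp_c = torch.tensor([-10., -9., -12., -8.])
lp_r = torch.tensor([-11., -10., -11., -9.])
rf_c = torch.tensor([-10.5, -9.5, -11.5, -8.5])
rf_r = torch.tensor([-10.8, -9.9, -11.2, -8.8])
u = torch.tensor([0.1, 0.1, 2.0, 0.1])        # third pair is highly uncertain
print(dpo_loss(lp_c, lp_r, rf_c, rf_r), dpo_loss(lp_c, lp_r, rf_c, rf_r, u))
```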
arXiv Detail & Related papers (2024-10-26T14:24:37Z) - Truncating Trajectories in Monte Carlo Policy Evaluation: an Adaptive Approach [51.76826149868971]
Policy evaluation via Monte Carlo simulation is at the core of many MC Reinforcement Learning (RL) algorithms.
As a quality index, we propose a surrogate of the mean squared error of a return estimator that uses trajectories of different lengths.
We present an adaptive algorithm called Robust and Iterative Data collection strategy Optimization (RIDO).
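The RIDO algorithm itself is adaptive and not reproduced here; the following sketch only illustrates, with assumed quantities, the kind of surrogate MSE index described: a worst-case truncation-bias term plus the sample variance of truncated returns, compared across horizons under a fixed interaction budget.

```python
import numpy as np

rng = np.random.default_rng(1)
gamma, r_max, budget = 0.95, 1.0, 3000       # total environment steps available

def truncated_returns(horizon, n_traj):
    """Simulate n_traj i.i.d. reward streams (a stand-in environment with
    rewards in [0, 1]) and return discounted returns truncated at `horizon`."""
    rewards = rng.uniform(0, r_max, size=(n_traj, horizon))
    return (rewards * gamma ** np.arange(horizon)).sum(axis=1)

for m in (10, 30, 60, 120):
    n = budget // m                          # fixed budget: shorter horizon => more trajectories
    g = truncated_returns(m, n)
    bias_bound = gamma ** m * r_max / (1 - gamma)    # worst-case tail missed by truncation
    surrogate_mse = bias_bound ** 2 + g.var(ddof=1) / n
    print(f"horizon {m:4d}: n={n:4d}  surrogate MSE ~ {surrogate_mse:.4f}")
```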
arXiv Detail & Related papers (2024-10-17T11:47:56Z) - On Training Implicit Meta-Learning With Applications to Inductive
Weighing in Consistency Regularization [0.0]
Implicit meta-learning (IML) requires computing second-order gradients, particularly the Hessian.
Various approximations for the Hessian have been proposed, but a systematic comparison of their compute cost, stability, generalization of the solutions found, and estimation accuracy has been largely overlooked.
We show how a "Confidence Network" trained to extract domain-specific features can learn to up-weight useful images and down-weight out-of-distribution samples.
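A minimal sketch of the re-weighting mechanism under assumed interfaces (a small network mapping per-sample features to weights that scale a consistency-regularization loss); in IML the confidence network would itself be trained through implicit/meta gradients, which is omitted here:

```python
import torch
import torch.nn as nn

class ConfidenceNet(nn.Module):
    """Maps per-sample features to a weight in (0, 1)."""
    def __init__(self, d):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, feats):
        return torch.sigmoid(self.net(feats)).squeeze(-1)

d = 16
conf = ConfidenceNet(d)
feats = torch.randn(64, d)                      # features of unlabeled samples
pred_weak = torch.randn(64, 10).softmax(-1)     # predictions under weak augmentation
pred_strong = torch.randn(64, 10).softmax(-1)   # predictions under strong augmentation

# Per-sample consistency loss, down-weighted where the confidence network
# assigns low weight (e.g. to out-of-distribution samples).
per_sample = ((pred_weak - pred_strong) ** 2).sum(-1)
weighted_consistency = (conf(feats) * per_sample).mean()
print(weighted_consistency)
```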
arXiv Detail & Related papers (2023-10-28T15:50:03Z) - A Semiparametric Instrumented Difference-in-Differences Approach to
Policy Learning [2.1989182578668243]
We propose a general instrumented difference-in-differences (DiD) approach for learning the optimal treatment policy.
Specifically, we establish identification results using a binary instrumental variable (IV) when the parallel trends assumption fails to hold.
We also construct a Wald estimator, novel inverse probability weighting estimators, and a class of semiparametric efficient and multiply robust estimators.
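For intuition only, here is a sketch of a textbook Wald-type DiD ratio, i.e., the DiD in outcomes across instrument groups divided by the DiD in treatment take-up; it is not necessarily the estimator constructed in the paper:

```python
import numpy as np

def wald_did(y, d, z, post):
    """Wald-type difference-in-differences ratio.

    y: outcomes, d: binary treatment take-up, z: binary instrument group,
    post: 0/1 period indicator. Returns DiD(y) / DiD(d) across z groups.
    """
    y, d, z, post = map(np.asarray, (y, d, z, post))

    def did(v):
        return ((v[(z == 1) & (post == 1)].mean() - v[(z == 1) & (post == 0)].mean())
                - (v[(z == 0) & (post == 1)].mean() - v[(z == 0) & (post == 0)].mean()))

    return did(y) / did(d)

# Toy usage with simulated data where the instrument shifts take-up after t=0.
rng = np.random.default_rng(2)
n = 20000
z = rng.binomial(1, 0.5, n)
post = rng.binomial(1, 0.5, n)
d = rng.binomial(1, 0.2 + 0.4 * z * post)         # take-up shifted by Z in the post period
y = 2.0 * d + z + post + rng.normal(size=n)       # true treatment effect = 2
print(wald_did(y, d, z, post))                    # should be near 2
```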
arXiv Detail & Related papers (2023-10-14T09:38:32Z) - Post Reinforcement Learning Inference [22.117487428829488]
We consider estimation and inference using data collected from reinforcement learning algorithms.
We propose a weighted Z-estimation approach with carefully designed adaptive weights to stabilize the time-varying variance.
Primary applications include dynamic treatment effect estimation and dynamic off-policy evaluation.
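A minimal illustration of the stabilization idea under heavy simplification (a scalar parameter and known noise levels, not the paper's weighting scheme): a variance-weighted estimating equation downweights high-variance periods relative to the unweighted Z-estimator.

```python
import numpy as np

rng = np.random.default_rng(3)

# Data collected over time with strongly time-varying noise, e.g. because an
# adaptive algorithm changes its behavior as it learns.
T = 2000
sigma = np.exp(rng.normal(size=T))         # heteroscedastic standard deviations
theta_true = 1.0
x = theta_true + sigma * rng.normal(size=T)

# Unweighted Z-estimator (sample mean) vs. a variance-weighted Z-estimator
# solving sum_t w_t (x_t - theta) = 0 with w_t proportional to 1 / sigma_t^2.
theta_unweighted = x.mean()
w = 1.0 / sigma ** 2                       # in practice sigma_t must be estimated
theta_weighted = (w * x).sum() / w.sum()

print(f"unweighted: {theta_unweighted:+.3f}, weighted: {theta_weighted:+.3f}")
```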
arXiv Detail & Related papers (2023-02-17T12:53:15Z) - Revisiting Estimation Bias in Policy Gradients for Deep Reinforcement
Learning [0.0]
We revisit the estimation bias in policy gradients for the discounted episodic Markov decision process (MDP) from a Deep Reinforcement Learning perspective.
One of the major sources of this bias is the state distribution shift.
We show that, despite this state distribution shift, the policy gradient estimation bias can be reduced in three ways.
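One well-known concrete instance of state-distribution bias (not necessarily the decomposition used in this paper) is dropping the gamma^t weighting on states in REINFORCE-style estimators, sketched below:

```python
import numpy as np

rng = np.random.default_rng(4)
gamma, T = 0.99, 50

def reinforce_terms(rewards, scores, discount_states):
    """REINFORCE-style gradient estimate for one trajectory.

    scores[t] stands in for grad_theta log pi(a_t | s_t) (scalars for a
    one-parameter toy policy). With discount_states=True the t-th term carries
    the extra gamma**t factor required by the discounted-MDP policy gradient;
    many deep RL implementations drop it, which shifts the effective state
    distribution toward undiscounted visitation.
    """
    returns = np.array([sum(gamma ** (k - t) * rewards[k] for k in range(t, T))
                        for t in range(T)])
    state_w = gamma ** np.arange(T) if discount_states else np.ones(T)
    return (state_w * scores * returns).sum()

rewards = rng.normal(size=T)
scores = rng.normal(size=T)
print("with gamma^t state weighting:   ", reinforce_terms(rewards, scores, True))
print("without (common implementation):", reinforce_terms(rewards, scores, False))
```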
arXiv Detail & Related papers (2023-01-20T06:46:43Z) - An Investigation of the Bias-Variance Tradeoff in Meta-Gradients [53.28925387487846]
Hessian estimation always adds bias and can also add variance to meta-gradient estimation.
We study the bias and variance tradeoff arising from truncated backpropagation and sampling correction.
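To see where truncation enters, here is a toy example under my own setup (a one-dimensional inner problem with exact inner gradients, unrelated to the paper's experiments): the meta-gradient of a validation-style loss with respect to a regularization strength, backpropagated through all inner SGD steps versus only the last few.

```python
import torch

def inner_unroll(lam, K=20, lr=0.1, backprop_steps=None):
    """Run K SGD steps on the inner loss (w - 1)^2 + lam * w^2, keeping the
    autograd graph only for the last `backprop_steps` steps (None = all)."""
    w = torch.zeros(())
    for k in range(K):
        if backprop_steps is not None and k < K - backprop_steps:
            w = w.detach()                       # truncated backpropagation
        grad = 2 * (w - 1.0) + 2 * lam * w       # exact inner gradient at w
        w = w - lr * grad
    return w

lam = torch.tensor(0.5, requires_grad=True)
for steps in (None, 5, 1):                       # full vs. truncated horizons
    w_final = inner_unroll(lam, backprop_steps=steps)
    meta_loss = (w_final - 0.8) ** 2             # validation-style outer loss
    (g,) = torch.autograd.grad(meta_loss, lam)
    print(f"backprop through {steps or 'all'} steps: meta-gradient {g.item():+.5f}")
```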
arXiv Detail & Related papers (2022-09-22T20:33:05Z) - Variance-Aware Off-Policy Evaluation with Linear Function Approximation [85.75516599931632]
We study the off-policy evaluation problem in reinforcement learning with linear function approximation.
We propose an algorithm, VA-OPE, which uses the estimated variance of the value function to reweight the Bellman residual in Fitted Q-Iteration.
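A rough numpy sketch of the reweighting idea under linear function approximation (a simplification, not the VA-OPE algorithm; in particular, the per-sample variance estimates are simply assumed to be given): one Fitted Q-Iteration step becomes weighted least squares on the Bellman target, with weights inversely proportional to the estimated variance.

```python
import numpy as np

def weighted_fqi_step(phi, r, phi_next, w_hat, var_hat, gamma=0.99, ridge=1e-3):
    """One variance-weighted Fitted Q-Iteration step with linear Q(s, a) = phi @ w.

    phi, phi_next: feature matrices for current and (target-policy) next
    state-action pairs; r: rewards; var_hat: estimated per-sample variance of
    the Bellman target (how to estimate it is the crux of the actual method
    and is assumed given here).
    """
    target = r + gamma * phi_next @ w_hat          # Bellman target under current w
    weights = 1.0 / np.maximum(var_hat, 1e-6)      # downweight high-variance samples
    A = phi.T @ (weights[:, None] * phi) + ridge * np.eye(phi.shape[1])
    b = phi.T @ (weights * target)
    return np.linalg.solve(A, b)

# Toy usage with random features and assumed variance estimates.
rng = np.random.default_rng(5)
n, d = 1000, 8
phi, phi_next = rng.normal(size=(n, d)), rng.normal(size=(n, d))
r = rng.normal(size=n)
var_hat = rng.uniform(0.1, 5.0, size=n)
w = np.zeros(d)
for _ in range(20):
    w = weighted_fqi_step(phi, r, phi_next, w, var_hat)
print(w)
```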
arXiv Detail & Related papers (2021-06-22T17:58:46Z) - On Signal-to-Noise Ratio Issues in Variational Inference for Deep
Gaussian Processes [55.62520135103578]
We show that the gradient estimates used in training Deep Gaussian Processes (DGPs) with importance-weighted variational inference are susceptible to signal-to-noise ratio (SNR) issues.
We show that our fix can lead to consistent improvements in the predictive performance of DGP models.
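As a generic diagnostic of what an SNR issue in a gradient estimator means (using a stand-in estimator, not the importance-weighted DGP objective from the paper): the empirical SNR, |mean| / std over repeated draws, collapses as the estimator's noise grows.

```python
import numpy as np

rng = np.random.default_rng(6)

def gradient_snr(estimator, n_draws=2000):
    """Empirical SNR of a stochastic gradient estimator: |mean| / std,
    computed per parameter over repeated independent draws."""
    g = np.stack([np.atleast_1d(estimator()) for _ in range(n_draws)])
    return np.abs(g.mean(axis=0)) / (g.std(axis=0, ddof=1) + 1e-12)

# Stand-in estimator: an unbiased but noisy estimate of d/dmu E[x^2] = 2*mu
# at mu = 0.05. As the noise scale grows, the SNR collapses toward zero and
# the gradient signal is swamped, the failure mode described in the paper.
mu = 0.05
for noise in (0.5, 5.0, 50.0):
    est = lambda s=noise: 2 * mu + rng.normal(scale=s)
    print(f"noise {noise:5.1f}: SNR ~ {gradient_snr(est)[0]:.4f}")
```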
arXiv Detail & Related papers (2020-11-01T14:38:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.