Stateful Offline Contextual Policy Evaluation and Learning
- URL: http://arxiv.org/abs/2110.10081v1
- Date: Tue, 19 Oct 2021 16:15:56 GMT
- Title: Stateful Offline Contextual Policy Evaluation and Learning
- Authors: Nathan Kallus, Angela Zhou
- Abstract summary: We study off-policy evaluation and learning from sequential data.
We formalize the relevant causal structure of problems such as dynamic personalized pricing.
We show improved out-of-sample policy performance in this class of relevant problems.
- Score: 88.9134799076718
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study off-policy evaluation and learning from sequential data in a
structured class of Markov decision processes that arise from repeated
interactions with an exogenous sequence of arrivals with contexts, which
generate unknown individual-level responses to agent actions. This model can be
thought of as an offline generalization of contextual bandits with resource
constraints. We formalize the relevant causal structure of problems such as
dynamic personalized pricing and other operations management problems in the
presence of potentially high-dimensional user types. The key insight is that an
individual-level response is often not causally affected by the state variable
and can therefore easily be generalized across timesteps and states. When this
is true, we study implications for (doubly robust) off-policy evaluation and
learning by instead leveraging single time-step evaluation, estimating the
expectation over a single arrival via data from a population, for fitted-value
iteration in a marginal MDP. We study sample complexity and analyze error
amplification that leads to the persistence, rather than attenuation, of
confounding error over time. In simulations of dynamic and capacitated pricing,
we show improved out-of-sample policy performance in this class of relevant
problems.
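The idea in the abstract (estimate a single-arrival expectation from population data, then plan in a marginal MDP) can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes a finite price grid, a pricing policy that depends only on remaining inventory and time (not on the context), binary purchase responses that do not depend on the state, and known logging propensities. The function names `dr_purchase_prob` and `backward_induction` and all argument names are hypothetical.

```python
def dr_purchase_prob(actions, purchases, mu_hat, propensity):
    """Doubly robust estimate of the population purchase probability per price.

    actions[i]       -- index of the price logged for arrival i
    purchases[i]     -- 1 if arrival i purchased, else 0
    mu_hat[i][a]     -- outcome-model prediction of P(purchase | context_i, price a)
    propensity[i][a] -- probability the logging policy offered price a to arrival i
    """
    n = len(actions)
    k = len(mu_hat[0])
    estimates = []
    for a in range(k):
        total = 0.0
        for i in range(n):
            term = mu_hat[i][a]
            if actions[i] == a:
                # Inverse-propensity-weighted residual correction.
                term += (purchases[i] - mu_hat[i][a]) / propensity[i][a]
            total += term
        # The raw estimate can leave [0, 1], so clip it to a valid probability.
        estimates.append(min(1.0, max(0.0, total / n)))
    return estimates


def backward_induction(prices, p, capacity, horizon):
    """Finite-horizon value iteration in the marginal MDP over inventory.

    State s is remaining inventory. A sale at price index a occurs with
    probability p[a], yields revenue prices[a], and reduces inventory by one.
    """
    value = [[0.0] * (capacity + 1) for _ in range(horizon + 1)]
    policy = [[0] * (capacity + 1) for _ in range(horizon)]
    for t in range(horizon - 1, -1, -1):
        for s in range(1, capacity + 1):  # s = 0 means sold out, value stays 0
            q = [
                p[a] * (prices[a] + value[t + 1][s - 1])
                + (1.0 - p[a]) * value[t + 1][s]
                for a in range(len(prices))
            ]
            best = max(range(len(q)), key=q.__getitem__)
            policy[t][s] = best
            value[t][s] = q[best]
    return value, policy
```

In this sketch, the per-price probabilities returned by `dr_purchase_prob` are passed as `p` to `backward_induction`, so any bias in the single time-step estimate enters every backup step, which is one way to picture the error persistence the abstract mentions.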
Related papers
- On the Identification of Temporally Causal Representation with Instantaneous Dependence [50.14432597910128]
Temporally causal representation learning aims to identify the latent causal process from time series observations.
Most methods require the assumption that the latent causal processes do not have instantaneous relations.
We propose an IDentification framework for instantaneOus Latent dynamics.
arXiv Detail & Related papers (2024-05-24T08:08:05Z)
- Pessimistic Causal Reinforcement Learning with Mediators for Confounded Offline Data [17.991833729722288]
We propose a novel policy learning algorithm, PESsimistic CAusal Learning (PESCAL).
Our key observation is that, by incorporating auxiliary variables that mediate the effect of actions on system dynamics, it is sufficient to learn a lower bound of the mediator distribution function, instead of the Q-function.
We provide theoretical guarantees for the algorithms we propose, and demonstrate their efficacy through simulations, as well as real-world experiments utilizing offline datasets from a leading ride-hailing platform.
arXiv Detail & Related papers (2024-03-18T14:51:19Z)
- Time-series Generation by Contrastive Imitation [87.51882102248395]
We study a generative framework that seeks to combine the strengths of both: Motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) transition policy.
At inference, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality.
arXiv Detail & Related papers (2023-11-02T16:45:25Z)
- Multi-level Adaptive Contrastive Learning for Knowledge Internalization in Dialogue Generation [37.55417272177113]
Knowledge-grounded dialogue generation aims to incorporate external knowledge to supplement the context.
However, the model often fails to internalize this information into responses in a human-like manner.
We propose a Multi-level Adaptive Contrastive Learning framework that dynamically samples negative examples and subsequently penalizes degeneration behaviors.
arXiv Detail & Related papers (2023-10-13T08:16:27Z)
- Conditional Kernel Imitation Learning for Continuous State Environments [9.750698192309978]
We introduce a novel conditional kernel density estimation-based imitation learning framework.
We show consistently superior empirical performance over many state-of-the-art IL algorithms.
arXiv Detail & Related papers (2023-08-24T05:26:42Z)
- Bring Your Own Data! Self-Supervised Evaluation for Large Language Models [52.15056231665816]
We propose a framework for self-supervised evaluation of Large Language Models (LLMs).
We demonstrate self-supervised evaluation strategies for measuring closed-book knowledge, toxicity, and long-range context dependence.
We find strong correlations between self-supervised and human-supervised evaluations.
arXiv Detail & Related papers (2023-06-23T17:59:09Z)
- Robust Fitted-Q-Evaluation and Iteration under Sequentially Exogenous Unobserved Confounders [16.193776814471768]
We study robust policy evaluation and policy optimization in the presence of sequentially-exogenous unobserved confounders.
We provide sample complexity bounds, insights, and show effectiveness both in simulations and on real-world longitudinal healthcare data of treating sepsis.
arXiv Detail & Related papers (2023-02-01T18:40:53Z)
- Model-Free and Model-Based Policy Evaluation when Causality is Uncertain [7.858296711223292]
In off-policy evaluation, there may exist unobserved variables that both impact the dynamics and are used by the unknown behavior policy.
We develop worst-case bounds to assess sensitivity to these unobserved confounders in finite horizons.
We show that a model-based approach with robust MDPs gives sharper lower bounds by exploiting domain knowledge about the dynamics.
arXiv Detail & Related papers (2022-04-02T23:40:15Z)
- Learning from Heterogeneous Data Based on Social Interactions over Graphs [58.34060409467834]
This work proposes a decentralized architecture, where individual agents aim at solving a classification problem while observing streaming features of different dimensions.
We show that the strategy enables the agents to learn consistently under this highly-heterogeneous setting.
arXiv Detail & Related papers (2021-12-17T12:47:18Z)
- GenDICE: Generalized Offline Estimation of Stationary Values [108.17309783125398]
We show that effective estimation can still be achieved in important applications.
Our approach is based on estimating a ratio that corrects for the discrepancy between the stationary and empirical distributions.
The resulting algorithm, GenDICE, is straightforward and effective.
arXiv Detail & Related papers (2020-02-21T00:27:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.