Stateful Offline Contextual Policy Evaluation and Learning
 - URL: http://arxiv.org/abs/2110.10081v1
 - Date: Tue, 19 Oct 2021 16:15:56 GMT
 - Title: Stateful Offline Contextual Policy Evaluation and Learning
 - Authors: Nathan Kallus, Angela Zhou
 - Abstract summary: We study off-policy evaluation and learning from sequential data.
We formalize the relevant causal structure of problems such as dynamic personalized pricing.
We show improved out-of-sample policy performance in this class of relevant problems.
 - Score: 88.9134799076718
 - License: http://creativecommons.org/licenses/by/4.0/
 - Abstract:   We study off-policy evaluation and learning from sequential data in a
structured class of Markov decision processes that arise from repeated
interactions with an exogenous sequence of arrivals with contexts, which
generate unknown individual-level responses to agent actions. This model can be
thought of as an offline generalization of contextual bandits with resource
constraints. We formalize the relevant causal structure of problems such as
dynamic personalized pricing and other operations management problems in the
presence of potentially high-dimensional user types. The key insight is that an
individual-level response is often not causally affected by the state variable
and can therefore easily be generalized across timesteps and states. When this
is true, we study implications for (doubly robust) off-policy evaluation and
learning by instead leveraging single time-step evaluation, estimating the
expectation over a single arrival via data from a population, for fitted-value
iteration in a marginal MDP. We study sample complexity and analyze error
amplification that leads to the persistence, rather than attenuation, of
confounding error over time. In simulations of dynamic and capacitated pricing,
we show improved out-of-sample policy performance in this class of relevant
problems.
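
The abstract's key structural assumption, that an arrival's response to an action is not causally affected by the state, means one pooled sample of logged arrivals can estimate the one-step expectation at every timestep and state. The Python sketch below illustrates that idea for capacitated pricing under assumptions that are ours, not the paper's: unit demand with binary purchase outcomes, i.i.d. arrivals from a fixed population, remaining inventory as the only state, a deterministic target policy, known logging propensities, a user-supplied purchase model `mu_hat`, and no discounting. The name `dr_marginal_fvi` and all of its parameters are hypothetical. Each backward-induction step uses a standard doubly robust (augmented inverse-propensity) estimate: a model-based term plus a propensity-weighted residual on arrivals whose logged price matches the target price.

```python
import numpy as np


def dr_marginal_fvi(x, a, y, prop, capacity, horizon, policy, mu_hat):
    """Hypothetical sketch: doubly robust backward induction on a marginal
    (inventory-only) MDP for capacitated pricing.

    The logged arrivals (x, a, y, prop) are pooled across ALL timesteps and
    inventory levels. This is valid only under the assumption that a purchase
    response depends on (context, price) and not on remaining inventory.

    x      : (n, d) logged arrival contexts
    a      : (n,)   logged prices
    y      : (n,)   purchase indicators (1 = bought one unit)
    prop   : (n,)   logging propensities P(a_i | x_i), assumed known
    policy : (X, s, t) -> (n,) prices the target policy would post
    mu_hat : (X, prices) -> (n,) estimated purchase probabilities
    Returns V with V[t, s] = estimated revenue-to-go at step t with s units left.
    """
    a, y, prop = (np.asarray(v, dtype=float) for v in (a, y, prop))
    # V[horizon, :] stays 0 (no steps left); V[:, 0] stays 0 (no inventory).
    V = np.zeros((horizon + 1, capacity + 1))
    for t in range(horizon - 1, -1, -1):
        for s in range(1, capacity + 1):
            target = np.asarray(policy(x, s, t), dtype=float)
            # Value of selling one unit at the target price versus keeping it.
            gain = target + V[t + 1, s - 1] - V[t + 1, s]
            # Direct-method term: model-based expected value under the target price.
            direct = V[t + 1, s] + mu_hat(x, target) * gain
            # Inverse-propensity correction, nonzero only where the logged
            # price equals the target price (so gain is evaluated at a_i).
            match = (a == target).astype(float)
            correction = match / prop * (y - mu_hat(x, a)) * gain
            V[t, s] = np.mean(direct + correction)
    return V


if __name__ == "__main__":
    # Toy check: one unit, two arrivals, constant price 1.0, purchase prob 0.5.
    x = np.zeros((2, 1))
    V = dr_marginal_fvi(
        x, a=[1.0, 1.0], y=[1, 0], prop=[0.5, 0.5], capacity=1, horizon=2,
        policy=lambda X, s, t: np.full(len(X), 1.0),
        mu_hat=lambda X, p: np.full(len(X), 0.5),
    )
    print(V[0, 1])  # 0.75
```

In the toy example, one unit and two arrivals with a 0.5 purchase probability give an estimated value of 0.75, which equals the probability that at least one of the two arrivals buys (1 - 0.5²). Because the same `x, a, y, prop` arrays are reused at every `(t, s)`, each Bellman step draws on the whole logged population rather than only on visits to that state; this pooling is justified only under the state-independence assumption above.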
 
       
      
Related papers
- On the Identification of Temporally Causal Representation with Instantaneous Dependence [50.14432597910128]
Temporally causal representation learning aims to identify the latent causal process from time series observations.
Most methods require the assumption that the latent causal processes do not have instantaneous relations.
We propose an IDentification framework for instantaneOus Latent dynamics (IDOL).
arXiv (2024-05-24T08:08:05Z)
- Pessimistic Causal Reinforcement Learning with Mediators for Confounded Offline Data [17.991833729722288]
We propose a novel policy learning algorithm, PESsimistic CAusal Learning (PESCAL).
Our key observation is that, by incorporating auxiliary variables that mediate the effect of actions on system dynamics, it is sufficient to learn a lower bound of the mediator distribution function, instead of the Q-function.
We provide theoretical guarantees for the algorithms we propose, and demonstrate their efficacy through simulations, as well as real-world experiments utilizing offline datasets from a leading ride-hailing platform.
arXiv (2024-03-18T14:51:19Z)
- Time-series Generation by Contrastive Imitation [87.51882102248395]
We study a generative framework that seeks to combine the strengths of both autoregressive models trained by maximum likelihood and adversarial (GAN-based) models: motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) transition policy.
At inference, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality.
arXiv (2023-11-02T16:45:25Z)
- Multi-level Adaptive Contrastive Learning for Knowledge Internalization in Dialogue Generation [37.55417272177113]
Knowledge-grounded dialogue generation aims to incorporate external knowledge to supplement the context.
However, the model often fails to internalize this information into responses in a human-like manner.
We propose a Multi-level Adaptive Contrastive Learning framework that dynamically samples negative examples and subsequently penalizes degeneration behaviors.
arXiv (2023-10-13T08:16:27Z)
- Conditional Kernel Imitation Learning for Continuous State Environments [9.750698192309978]
We introduce a novel conditional kernel density estimation-based imitation learning framework.
We show consistently superior empirical performance over many state-of-the-art IL algorithms.
arXiv (2023-08-24T05:26:42Z)
- Bring Your Own Data! Self-Supervised Evaluation for Large Language Models [52.15056231665816]
We propose a framework for self-supervised evaluation of Large Language Models (LLMs).
We demonstrate self-supervised evaluation strategies for measuring closed-book knowledge, toxicity, and long-range context dependence.
We find strong correlations between self-supervised and human-supervised evaluations.
arXiv (2023-06-23T17:59:09Z)
- Robust Fitted-Q-Evaluation and Iteration under Sequentially Exogenous Unobserved Confounders [16.193776814471768]
We study robust policy evaluation and policy optimization in the presence of sequentially-exogenous unobserved confounders.
We provide sample complexity bounds, insights, and show effectiveness both in simulations and on real-world longitudinal healthcare data of treating sepsis.
arXiv (2023-02-01T18:40:53Z)
- Model-Free and Model-Based Policy Evaluation when Causality is Uncertain [7.858296711223292]
In off-policy evaluation, there may exist unobserved variables that both impact the dynamics and are used by the unknown behavior policy.
We develop worst-case bounds to assess sensitivity to these unobserved confounders in finite horizons.
We show that a model-based approach with robust MDPs gives sharper lower bounds by exploiting domain knowledge about the dynamics.
arXiv (2022-04-02T23:40:15Z)
- Learning from Heterogeneous Data Based on Social Interactions over Graphs [58.34060409467834]
This work proposes a decentralized architecture, where individual agents aim at solving a classification problem while observing streaming features of different dimensions.
We show that the proposed strategy enables the agents to learn consistently under this highly heterogeneous setting.
arXiv (2021-12-17T12:47:18Z)
- GenDICE: Generalized Offline Estimation of Stationary Values [108.17309783125398]
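We show that, even when only a fixed set of previously collected data is available, effective estimation of stationary-distribution quantities can still be achieved in important applications.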
Our approach is based on estimating a ratio that corrects for the discrepancy between the stationary and empirical distributions.
The resulting algorithm, GenDICE, is straightforward and effective.
arXiv (2020-02-21T00:27:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.