Related papers: Stateful Offline Contextual Policy Evaluation and Learning

Stateful Offline Contextual Policy Evaluation and Learning

URL: http://arxiv.org/abs/2110.10081v1
Date: Tue, 19 Oct 2021 16:15:56 GMT
Title: Stateful Offline Contextual Policy Evaluation and Learning
Authors: Nathan Kallus, Angela Zhou
Abstract summary: We study off-policy evaluation and learning from sequential data. We formalize the relevant causal structure of problems such as dynamic personalized pricing. We show improved out-of-sample policy performance in this class of relevant problems.
Score: 88.9134799076718
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study off-policy evaluation and learning from sequential data in a structured class of Markov decision processes that arise from repeated interactions with an exogenous sequence of arrivals with contexts, which generate unknown individual-level responses to agent actions. This model can be thought of as an offline generalization of contextual bandits with resource constraints. We formalize the relevant causal structure of problems such as dynamic personalized pricing and other operations management problems in the presence of potentially high-dimensional user types. The key insight is that an individual-level response is often not causally affected by the state variable and can therefore easily be generalized across timesteps and states. When this is true, we study implications for (doubly robust) off-policy evaluation and learning by instead leveraging single time-step evaluation, estimating the expectation over a single arrival via data from a population, for fitted-value iteration in a marginal MDP. We study sample complexity and analyze error amplification that leads to the persistence, rather than attenuation, of confounding error over time. In simulations of dynamic and capacitated pricing, we show improved out-of-sample policy performance in this class of relevant problems.

Related papers

On the Identification of Temporally Causal Representation with Instantaneous Dependence [50.14432597910128]
Temporally causal representation learning aims to identify the latent causal process from time series observations. Most methods require the assumption that the latent causal processes do not have instantaneous relations. We propose an textbfIDentification framework for instantanetextbfOus textbfLatent dynamics.
arXiv Detail & Related papers (2024-05-24T08:08:05Z)
Pessimistic Causal Reinforcement Learning with Mediators for Confounded Offline Data [17.991833729722288]
We propose a novel policy learning algorithm, PESsimistic CAusal Learning (PESCAL) Our key observation is that, by incorporating auxiliary variables that mediate the effect of actions on system dynamics, it is sufficient to learn a lower bound of the mediator distribution function, instead of the Q-function. We provide theoretical guarantees for the algorithms we propose, and demonstrate their efficacy through simulations, as well as real-world experiments utilizing offline datasets from a leading ride-hailing platform.
arXiv Detail & Related papers (2024-03-18T14:51:19Z)
Time-series Generation by Contrastive Imitation [87.51882102248395]
We study a generative framework that seeks to combine the strengths of both: Motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) transition policy. At inference, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality.
arXiv Detail & Related papers (2023-11-02T16:45:25Z)
Multi-level Adaptive Contrastive Learning for Knowledge Internalization in Dialogue Generation [37.55417272177113]
Knowledge-grounded dialogue generation aims to incorporate external knowledge to supplement the context. However, the model often fails to internalize this information into responses in a human-like manner. We propose a Multi-level Adaptive Contrastive Learning framework that dynamically samples negative examples and subsequently penalizes degeneration behaviors.
arXiv Detail & Related papers (2023-10-13T08:16:27Z)
Conditional Kernel Imitation Learning for Continuous State Environments [9.750698192309978]
We introduce a novel conditional kernel density estimation-based imitation learning framework. We show consistently superior empirical performance over many state-of-the-art IL algorithms.
arXiv Detail & Related papers (2023-08-24T05:26:42Z)
Bring Your Own Data! Self-Supervised Evaluation for Large Language Models [52.15056231665816]
We propose a framework for self-supervised evaluation of Large Language Models (LLMs) We demonstrate self-supervised evaluation strategies for measuring closed-book knowledge, toxicity, and long-range context dependence. We find strong correlations between self-supervised and human-supervised evaluations.
arXiv Detail & Related papers (2023-06-23T17:59:09Z)
Robust Fitted-Q-Evaluation and Iteration under Sequentially Exogenous Unobserved Confounders [16.193776814471768]
We study robust policy evaluation and policy optimization in the presence of sequentially-exogenous unobserved confounders. We provide sample complexity bounds, insights, and show effectiveness both in simulations and on real-world longitudinal healthcare data of treating sepsis.
arXiv Detail & Related papers (2023-02-01T18:40:53Z)
Model-Free and Model-Based Policy Evaluation when Causality is Uncertain [7.858296711223292]
In off-policy evaluation, there may exist unobserved variables that both impact the dynamics and are used by the unknown behavior policy. We develop worst-case bounds to assess sensitivity to these unobserved confounders in finite horizons. We show that a model-based approach with robust MDPs gives sharper lower bounds by exploiting domain knowledge about the dynamics.
arXiv Detail & Related papers (2022-04-02T23:40:15Z)
Learning from Heterogeneous Data Based on Social Interactions over Graphs [58.34060409467834]
This work proposes a decentralized architecture, where individual agents aim at solving a classification problem while observing streaming features of different dimensions. We show that the. strategy enables the agents to learn consistently under this highly-heterogeneous setting. We show that the. strategy enables the agents to learn consistently under this highly-heterogeneous setting.
arXiv Detail & Related papers (2021-12-17T12:47:18Z)
GenDICE: Generalized Offline Estimation of Stationary Values [108.17309783125398]
We show that effective estimation can still be achieved in important applications. Our approach is based on estimating a ratio that corrects for the discrepancy between the stationary and empirical distributions. The resulting algorithm, GenDICE, is straightforward and effective.
arXiv Detail & Related papers (2020-02-21T00:27:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.