Bayesian Inference of Contextual Bandit Policies via Empirical Likelihood
- URL: http://arxiv.org/abs/2602.10608v1
- Date: Wed, 11 Feb 2026 07:57:40 GMT
- Title: Bayesian Inference of Contextual Bandit Policies via Empirical Likelihood
- Authors: Jiangrong Ouyang, Mingming Gong, Howard Bondell
- Abstract summary: Policy inference plays an essential role in the contextual bandit problem. We use empirical likelihood to develop a Bayesian inference method for the joint analysis of multiple contextual bandit policies.
- Score: 45.88028371034407
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Policy inference plays an essential role in the contextual bandit problem. In this paper, we use empirical likelihood to develop a Bayesian inference method for the joint analysis of multiple contextual bandit policies in finite sample regimes. The proposed inference method is robust to small sample sizes and is able to provide accurate uncertainty measurements for policy value evaluation. In addition, it allows for flexible inferences on policy comparison with full uncertainty quantification. We demonstrate the effectiveness of the proposed inference method using Monte Carlo simulations and its application to an adolescent body mass index data set.
Related papers
- PAC Off-Policy Prediction of Contextual Bandits [0.0]
This paper investigates off-policy evaluation in contextual bandits. It aims to quantify the performance of a target policy using data collected under a different and potentially unknown behavior policy.
arXiv Detail & Related papers (2025-07-22T05:12:29Z)
- Statistical Analysis of Policy Space Compression Problem [54.1754937830779]
Policy search methods are crucial in reinforcement learning, offering a framework to address continuous state-action and partially observable problems.
Reducing the policy space through policy compression emerges as a powerful, reward-free approach to accelerate the learning process.
This technique condenses the policy space into a smaller, representative set while maintaining most of the original effectiveness.
arXiv Detail & Related papers (2024-11-15T02:46:55Z)
- Predictive Performance Comparison of Decision Policies Under Confounding [32.21041697921289]
We propose a method to compare the predictive performance of decision policies under a variety of modern identification approaches.
Key to our method is the insight that there are regions of uncertainty that we can safely ignore in the policy comparison.
arXiv Detail & Related papers (2024-04-01T01:27:07Z)
- Information Capacity Regret Bounds for Bandits with Mediator Feedback [55.269551124587224]
We introduce the policy set capacity as an information-theoretic measure for the complexity of the policy set.
Adopting the classical EXP4 algorithm, we provide new regret bounds depending on the policy set capacity.
For a selection of policy set families, we prove nearly-matching lower bounds, scaling similarly with the capacity.
arXiv Detail & Related papers (2024-02-15T19:18:47Z)
- A Convex Framework for Confounding Robust Inference [21.918894096307294]
We study policy evaluation of offline contextual bandits subject to unobserved confounders.
We propose a general estimator that provides a sharp lower bound of the policy value using convex programming.
arXiv Detail & Related papers (2023-09-21T19:45:37Z)
- Auditing Fairness by Betting [43.515287900510934]
We provide practical, efficient, and nonparametric methods for auditing the fairness of deployed classification and regression models. Our methods are sequential and allow for the continuous monitoring of incoming data. We demonstrate the efficacy of our approach on three benchmark fairness datasets.
arXiv Detail & Related papers (2023-05-27T20:14:11Z)
- Bounded Robustness in Reinforcement Learning via Lexicographic Objectives [54.00072722686121]
Policy robustness in Reinforcement Learning may not be desirable at any cost.
We study how policies can be maximally robust to arbitrary observational noise.
We propose a robustness-inducing scheme, applicable to any policy algorithm, that trades off expected policy utility for robustness.
arXiv Detail & Related papers (2022-09-30T08:53:18Z)
- Conformal Off-Policy Prediction in Contextual Bandits [54.67508891852636]
Conformal off-policy prediction can output reliable predictive intervals for the outcome under a new target policy.
We provide theoretical finite-sample guarantees without making any additional assumptions beyond the standard contextual bandit setup.
arXiv Detail & Related papers (2022-06-09T10:39:33Z)
- Distributionally Robust Batch Contextual Bandits [20.667213458836734]
Policy learning using historical observational data is an important problem that has found widespread applications.
Existing literature rests on the crucial assumption that the future environment where the learned policy will be deployed is the same as the past environment.
In this paper, we lift this assumption and aim to learn a distributionally robust policy with incomplete observational data.
arXiv Detail & Related papers (2020-06-10T03:11:40Z)
- Variational Policy Propagation for Multi-agent Reinforcement Learning [68.26579560607597]
We propose a collaborative multi-agent reinforcement learning algorithm named variational policy propagation (VPP) to learn a joint policy through the interactions over agents.
We prove that the joint policy is a Markov Random Field under some mild conditions, which in turn reduces the policy space effectively.
We integrate variational inference as special differentiable layers in the policy, such that actions can be efficiently sampled from the Markov Random Field and the overall policy is differentiable.
arXiv Detail & Related papers (2020-04-19T15:42:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.