Robustly Improving Bandit Algorithms with Confounded and Selection
Biased Offline Data: A Causal Approach
- URL: http://arxiv.org/abs/2312.12731v1
- Date: Wed, 20 Dec 2023 03:03:06 GMT
- Title: Robustly Improving Bandit Algorithms with Confounded and Selection
Biased Offline Data: A Causal Approach
- Authors: Wen Huang and Xintao Wu
- Abstract summary: This paper studies bandit problems where an agent has access to offline data that might be utilized to potentially improve the estimation of each arm's reward distribution.
We categorize the biases into confounding bias and selection bias based on the causal structure they imply.
We extract the causal bound for each arm that is robust towards compound biases from biased observational data.
- Score: 18.13887411913371
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies bandit problems where an agent has access to offline data
that might be utilized to potentially improve the estimation of each arm's
reward distribution. A major obstacle in this setting is the existence of
compound biases from the observational data. Ignoring these biases and blindly
fitting a model with the biased data could even negatively affect the online
learning phase. In this work, we formulate this problem from a causal
perspective. First, we categorize the biases into confounding bias and
selection bias based on the causal structure they imply. Next, we extract the
causal bound for each arm that is robust towards compound biases from biased
observational data. The derived bounds contain the ground truth mean reward and
can effectively guide the bandit agent to learn a nearly-optimal decision
policy. We also conduct regret analysis in both contextual and non-contextual
bandit settings and show that prior causal bounds could help consistently
reduce the asymptotic regret.
Related papers
- Looking at Model Debiasing through the Lens of Anomaly Detection [11.113718994341733]
Deep neural networks are sensitive to bias in the data.
We propose a new bias identification method based on anomaly detection.
We reach state-of-the-art performance on synthetic and real benchmark datasets.
arXiv Detail & Related papers (2024-07-24T17:30:21Z) - Revisiting the Dataset Bias Problem from a Statistical Perspective [72.94990819287551]
We study the "dataset bias" problem from a statistical standpoint.
We identify the main cause of the problem as the strong correlation between a class attribute u and a non-class attribute b.
We propose to mitigate dataset bias via either weighting the objective of each sample n by frac1p(u_n|b_n) or sampling that sample with a weight proportional to frac1p(u_n|b_n).
arXiv Detail & Related papers (2024-02-05T22:58:06Z) - Approximating Counterfactual Bounds while Fusing Observational, Biased
and Randomised Data Sources [64.96984404868411]
We address the problem of integrating data from multiple, possibly biased, observational and interventional studies.
We show that the likelihood of the available data has no local maxima.
We then show how the same approach can address the general case of multiple datasets.
arXiv Detail & Related papers (2023-07-31T11:28:24Z) - A Unified Framework of Policy Learning for Contextual Bandit with
Confounding Bias and Missing Observations [108.89353070722497]
We study the offline contextual bandit problem, where we aim to acquire an optimal policy using observational data.
We present a new algorithm called Causal-Adjusted Pessimistic (CAP) policy learning, which forms the reward function as the solution of an integral equation system.
arXiv Detail & Related papers (2023-03-20T15:17:31Z) - Unbiased Supervised Contrastive Learning [10.728852691100338]
In this work, we tackle the problem of learning representations that are robust to biases.
We first present a margin-based theoretical framework that allows us to clarify why recent contrastive losses can fail when dealing with biased data.
We derive a novel formulation of the supervised contrastive loss (epsilon-SupInfoNCE), providing more accurate control of the minimal distance between positive and negative samples.
Thanks to our theoretical framework, we also propose FairKL, a new debiasing regularization loss, that works well even with extremely biased data.
arXiv Detail & Related papers (2022-11-10T13:44:57Z) - Self-supervised debiasing using low rank regularization [59.84695042540525]
Spurious correlations can cause strong biases in deep neural networks, impairing generalization ability.
We propose a self-supervised debiasing framework potentially compatible with unlabeled samples.
Remarkably, the proposed debiasing framework significantly improves the generalization performance of self-supervised learning baselines.
arXiv Detail & Related papers (2022-10-11T08:26:19Z) - D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling
Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z) - Cross Pairwise Ranking for Unbiased Item Recommendation [57.71258289870123]
We develop a new learning paradigm named Cross Pairwise Ranking (CPR)
CPR achieves unbiased recommendation without knowing the exposure mechanism.
We prove in theory that this way offsets the influence of user/item propensity on the learning.
arXiv Detail & Related papers (2022-04-26T09:20:27Z) - Correcting Exposure Bias for Link Recommendation [31.799185352323807]
Exposure bias can arise when users are systematically underexposed to certain relevant items.
We propose estimators that leverage known exposure probabilities to mitigate this bias.
Our methods lead to greater diversity in the recommended papers' fields of study.
arXiv Detail & Related papers (2021-06-13T16:51:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.