Related papers: Robustly Improving Bandit Algorithms with Confounded and Selection Biased Offline Data: A Causal Approach

Robustly Improving Bandit Algorithms with Confounded and Selection Biased Offline Data: A Causal Approach

URL: http://arxiv.org/abs/2312.12731v1
Date: Wed, 20 Dec 2023 03:03:06 GMT
Title: Robustly Improving Bandit Algorithms with Confounded and Selection Biased Offline Data: A Causal Approach
Authors: Wen Huang and Xintao Wu
Abstract summary: This paper studies bandit problems where an agent has access to offline data that might be utilized to potentially improve the estimation of each arm's reward distribution. We categorize the biases into confounding bias and selection bias based on the causal structure they imply. We extract the causal bound for each arm that is robust towards compound biases from biased observational data.
Score: 18.13887411913371
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper studies bandit problems where an agent has access to offline data that might be utilized to potentially improve the estimation of each arm's reward distribution. A major obstacle in this setting is the existence of compound biases from the observational data. Ignoring these biases and blindly fitting a model with the biased data could even negatively affect the online learning phase. In this work, we formulate this problem from a causal perspective. First, we categorize the biases into confounding bias and selection bias based on the causal structure they imply. Next, we extract the causal bound for each arm that is robust towards compound biases from biased observational data. The derived bounds contain the ground truth mean reward and can effectively guide the bandit agent to learn a nearly-optimal decision policy. We also conduct regret analysis in both contextual and non-contextual bandit settings and show that prior causal bounds could help consistently reduce the asymptotic regret.

Related papers

Graph-Based Prediction Models for Data Debiasing [6.221408085892461]
Bias in data collection, arising from both under-reporting and over-reporting, poses significant challenges in healthcare and public safety. We introduce Graph-based Over- and Under-reporting Debiasing (GROUD), a novel graph-based optimization framework that debiases reported data by jointly estimating the true incident counts and the associated reporting bias probabilities. We validate GROUD on both challenging simulated experiments and real-world datasets, including Atlanta emergency calls and COVID-19 vaccine adverse event reports.
arXiv Detail & Related papers (2025-04-12T21:34:49Z)
Looking at Model Debiasing through the Lens of Anomaly Detection [11.113718994341733]
Deep neural networks are sensitive to bias in the data. We propose a new bias identification method based on anomaly detection. We reach state-of-the-art performance on synthetic and real benchmark datasets.
arXiv Detail & Related papers (2024-07-24T17:30:21Z)
Revisiting the Dataset Bias Problem from a Statistical Perspective [72.94990819287551]
We study the "dataset bias" problem from a statistical standpoint. We identify the main cause of the problem as the strong correlation between a class attribute u and a non-class attribute b. We propose to mitigate dataset bias via either weighting the objective of each sample n by frac1p(u_n|b_n) or sampling that sample with a weight proportional to frac1p(u_n|b_n).
arXiv Detail & Related papers (2024-02-05T22:58:06Z)
Approximating Counterfactual Bounds while Fusing Observational, Biased and Randomised Data Sources [64.96984404868411]
We address the problem of integrating data from multiple, possibly biased, observational and interventional studies. We show that the likelihood of the available data has no local maxima. We then show how the same approach can address the general case of multiple datasets.
arXiv Detail & Related papers (2023-07-31T11:28:24Z)
A Unified Framework of Policy Learning for Contextual Bandit with Confounding Bias and Missing Observations [108.89353070722497]
We study the offline contextual bandit problem, where we aim to acquire an optimal policy using observational data. We present a new algorithm called Causal-Adjusted Pessimistic (CAP) policy learning, which forms the reward function as the solution of an integral equation system.
arXiv Detail & Related papers (2023-03-20T15:17:31Z)
Unbiased Supervised Contrastive Learning [10.728852691100338]
In this work, we tackle the problem of learning representations that are robust to biases. We first present a margin-based theoretical framework that allows us to clarify why recent contrastive losses can fail when dealing with biased data. We derive a novel formulation of the supervised contrastive loss (epsilon-SupInfoNCE), providing more accurate control of the minimal distance between positive and negative samples. Thanks to our theoretical framework, we also propose FairKL, a new debiasing regularization loss, that works well even with extremely biased data.
arXiv Detail & Related papers (2022-11-10T13:44:57Z)
Self-supervised debiasing using low rank regularization [59.84695042540525]
Spurious correlations can cause strong biases in deep neural networks, impairing generalization ability. We propose a self-supervised debiasing framework potentially compatible with unlabeled samples. Remarkably, the proposed debiasing framework significantly improves the generalization performance of self-supervised learning baselines.
arXiv Detail & Related papers (2022-10-11T08:26:19Z)
D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies human-in-the-loop AI approach for auditing and mitigating social biases. A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network. For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z)
Cross Pairwise Ranking for Unbiased Item Recommendation [57.71258289870123]
We develop a new learning paradigm named Cross Pairwise Ranking (CPR) CPR achieves unbiased recommendation without knowing the exposure mechanism. We prove in theory that this way offsets the influence of user/item propensity on the learning.
arXiv Detail & Related papers (2022-04-26T09:20:27Z)
Correcting Exposure Bias for Link Recommendation [31.799185352323807]
Exposure bias can arise when users are systematically underexposed to certain relevant items. We propose estimators that leverage known exposure probabilities to mitigate this bias. Our methods lead to greater diversity in the recommended papers' fields of study.
arXiv Detail & Related papers (2021-06-13T16:51:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.