Achieving Counterfactual Fairness for Causal Bandit
- URL: http://arxiv.org/abs/2109.10458v1
- Date: Tue, 21 Sep 2021 23:44:48 GMT
- Title: Achieving Counterfactual Fairness for Causal Bandit
- Authors: Wen Huang, Lu Zhang, Xintao Wu
- Abstract summary: We study how to recommend an item at each step to maximize the expected reward.
We then propose the fair causal bandit algorithm (F-UCB) for achieving counterfactual individual fairness.
- Score: 18.077963117600785
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In online recommendation, customers arrive in a sequential and stochastic
manner from an underlying distribution and the online decision model recommends
a chosen item for each arriving individual based on some strategy. We study how
to recommend an item at each step to maximize the expected reward while
achieving user-side fairness for customers, i.e., customers who share similar
profiles will receive a similar reward regardless of their sensitive attributes
and items being recommended. By incorporating causal inference into bandits and
adopting soft intervention to model the arm selection strategy, we first
propose the d-separation based UCB algorithm (D-UCB) to explore the utilization
of the d-separation set in reducing the amount of exploration needed to achieve
low cumulative regret. Based on that, we then propose the fair causal bandit
(F-UCB) for achieving counterfactual individual fairness. Both theoretical
analysis and empirical evaluation demonstrate the effectiveness of our algorithms.
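As a rough illustration of the bandit loop underlying these algorithms, here is a minimal Python sketch of UCB arm selection with a per-round fairness filter. The d-separation machinery of D-UCB and the paper's exact counterfactual-fairness criterion are not reproduced; the tolerance `tau` and the per-arm counterfactual gap estimates are hypothetical stand-ins.

```python
import numpy as np

# Illustrative sketch only: standard UCB with a per-round "fair set"
# filter, loosely in the spirit of F-UCB. The fairness check
# (counterfactual reward gap below a tolerance tau) is a hypothetical
# stand-in for the paper's counterfactual-fairness criterion.

def ucb_scores(counts, sums, t, c=2.0):
    """Upper confidence bounds for each arm at round t (1-indexed)."""
    means = sums / np.maximum(counts, 1)
    bonus = np.sqrt(c * np.log(t) / np.maximum(counts, 1))
    return means + bonus

def fair_ucb_step(counts, sums, t, cf_gap, tau=0.1):
    """Pick the highest-UCB arm among arms whose estimated
    counterfactual reward gap is within tolerance tau."""
    scores = ucb_scores(counts, sums, t)
    fair = cf_gap <= tau                # hypothetical fairness filter
    if not fair.any():                  # fall back if no arm passes
        fair = np.ones_like(fair, dtype=bool)
    return int(np.argmax(np.where(fair, scores, -np.inf)))

rng = np.random.default_rng(0)
true_means = np.array([0.3, 0.5, 0.7])
cf_gap = np.array([0.05, 0.2, 0.08])    # toy per-arm counterfactual gaps
counts, sums = np.zeros(3), np.zeros(3)
for t in range(1, 501):
    a = fair_ucb_step(counts, sums, t, cf_gap)
    counts[a] += 1
    sums[a] += rng.binomial(1, true_means[a])
print(counts)  # pulls concentrate on the best arm within the fair set
```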
Related papers
- Meta Clustering of Neural Bandits [45.77505279698894]
We study a new problem, Clustering of Neural Bandits, by extending previous work to arbitrary reward functions.
We propose a novel algorithm called M-CNB, which utilizes a meta-learner to represent and rapidly adapt to dynamic clusters.
In extensive experiments conducted in both recommendation and online classification scenarios, M-CNB outperforms SOTA baselines.
arXiv Detail & Related papers (2024-08-10T16:09:51Z)
- Robust Preference Optimization through Reward Model Distillation [68.65844394615702]
Language model (LM) post-training involves maximizing a reward function that is derived from preference annotations.
DPO is a popular offline alignment method that trains a policy directly on preference data without the need to train a reward model or apply reinforcement learning.
We analyze this phenomenon and propose distillation to get a better proxy for the true preference distribution over generation pairs.
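Since the summary above leans on DPO, here is a minimal numpy sketch of the DPO objective on a single preference pair; the log-probabilities would come from real policy and reference language models, and the placeholder numbers are illustrative only.

```python
import numpy as np

# Minimal sketch of the DPO objective on one preference pair.
# logp_* would come from summing token log-probs of real policy and
# reference models; here they are just placeholder numbers.

def dpo_loss(logp_w_pi, logp_l_pi, logp_w_ref, logp_l_ref, beta=0.1):
    """-log sigmoid(beta * (policy margin - reference margin))."""
    margin = (logp_w_pi - logp_l_pi) - (logp_w_ref - logp_l_ref)
    return -np.log(1.0 / (1.0 + np.exp(-beta * margin)))

# Toy numbers: the policy prefers the chosen response more strongly
# than the reference does, so the loss drops below log(2).
print(dpo_loss(logp_w_pi=-12.0, logp_l_pi=-15.0,
               logp_w_ref=-13.0, logp_l_ref=-14.0))
```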
arXiv Detail & Related papers (2024-05-29T17:39:48Z)
- Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences.
To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model.
Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
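A hedged sketch of the combined objective described above, assuming the preference loss is DPO-style and the supervised term is the negative log-likelihood of the chosen response; the weighting coefficient `lam` and this exact composition are assumptions, not the paper's formulation.

```python
import numpy as np

# Hedged sketch of a combined objective: a preference loss plus an
# SFT (maximum-likelihood) term on the chosen response. The weight
# `lam` and the exact composition are illustrative assumptions.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def regularized_loss(logp_w_pi, logp_l_pi, logp_w_ref, logp_l_ref,
                     beta=0.1, lam=0.5):
    pref = -np.log(sigmoid(beta * ((logp_w_pi - logp_l_pi)
                                   - (logp_w_ref - logp_l_ref))))
    sft = -logp_w_pi          # negative log-likelihood of chosen response
    return pref + lam * sft   # SFT term acts as a regularizer

print(regularized_loss(-12.0, -15.0, -13.0, -14.0))
```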
arXiv Detail & Related papers (2024-05-26T05:38:50Z)
- Aligning Large Language Models by On-Policy Self-Judgment [49.31895979525054]
Existing approaches for aligning large language models with human preferences face a trade-off: on-policy learning requires a separate reward model (RM).
We present a novel alignment framework, SELF-JUDGE, that does on-policy learning and is parameter efficient.
We show that rejection sampling by itself can further improve performance without an additional evaluator.
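A toy sketch of best-of-n rejection sampling in which the same model that generates also judges, loosely in the spirit of on-policy self-judgment; `generate` and `self_judge` are hypothetical placeholders for real LM sampling and scoring calls.

```python
import numpy as np

# Toy sketch of best-of-n rejection sampling where the model judges
# its own candidates. `generate` and `self_judge` are hypothetical
# placeholders for real LM sampling and scoring calls.

rng = np.random.default_rng(0)

def generate(prompt, n=8):
    # Stand-in for sampling n candidate responses from the policy.
    return [f"{prompt} -> candidate {i}" for i in range(n)]

def self_judge(prompt, response):
    # Stand-in for the policy scoring its own response
    # (e.g., log-prob of a "this answer is good" judgment token).
    return rng.normal()

def best_of_n(prompt, n=8):
    candidates = generate(prompt, n)
    scores = [self_judge(prompt, c) for c in candidates]
    return candidates[int(np.argmax(scores))]

print(best_of_n("Explain UCB in one sentence."))
```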
arXiv Detail & Related papers (2024-02-17T11:25:26Z)
- Fairness via Adversarial Attribute Neighbourhood Robust Learning [49.93775302674591]
We propose a principled Robust Adversarial Attribute Neighbourhood (RAAN) loss to debias the classification head.
arXiv Detail & Related papers (2022-10-12T23:39:28Z)
- Recommendation Systems with Distribution-Free Reliability Guarantees [83.80644194980042]
We show how to return a set of items rigorously guaranteed to contain mostly good items.
Our procedure endows any ranking model with rigorous finite-sample control of the false discovery rate.
We evaluate our methods on the Yahoo! Learning to Rank and MSMarco datasets.
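As a simplified illustration of finite-sample FDR control, here is a sketch that picks the largest recommendation set whose empirical false discovery rate on held-out labels stays below a target alpha; this is a stand-in under strong assumptions, not the paper's exact distribution-free procedure.

```python
import numpy as np

# Simplified sketch: choose the largest recommendation set whose
# estimated false discovery rate (share of "bad" items) stays below
# alpha, using held-out labels. Illustrative stand-in only.

def fdr_threshold(scores, labels, alpha=0.2):
    """Lowest score cutoff whose selected set has empirical
    FDR <= alpha (labels: 1 = good item, 0 = bad)."""
    order = np.argsort(-scores)          # best-scored items first
    sorted_labels = labels[order]
    fdr = np.cumsum(1 - sorted_labels) / np.arange(1, len(labels) + 1)
    valid = np.where(fdr <= alpha)[0]
    if len(valid) == 0:
        return np.inf                    # recommend nothing
    k = valid.max() + 1                  # largest admissible set size
    return scores[order][k - 1]

rng = np.random.default_rng(1)
scores = rng.random(100)
labels = (rng.random(100) < scores).astype(int)  # better score, more good
cut = fdr_threshold(scores, labels, alpha=0.2)
print("threshold:", cut, "set size:", int((scores >= cut).sum()))
```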
arXiv Detail & Related papers (2022-07-04T17:49:25Z)
- The Unfairness of Active Users and Popularity Bias in Point-of-Interest Recommendation [4.578469978594752]
This paper studies the interplay between (i) the unfairness of active users, (ii) the unfairness of popular items, and (iii) the accuracy of recommendation as three angles of our study triangle.
For item fairness, we divide items into short-head, mid-tail, and long-tail groups and study the exposure of these item groups in the top-k recommendation lists of users.
Our study shows that most recommendation models cannot satisfy both consumer and producer fairness, indicating a trade-off between these objectives, possibly due to natural biases in the data.
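A small sketch of the item-side analysis described above: split items into short-head, mid-tail, and long-tail groups by popularity and measure each group's share of slots in users' top-k lists; the 20%/60%/20% group cut-offs are illustrative assumptions.

```python
import numpy as np

# Sketch of the item-side exposure analysis: popularity-based item
# groups and their share of top-k recommendation slots.
# The 20%/60%/20% cut-offs are illustrative assumptions.

def group_exposure(topk_lists, popularity, head=0.2, mid=0.6):
    n_items = len(popularity)
    rank = np.argsort(-popularity)           # most popular first
    group = np.empty(n_items, dtype=object)
    group[rank[: int(head * n_items)]] = "short-head"
    group[rank[int(head * n_items): int((head + mid) * n_items)]] = "mid-tail"
    group[rank[int((head + mid) * n_items):]] = "long-tail"
    flat = np.concatenate(topk_lists)        # all recommended slots
    return {g: float(np.mean(group[flat] == g))
            for g in ("short-head", "mid-tail", "long-tail")}

rng = np.random.default_rng(2)
popularity = rng.zipf(1.5, size=1000).astype(float)
topk_lists = [rng.choice(1000, size=10, replace=False) for _ in range(50)]
print(group_exposure(topk_lists, popularity))
```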
arXiv Detail & Related papers (2022-02-27T08:02:19Z)
- Bias-Robust Bayesian Optimization via Dueling Bandit [57.82422045437126]
We consider Bayesian optimization in settings where observations can be adversarially biased.
We propose a novel approach for dueling bandits based on information-directed sampling (IDS).
Thereby, we obtain the first efficient kernelized algorithm for dueling bandits that comes with cumulative regret guarantees.
arXiv Detail & Related papers (2021-05-25T10:08:41Z)
- Continuous Mean-Covariance Bandits [39.820490484375156]
We propose a novel Continuous Mean-Covariance Bandit (CMCB) model that takes option correlation into account.
In CMCB, a learner sequentially chooses weight vectors over the given options and observes random feedback according to these decisions.
We propose novel algorithms with optimal regret (within logarithmic factors) and provide matching lower bounds to validate their optimality.
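A toy sketch of the per-round decision in a mean-covariance setting: among candidate weight vectors on the simplex, pick the one with the best estimated mean-minus-risk trade-off; the random candidate search and the risk coefficient `rho` are assumptions, not the paper's optimal-regret algorithm.

```python
import numpy as np

# Toy mean-covariance decision: among candidate weight vectors on the
# simplex, maximize estimated mean minus a covariance risk penalty.
# Random candidate search and `rho` are illustrative assumptions.

def pick_weights(mu_hat, sigma_hat, rho=1.0, n_candidates=5000, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.dirichlet(np.ones(len(mu_hat)), size=n_candidates)
    objective = w @ mu_hat - rho * np.einsum("ij,jk,ik->i", w, sigma_hat, w)
    return w[np.argmax(objective)]

mu_hat = np.array([0.5, 0.6, 0.55])
sigma_hat = np.array([[0.04, 0.01, 0.00],
                      [0.01, 0.09, 0.02],
                      [0.00, 0.02, 0.05]])
print(pick_weights(mu_hat, sigma_hat))  # tilts toward high mean, low risk
```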
arXiv Detail & Related papers (2021-02-24T06:37:05Z)
- Causality-Aware Neighborhood Methods for Recommender Systems [3.0919302844782717]
Business objectives of recommenders, such as increasing sales, are aligned with the causal effect of recommendations.
Previous recommenders employ inverse propensity scoring (IPS) from causal inference.
We develop robust ranking methods for the causal effect of recommendations.
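A minimal sketch of inverse propensity scoring for the average causal effect of recommending an item: reweight observed outcomes by the propensity of the treatment actually received; propensities are assumed known here, whereas real recommenders must estimate them.

```python
import numpy as np

# Minimal IPS sketch for the average causal effect of recommendation.
# Propensities are assumed known here; real systems must estimate them.

def ips_effect(outcome, treated, propensity):
    """E[Y(1)] - E[Y(0)] via IPS (treated: 1 if recommended)."""
    y1 = np.mean(treated * outcome / propensity)
    y0 = np.mean((1 - treated) * outcome / (1 - propensity))
    return y1 - y0

rng = np.random.default_rng(3)
n = 100_000
propensity = rng.uniform(0.1, 0.9, n)
treated = (rng.random(n) < propensity).astype(float)
outcome = rng.binomial(1, 0.2 + 0.1 * treated)  # true uplift = 0.1
print(ips_effect(outcome, treated, propensity))  # close to 0.1
```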
arXiv Detail & Related papers (2020-12-17T08:23:17Z)
- Achieving User-Side Fairness in Contextual Bandits [17.947543703195738]
We study how to achieve user-side fairness in personalized recommendation.
We formulate our fair personalized recommendation as a modified contextual bandit.
We develop a fair contextual bandit algorithm, Fair-LinUCB, that improves upon the traditional LinUCB algorithm.
arXiv Detail & Related papers (2020-10-22T22:58:25Z)
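For context on the Fair-LinUCB entry above, here is a compact sketch of the standard LinUCB update it builds on; the fairness-aware modification itself is not reproduced.

```python
import numpy as np

# Compact sketch of the standard LinUCB update that Fair-LinUCB
# builds on; the fairness-aware modification is not reproduced here.

class LinUCB:
    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = np.stack([np.eye(dim)] * n_arms)  # per-arm design matrix
        self.b = np.zeros((n_arms, dim))           # per-arm reward vector

    def select(self, x):
        """Pick the arm with the highest UCB for context x."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

rng = np.random.default_rng(4)
bandit = LinUCB(n_arms=3, dim=5)
for _ in range(200):
    x = rng.normal(size=5)
    a = bandit.select(x)
    bandit.update(a, x, reward=float(x[a] > 0))  # toy context-linked reward
print(bandit.A.sum(axis=(1, 2)))
```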
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.