On Mitigating Affinity Bias through Bandits with Evolving Biased Feedback
- URL: http://arxiv.org/abs/2503.05662v1
- Date: Fri, 07 Mar 2025 18:23:58 GMT
- Title: On Mitigating Affinity Bias through Bandits with Evolving Biased Feedback
- Authors: Matthew Faw, Constantine Caramanis, Jessica Hoffmann
- Abstract summary: We focus on affinity bias, the component of unconscious bias which leads us to prefer people who are similar to us. In a world where the people hired today become part of the hiring committee of tomorrow, we are particularly interested in understanding (and mitigating) how affinity bias affects this feedback loop. We introduce a new bandit variant that exhibits those two features, which we call affinity bandits.
- Score: 21.345255802383882
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unconscious bias has been shown to influence how we assess our peers, with consequences for hiring, promotions and admissions. In this work, we focus on affinity bias, the component of unconscious bias which leads us to prefer people who are similar to us, despite no deliberate intention of favoritism. In a world where the people hired today become part of the hiring committee of tomorrow, we are particularly interested in understanding (and mitigating) how affinity bias affects this feedback loop. This problem has two distinctive features: 1) we only observe the biased value of a candidate, but we want to optimize with respect to their real value; 2) the bias towards a candidate with a specific set of traits depends on the fraction of people in the hiring committee with the same set of traits. We introduce a new bandit variant that exhibits these two features, which we call affinity bandits. Unsurprisingly, classical algorithms such as UCB often fail to identify the best arm in this setting. We prove a new instance-dependent regret lower bound, which is larger than that in the standard bandit setting by a multiplicative function of $K$. Because the rewards are time-varying and depend on the policy's past actions, deriving this lower bound requires proof techniques beyond the standard bandit toolbox. Finally, we design an elimination-style algorithm which nearly matches this lower bound, despite never observing the real rewards.
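To make the feedback loop concrete, here is a minimal simulation sketch of the setting described above. It assumes a simple additive bias model (observed reward = true mean + beta × fraction of past hires from the same group + noise) paired with a generic Hoeffding-style elimination rule; the bias strength `beta`, the reward model, and the algorithm details are illustrative assumptions, not the construction analyzed in the paper.

```python
import numpy as np

def affinity_bandit_elimination(mu, beta=0.5, horizon=20_000, seed=0):
    """Toy elimination-style algorithm for an "affinity bandit".

    Assumed (illustrative) model: pulling arm k at time t yields
        r_t = mu[k] + beta * p_k(t) + noise,
    where p_k(t) is the fraction of past pulls ("committee members")
    coming from arm k. Only the biased r_t is observed; the algorithm
    tracks p_k(t) from its own history and subtracts the assumed bias term.
    """
    rng = np.random.default_rng(seed)
    K = len(mu)
    active = list(range(K))
    counts = np.zeros(K)           # pulls per arm (the "committee")
    debiased = np.zeros(K)         # running sums of bias-corrected rewards
    t = 0
    while t < horizon and len(active) > 1:
        # Round-robin over surviving arms keeps the committee composition predictable.
        for k in list(active):
            if t >= horizon:
                break
            p_k = counts[k] / max(t, 1)                 # current committee fraction
            reward = mu[k] + beta * p_k + rng.normal()  # biased observation
            debiased[k] += reward - beta * p_k          # remove the assumed bias
            counts[k] += 1
            t += 1
        # Eliminate arms whose upper confidence bound falls below the best lower bound.
        means = debiased[active] / counts[active]
        radii = np.sqrt(2 * np.log(max(t, 2)) / counts[active])
        best_lower = np.max(means - radii)
        active = [a for a, m, r in zip(active, means, radii) if m + r >= best_lower]
    best = active[int(np.argmax(debiased[active] / counts[active]))]
    return best, counts

if __name__ == "__main__":
    true_means = [0.2, 0.5, 0.45, 0.1]   # hypothetical real candidate values
    best_arm, pulls = affinity_bandit_elimination(true_means)
    print("identified arm:", best_arm, "pulls per arm:", pulls)
```

The design point mirrored here is that the policy controls whom it "hires", so the committee fractions p_k(t) are known to the algorithm even though the real rewards never are; that is what makes a bias correction of this kind possible in the sketch.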
Related papers
- ALBAR: Adversarial Learning approach to mitigate Biases in Action Recognition [52.537021302246664]
Action recognition models often suffer from background bias (i.e., inferring actions based on background cues) and foreground bias (i.e., relying on subject appearance). We propose ALBAR, a novel adversarial training method that mitigates foreground and background biases without requiring specialized knowledge of the bias attributes. We evaluate our method on established background and foreground bias protocols, setting a new state-of-the-art and strongly improving combined debiasing performance by over 12% absolute on HMDB51.
arXiv Detail & Related papers (2025-01-31T20:47:06Z) - Multimodal Clickbait Detection by De-confounding Biases Using Causal Representation Inference [35.82337433829]
This paper focuses on detecting clickbait posts on the Web.
These posts often use eye-catching disinformation in mixed modalities to mislead users to click for profit.
Malicious creators use tricks to add some irrelevant non-bait content into bait posts, dressing them up as legitimate to fool the detector.
arXiv Detail & Related papers (2024-10-10T07:29:56Z) - Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models, such as large language models (LLMs). We study a model within this domain: contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary. We propose an algorithm, namely robust contextual dueling bandits (RCDB), which is based on uncertainty-weighted maximum likelihood estimation.
arXiv Detail & Related papers (2024-04-16T17:59:55Z) - On the price of exact truthfulness in incentive-compatible online learning with bandit feedback: A regret lower bound for WSU-UX [8.861995276194435]
In one view of the classical game of prediction with expert advice with binary outcomes, each expert maintains an adversarially chosen belief.
We consider a recently introduced, strategic variant of this problem with selfish (reputation-seeking) experts.
We show, via explicit construction of loss sequences, that the algorithm suffers a worst-case $\Omega(T^{2/3})$ lower bound.
arXiv Detail & Related papers (2024-04-08T02:41:32Z) - Bandit Social Learning: Exploration under Myopic Behavior [54.767961587919075]
We study social learning dynamics motivated by reviews on online platforms.
Agents collectively follow a simple multi-armed bandit protocol, but each agent acts myopically, without regard to exploration.
We derive stark learning failures for any such behavior, and provide matching positive results.
arXiv Detail & Related papers (2023-02-15T01:57:57Z) - Causal Intervention for Fairness in Multi-behavior Recommendation [40.938727601434195]
We argue that the relationships between different user behaviors (e.g., conversion rate) actually reflect the item quality.
To handle the unfairness issues, we propose to mitigate the popularity bias by considering multiple user behaviors.
arXiv Detail & Related papers (2022-09-10T04:21:25Z) - Cross Pairwise Ranking for Unbiased Item Recommendation [57.71258289870123]
We develop a new learning paradigm named Cross Pairwise Ranking (CPR)
CPR achieves unbiased recommendation without knowing the exposure mechanism.
We prove in theory that this way offsets the influence of user/item propensity on the learning.
arXiv Detail & Related papers (2022-04-26T09:20:27Z) - The SAME score: Improved cosine based bias score for word embeddings [49.75878234192369]
We introduce SAME, a novel bias score for semantic bias in embeddings.
We show that SAME is capable of measuring semantic bias and identify potential causes for social bias in downstream tasks.
arXiv Detail & Related papers (2022-03-28T09:28:13Z) - The price of unfairness in linear bandits with biased feedback [62.25313751895011]
We study the problem of sequential decision making with biased linear bandit feedback.
We show that the worst case regret is higher than the $d\sqrt{T}\log(T)$ regret rate obtained under unbiased feedback.
Interestingly, the gap-dependent rates reveal the existence of non-trivial instances where the problem is no more difficult than its unbiased counterpart.
arXiv Detail & Related papers (2022-03-18T08:03:20Z) - Gradient Based Activations for Accurate Bias-Free Learning [22.264226961225003]
We show that a biased discriminator can actually be used to improve this bias-accuracy tradeoff.
Specifically, this is achieved by using a feature masking approach using the discriminator's gradients.
We show that this simple approach works well to reduce bias as well as improve accuracy significantly.
arXiv Detail & Related papers (2022-02-17T00:30:40Z) - Confidence-Budget Matching for Sequential Budgeted Learning [69.77435313099366]
We formalize decision-making problems with querying budget.
We consider multi-armed bandits, linear bandits, and reinforcement learning problems.
We show that CBM-based algorithms perform well in the presence of adversity.
arXiv Detail & Related papers (2021-02-05T19:56:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.