User-Assistant Bias in LLMs
- URL: http://arxiv.org/abs/2508.15815v1
- Date: Sat, 16 Aug 2025 20:33:09 GMT
- Title: User-Assistant Bias in LLMs
- Authors: Xu Pan, Jingxuan Fan, Zidi Xiong, Ely Hahami, Jorin Overwiening, Ziqian Xie,
- Abstract summary: Large language models (LLMs) can bias towards relying on their own or the user's information in chat history, leading to overly stubborn or agreeable behaviors in multi-turn conversations.<n>We introduce an 8k multi-turn conversation dataset $textbfUserAssist$ to benchmark, understand and manipulate the user-assistant bias in frontier LLMs.
- Score: 11.825607435336776
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large language models (LLMs) can bias towards relying on their own or the user's information in chat history, leading to overly stubborn or agreeable behaviors in multi-turn conversations. In this paper, we formalize this model characteristic as user-assistant bias and introduce an 8k multi-turn conversation dataset $\textbf{UserAssist}$, which we use to benchmark, understand and manipulate the user-assistant bias in frontier LLMs. Leveraging $\textbf{UserAssist-test}$, we first benchmark the user-assistant bias of 26 commercial and 26 open-weight models. Commercial models show various levels of user bias. Evaluation on open-weight models reveals significant user bias in the instruction-tuned models, and weak user bias in reasoning (or reasoning-distilled) models. We then perform controlled fine-tuning experiments to pinpoint the post-training recipe contributing to these bias shifts: human preference alignment increases user bias, while training on chain-of-thought reasoning traces decreases it. Finally, we demonstrate that user-assistant bias can be bidirectionally adjusted by performing direct preference optimization (DPO) on $\textbf{UserAssist-train}$, and generalizes well to both in-domain and out-of-domain conversations. Our results provide insights into how the LLM integrates information from different sources, and also a viable way to detect and control model abnormalities.
Related papers
- Observations and Remedies for Large Language Model Bias in Self-Consuming Performative Loop [17.229330734667474]
We introduce the concept of textbfSelf-textbfConsuming textbfPerformative textbfLoop.<n>We investigate the role of synthetic data in shaping bias during dynamic iterative training processes under controlled performative feedback.
arXiv Detail & Related papers (2026-01-08T18:08:15Z) - Adaptive Generation of Bias-Eliciting Questions for LLMs [18.608477560948003]
Large language models (LLMs) are now widely deployed in user-facing applications, reaching hundreds of millions worldwide.<n>We introduce a counterfactual bias evaluation framework that automatically generates realistic, open-ended questions over sensitive attributes such as sex, race, or religion.<n>We also capture distinct response dimensions that are increasingly relevant in user interactions, such as asymmetric refusals and explicit acknowledgment of bias.
arXiv Detail & Related papers (2025-10-14T13:08:10Z) - Addressing Personalized Bias for Unbiased Learning to Rank [56.663619153713434]
Unbiased learning to rank (ULTR) aims to learn unbiased ranking models from biased user behavior logs.<n>We propose a novel user-aware inverse-propensity-score estimator for learning-to-rank objectives.
arXiv Detail & Related papers (2025-08-28T14:01:31Z) - Surface Fairness, Deep Bias: A Comparative Study of Bias in Language Models [49.41113560646115]
We investigate various proxy measures of bias in large language models (LLMs)<n>We find that evaluating models with pre-prompted personae on a multi-subject benchmark (MMLU) leads to negligible and mostly random differences in scores.<n>With the recent trend for LLM assistant memory and personalization, these problems open up from a different angle.
arXiv Detail & Related papers (2025-06-12T08:47:40Z) - Towards Resource Efficient and Interpretable Bias Mitigation in Large Language Models [1.787433808079955]
Large language models (LLMs) have been observed to perpetuate unwanted biases in training data.<n>In this paper, we mitigate bias by leveraging small biased and anti-biased expert models to obtain a debiasing signal.<n> Experiments on mitigating gender, race, and religion biases show a reduction in bias on several local and global bias metrics.
arXiv Detail & Related papers (2024-12-02T16:56:08Z) - Attention Speaks Volumes: Localizing and Mitigating Bias in Language Models [15.53216696218776]
We explore the internal mechanisms of how bias emerges in large language models (LLMs) when provided with ambiguous comparative prompts.
We propose $textttATLAS$, a technique to localize bias to specific layers of the LLM by analyzing attention scores and then reduce bias by scaling attention in these biased layers.
arXiv Detail & Related papers (2024-10-29T20:15:56Z) - Debiasing Multimodal Models via Causal Information Minimization [65.23982806840182]
We study bias arising from confounders in a causal graph for multimodal data.
Robust predictive features contain diverse information that helps a model generalize to out-of-distribution data.
We use these features as confounder representations and use them via methods motivated by causal theory to remove bias from models.
arXiv Detail & Related papers (2023-11-28T16:46:14Z) - Fast Model Debias with Machine Unlearning [54.32026474971696]
Deep neural networks might behave in a biased manner in many real-world scenarios.
Existing debiasing methods suffer from high costs in bias labeling or model re-training.
We propose a fast model debiasing framework (FMD) which offers an efficient approach to identify, evaluate and remove biases.
arXiv Detail & Related papers (2023-10-19T08:10:57Z) - Mitigating Bias for Question Answering Models by Tracking Bias Influence [84.66462028537475]
We propose BMBI, an approach to mitigate the bias of multiple-choice QA models.
Based on the intuition that a model would lean to be more biased if it learns from a biased example, we measure the bias level of a query instance.
We show that our method could be applied to multiple QA formulations across multiple bias categories.
arXiv Detail & Related papers (2023-10-13T00:49:09Z) - Soft-prompt Tuning for Large Language Models to Evaluate Bias [0.03141085922386211]
Using soft-prompts to evaluate bias gives us the extra advantage of avoiding the human-bias injection.
We check the model biases on different sensitive attributes using the group fairness (bias) and find interesting bias patterns.
arXiv Detail & Related papers (2023-06-07T19:11:25Z) - Linear Speedup in Personalized Collaborative Learning [69.45124829480106]
Personalization in federated learning can improve the accuracy of a model for a user by trading off the model's bias.
We formalize the personalized collaborative learning problem as optimization of a user's objective.
We explore conditions under which we can optimally trade-off their bias for a reduction in variance.
arXiv Detail & Related papers (2021-11-10T22:12:52Z) - Improving Robustness by Augmenting Training Sentences with
Predicate-Argument Structures [62.562760228942054]
Existing approaches to improve robustness against dataset biases mostly focus on changing the training objective.
We propose to augment the input sentences in the training data with their corresponding predicate-argument structures.
We show that without targeting a specific bias, our sentence augmentation improves the robustness of transformer models against multiple biases.
arXiv Detail & Related papers (2020-10-23T16:22:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.