BiasBusters: Uncovering and Mitigating Tool Selection Bias in Large Language Models
- URL: http://arxiv.org/abs/2510.00307v1
- Date: Tue, 30 Sep 2025 22:02:13 GMT
- Title: BiasBusters: Uncovering and Mitigating Tool Selection Bias in Large Language Models
- Authors: Thierry Blankenstein, Jialin Yu, Zixuan Li, Vassilis Plachouras, Sunando Sengupta, Philip Torr, Yarin Gal, Alasdair Paren, Adel Bibi
- Abstract summary: Large language models (LLMs) often rely on external tools drawn from marketplaces where multiple providers offer functionally equivalent options. This raises a critical point concerning fairness: if selection is systematically biased, it can degrade user experience and distort competition. We introduce a benchmark of diverse tool categories, each containing multiple functionally equivalent tools, to evaluate tool-selection bias.
- Score: 55.119657444627855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Agents backed by large language models (LLMs) often rely on external tools drawn from marketplaces where multiple providers offer functionally equivalent options. This raises a critical point concerning fairness: if selection is systematically biased, it can degrade user experience and distort competition by privileging some providers over others. We introduce a benchmark of diverse tool categories, each containing multiple functionally equivalent tools, to evaluate tool-selection bias. Using this benchmark, we test seven models and show that unfairness exists with models either fixating on a single provider or disproportionately preferring earlier-listed tools in context. To investigate the origins of this bias, we conduct controlled experiments examining tool features, metadata (name, description, parameters), and pre-training exposure. We find that: (1) semantic alignment between queries and metadata is the strongest predictor of choice; (2) perturbing descriptions significantly shifts selections; and (3) repeated pre-training exposure to a single endpoint amplifies bias. Finally, we propose a lightweight mitigation that first filters the candidate tools to a relevant subset and then samples uniformly, reducing bias while preserving good task coverage. Our findings highlight tool-selection bias as a key obstacle for the fair deployment of tool-augmented LLMs.
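The proposed mitigation (filter the candidates to a relevant subset, then sample uniformly) is simple enough to sketch. The snippet below is a minimal illustration rather than the paper's implementation: the embedding model, similarity threshold, and tool records are all assumptions.

```python
import random
import numpy as np
from sentence_transformers import SentenceTransformer

# Hypothetical marketplace entries; the paper's benchmark categories differ.
TOOLS = [
    {"name": "weather_api_a", "description": "Current weather and forecasts by city."},
    {"name": "weather_api_b", "description": "Global weather conditions and forecasts."},
    {"name": "stock_quotes",  "description": "Real-time stock prices and quotes."},
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def select_tool(query: str, tools=TOOLS, threshold: float = 0.3) -> dict:
    """Filter tools by query-metadata similarity, then sample uniformly."""
    q = model.encode([query])[0]
    docs = model.encode([f"{t['name']}: {t['description']}" for t in tools])
    sims = docs @ q / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q))
    relevant = [t for t, s in zip(tools, sims) if s >= threshold]
    # Uniform sampling over the relevant subset removes any provider
    # preference, while the relevance filter preserves task coverage.
    return random.choice(relevant or tools)

print(select_tool("What's the weather in Paris tomorrow?")["name"])
```

The uniform draw is what breaks the provider fixation and position preference the benchmark measures; the semantic filter keeps the choice on-task.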
Related papers
- ToolTweak: An Attack on Tool Selection in LLM-based Agents [52.17181489286236]
We show that adversaries can systematically bias agents toward selecting specific tools, gaining unfair advantage over equally capable alternatives. We present ToolTweak, a lightweight automatic attack that increases selection rates from a baseline of around 20% to as high as 81%. To mitigate these risks, we evaluate two defenses: paraphrasing and perplexity filtering, which reduce bias and lead agents to select functionally similar tools more equally.
arXiv Detail & Related papers (2025-10-02T20:44:44Z) - Meta-Reasoning Improves Tool Use in Large Language Models [10.193264105560864]
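Of the two defenses named above, perplexity filtering is easy to sketch: adversarially tweaked descriptions tend to read unnaturally, so descriptions whose perplexity under a small reference LM exceeds a threshold get dropped. The model choice (gpt2) and the threshold are illustrative assumptions, not the paper's settings.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Small reference LM for scoring text naturalness (an illustrative choice).
tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def perplexity(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    # labels=input_ids yields the mean next-token negative log-likelihood.
    return float(torch.exp(lm(ids, labels=ids).loss))

def filter_descriptions(tools: list[dict], max_ppl: float = 150.0) -> list[dict]:
    """Keep only tools whose description reads as natural text."""
    return [t for t in tools if perplexity(t["description"]) <= max_ppl]
```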
- Meta-Reasoning Improves Tool Use in Large Language Models [10.193264105560864]
We present Tool selECTion via meta-reasONing (TECTON), a two-phase system that first reasons over a task and outputs candidate tools. TECTON results in substantial gains, both in-distribution and out-of-distribution, on a range of math reasoning datasets.
arXiv Detail & Related papers (2024-11-07T08:48:33Z) - Mitigating Selection Bias with Node Pruning and Auxiliary Options [11.835002896308545]
Large language models (LLMs) often exhibit systematic preferences for certain answer choices when responding to multiple-choice questions. This bias reduces the accuracy and reliability of LLM outputs, limiting their usefulness in decision-critical applications. We introduce two methods: Bias Node Pruning (BNP), which prunes parameters that contribute to selection bias, and Auxiliary Option Injection (AOI), which introduces an answer choice to reduce bias in both white-box and black-box settings.
arXiv Detail & Related papers (2024-09-27T15:53:54Z) - Going Beyond Popularity and Positivity Bias: Correcting for Multifactorial Bias in Recommender Systems [74.47680026838128]
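Of the two, Auxiliary Option Injection is the black-box-friendly method and fits in a few lines: append an extra answer choice so that spurious preference for a given option ID can land on a choice that is never the gold answer. The auxiliary wording below is an assumption; the paper's exact text may differ.

```python
from string import ascii_uppercase

AUX_OPTION = "I don't know."  # assumed wording for the injected option

def build_mcq_prompt(question: str, options: list[str]) -> str:
    """Append an auxiliary choice after the real answer options."""
    opts = options + [AUX_OPTION]
    lines = [f"{ascii_uppercase[i]}. {o}" for i, o in enumerate(opts)]
    return question + "\n" + "\n".join(lines) + "\nAnswer:"

print(build_mcq_prompt("Which planet is largest?", ["Earth", "Jupiter", "Mars"]))
```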
- Going Beyond Popularity and Positivity Bias: Correcting for Multifactorial Bias in Recommender Systems [74.47680026838128]
Two typical forms of bias in user interaction data with recommender systems (RSs) are popularity bias and positivity bias.
We consider multifactorial selection bias affected by both item and rating value factors.
We propose smoothing and alternating gradient descent techniques to reduce variance and improve the robustness of the resulting estimator's optimization.
arXiv Detail & Related papers (2024-04-29T12:18:21Z) - Improving Bias Mitigation through Bias Experts in Natural Language
Understanding [10.363406065066538]
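The multifactorial correction can be read as inverse-propensity scoring where the probability of observing a rating depends on both the item and the rating value, with smoothing to tame sparse cells. The sketch below applies that reading to toy data; the smoothing rule, loss, and data generation are assumptions, and the paper's alternating optimization is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_ratings, n = 20, 5, 5000

# Toy rating log: which (item, rating-value) pairs get observed is biased.
items = rng.integers(0, n_items, size=n)
ratings = rng.integers(1, n_ratings + 1, size=n)
observed = rng.random(n) < (0.02 + 0.06 * ratings / n_ratings)  # positivity bias

# Multifactorial propensities: P(observed | item, rating value).
counts = np.zeros((n_items, n_ratings))
hits = np.zeros((n_items, n_ratings))
np.add.at(counts, (items, ratings - 1), 1)
np.add.at(hits, (items, ratings - 1), observed.astype(float))

alpha = 5.0                    # smoothing strength (assumed)
global_rate = observed.mean()
propensity = (hits + alpha * global_rate) / (counts + alpha)  # shrink sparse cells

# Self-normalized IPS-weighted squared error on the observed entries.
w = 1.0 / propensity[items[observed], ratings[observed] - 1]
pred = np.full(int(observed.sum()), 3.0)  # stand-in model predictions
loss = np.sum(w * (ratings[observed] - pred) ** 2) / np.sum(w)
print(f"debiased training loss: {loss:.3f}")
```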
- Improving Bias Mitigation through Bias Experts in Natural Language Understanding [10.363406065066538]
We propose a new debiasing framework that introduces binary classifiers between the auxiliary model and the main model.
Our proposed strategy improves the bias identification ability of the auxiliary model.
arXiv Detail & Related papers (2023-12-06T16:15:00Z) - Large Language Models Are Not Robust Multiple Choice Selectors [117.72712117510953]
Multiple choice questions (MCQs) serve as a common yet important task format in the evaluation of large language models (LLMs).
This work shows that modern LLMs are vulnerable to option position changes due to their inherent "selection bias".
We propose a label-free, inference-time debiasing method, called PriDe, which separates the model's prior bias for option IDs from the overall prediction distribution.
arXiv Detail & Related papers (2023-09-07T17:44:56Z) - Delving into Identify-Emphasize Paradigm for Combating Unknown Bias [52.76758938921129]
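PriDe's separation of prior from prediction admits a compact sketch: estimate the model's prior over option IDs by aggregating its predictions across permutations of the option contents, divide the prior out, and re-aggregate per content. The get_option_probs interface, the use of all permutations, and the geometric-mean prior are simplifications; the actual method estimates the prior more cheaply on a subset of test samples.

```python
import numpy as np
from itertools import permutations

def pride_debias(get_option_probs, contents):
    """Estimate the option-ID prior from content permutations, then divide
    it out of the observed predictions (a simplified PriDe-style sketch).

    get_option_probs(ordered_contents) -> probability vector over option
    IDs; a stand-in for one scored LLM call (assumed interface).
    """
    perms = list(permutations(range(len(contents))))  # sample these in practice
    probs = np.array([get_option_probs([contents[i] for i in p]) for p in perms])

    # Prior over option IDs: geometric mean of the mass each ID position
    # receives, regardless of which content occupies it.
    prior = np.exp(np.log(probs + 1e-12).mean(axis=0))
    prior /= prior.sum()

    # Debiased content scores: prior-corrected mass a content receives,
    # averaged over the positions it occupied.
    scores = np.zeros(len(contents))
    for p, row in zip(perms, probs):
        for pos, content_idx in enumerate(p):
            scores[content_idx] += row[pos] / prior[pos]
    return scores / scores.sum()

# A toy model with pure selection bias: it always prefers option ID 0.
biased = lambda opts: np.array([0.7, 0.1, 0.2])
print(pride_debias(biased, ["cat", "dog", "fox"]))  # -> uniform
```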
- Delving into Identify-Emphasize Paradigm for Combating Unknown Bias [52.76758938921129]
We propose an effective bias-conflicting scoring method (ECS) to boost the accuracy of identifying bias-conflicting samples.
We also propose gradient alignment (GA) to balance the contributions of the mined bias-aligned and bias-conflicting samples.
Experiments are conducted on multiple datasets in various settings, demonstrating that the proposed solution can mitigate the impact of unknown biases.
arXiv Detail & Related papers (2023-02-22T14:50:24Z) - A Sandbox Tool to Bias(Stress)-Test Fairness Algorithms [19.86635585740634]
We present the conceptual idea and a first implementation of a bias-injection sandbox tool to investigate fairness consequences of various biases.
Unlike existing toolkits, ours provides a controlled environment to counterfactually inject biases in the ML pipeline.
In particular, we can test whether a given remedy can alleviate the injected bias by comparing the predictions resulting after the intervention with true labels in the unbiased regime, that is, before any bias injection.
arXiv Detail & Related papers (2022-04-21T16:12:19Z) - Improving Multi-Turn Response Selection Models with Complementary
Last-Utterance Selection by Instance Weighting [84.9716460244444]
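That counterfactual protocol is easy to mock up: inject a controlled bias into the training labels, fit a model with and without a candidate remedy, and score both against the pre-injection ground truth. Everything below (the label-flip bias, the reweighing remedy, the synthetic data) is an illustrative stand-in for the toolkit's configurable pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
n = 4000
group = rng.integers(0, 2, n)                       # protected attribute
x = rng.normal(size=(n, 5))
y_true = (x[:, 0] + 0.5 * x[:, 1] > 0).astype(int)  # unbiased ground truth

# Counterfactual bias injection: flip 30% of group-1 positives to negative.
y_biased = y_true.copy()
flip = (group == 1) & (y_true == 1) & (rng.random(n) < 0.3)
y_biased[flip] = 0

features = np.column_stack([x, group])
baseline = LogisticRegression().fit(features, y_biased)

# A stand-in remedy whose effect the sandbox measures: reweigh the group
# whose labels were corrupted.
w = np.where(group == 1, 2.0, 1.0)
remedied = LogisticRegression().fit(features, y_biased, sample_weight=w)

# Score both against the TRUE labels from the unbiased regime.
for name, m in [("baseline", baseline), ("with remedy", remedied)]:
    print(name, round(accuracy_score(y_true, m.predict(features)), 3))
```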
- Improving Multi-Turn Response Selection Models with Complementary Last-Utterance Selection by Instance Weighting [84.9716460244444]
We consider utilizing the underlying correlation in the data resource itself to derive different kinds of supervision signals.
We conduct extensive experiments on two public datasets and obtain significant improvements on both.
arXiv Detail & Related papers (2020-02-18T06:29:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.