The Invisible Hand: Unveiling Provider Bias in Large Language Models for Code Generation
- URL: http://arxiv.org/abs/2501.07849v3
- Date: Tue, 03 Jun 2025 12:58:57 GMT
- Title: The Invisible Hand: Unveiling Provider Bias in Large Language Models for Code Generation
- Authors: Xiaoyu Zhang, Juan Zhai, Shiqing Ma, Qingshuang Bao, Weipeng Jiang, Qian Wang, Chao Shen, Yang Liu
- Abstract summary: Large Language Models (LLMs) have emerged as the new recommendation engines. We show that without explicit directives, these models show systematic preferences for services from specific providers in their recommendations. We conduct the first comprehensive empirical study of provider bias in LLM code generation across seven state-of-the-art LLMs.
- Score: 37.66613667849016
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have emerged as the new recommendation engines, surpassing traditional methods in both capability and scope, particularly in code generation. In this paper, we reveal a novel provider bias in LLMs: without explicit directives, these models show systematic preferences for services from specific providers in their recommendations (e.g., favoring Google Cloud over Microsoft Azure). To systematically investigate this bias, we develop an automated pipeline to construct the dataset, incorporating 6 distinct coding task categories and 30 real-world application scenarios. Leveraging this dataset, we conduct the first comprehensive empirical study of provider bias in LLM code generation across seven state-of-the-art LLMs, utilizing approximately 500 million tokens (equivalent to $5,000+ in computational costs). Our findings reveal that LLMs exhibit significant provider preferences, predominantly favoring services from Google and Amazon, and can autonomously modify input code to incorporate their preferred providers without users' requests. Such a bias holds far-reaching implications for market dynamics and societal equilibrium, potentially contributing to digital monopolies. It may also deceive users and violate their expectations, leading to various consequences. We call on the academic community to recognize this emerging issue and develop effective evaluation and mitigation methods to uphold AI security and fairness.
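The study's core measurement, which provider an LLM reaches for when the prompt names none, can be approximated by tallying provider-specific SDK mentions across generations for provider-neutral prompts. A minimal sketch, assuming hypothetical keyword lists (the paper's actual service taxonomy spans 6 task categories and 30 scenarios and is far richer):

```python
import re
from collections import Counter

# Illustrative SDK/keyword patterns per provider; not the paper's taxonomy.
PROVIDER_PATTERNS = {
    "google": re.compile(r"\b(google\.cloud|gcloud|firebase)\b", re.I),
    "amazon": re.compile(r"\b(boto3|aws|amazonaws)\b", re.I),
    "microsoft": re.compile(r"\b(azure|microsoft)\b", re.I),
}

def detect_provider(code: str) -> str:
    """Return the first provider whose SDK/keyword appears in the code."""
    for provider, pattern in PROVIDER_PATTERNS.items():
        if pattern.search(code):
            return provider
    return "none"

def provider_bias(snippets: list[str]) -> dict[str, float]:
    """Share of generations that pick each provider, given neutral prompts."""
    counts = Counter(detect_provider(s) for s in snippets)
    total = len(snippets)
    return {p: counts[p] / total for p in counts}

# Toy generations for three provider-neutral "upload a file" prompts.
samples = [
    "from google.cloud import storage",
    "import boto3\ns3 = boto3.client('s3')",
    "from google.cloud import storage",
]
print(provider_bias(samples))  # google dominates this toy sample
```

In practice the detection step is the hard part (the paper builds an automated pipeline for it); keyword matching is only a rough proxy for recognizing which service generated code actually targets.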
Related papers
- State of AI: An Empirical 100 Trillion Token Study with OpenRouter [0.0]
We use the OpenRouter platform, an AI inference provider, to analyze over 100 trillion tokens of real-world LLM interactions. We observe substantial adoption of open-weight models, the outsized popularity of creative roleplay, and the rise of agentic inference. Our retention analysis identifies distinct cohorts: early users whose engagement persists far longer than that of later cohorts.
arXiv Detail & Related papers (2026-01-15T05:28:39Z) - MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization [103.74675519953898]
Long-chain reflective reasoning is a prerequisite for solving complex real-world problems. We build a benchmark consisting of 1,260 samples across 42 challenging synthetic tasks. We generate post-training data and explore learning paradigms for exploiting such data.
arXiv Detail & Related papers (2025-10-09T17:53:58Z) - Revealing Potential Biases in LLM-Based Recommender Systems in the Cold Start Setting [41.964130989754516]
Large Language Models (LLMs) are increasingly used for recommendation tasks due to their general-purpose capabilities. We introduce a benchmark specifically designed to evaluate fairness in zero-context recommendation. Our modular pipeline supports multiple recommendation domains and sensitive attributes, enabling systematic and flexible audits of any open-source LLM.
arXiv Detail & Related papers (2025-08-28T03:57:13Z) - Addressing Bias in LLMs: Strategies and Application to Fair AI-based Recruitment [49.81946749379338]
This work seeks to analyze the capacity of Transformer-based systems to learn demographic biases present in the data. We propose a privacy-enhancing framework that reduces gender information in the learning pipeline as a way to mitigate biased behaviors in the final tools.
arXiv Detail & Related papers (2025-06-13T15:29:43Z) - Learnware of Language Models: Specialized Small Language Models Can Do Big [50.285859986475394]
This paper presents a preliminary attempt to apply the learnware paradigm to language models. We simulated a learnware system comprising approximately 100 learnwares of specialized SLMs with 8B parameters. By selecting one suitable learnware for each task-specific inference, the system outperforms the base SLMs on all benchmarks.
arXiv Detail & Related papers (2025-05-19T17:54:35Z) - Improving Preference Extraction In LLMs By Identifying Latent Knowledge Through Classifying Probes [20.20764453136706]
Large Language Models (LLMs) are often used as automated judges to evaluate text.
We propose using linear classifying probes, trained by leveraging differences between contrasting pairs of prompts, to access latent knowledge and extract more accurate preferences.
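The probe idea, fitting a linear classifier on hidden-activation differences between contrasting prompt pairs, can be sketched without any model internals by substituting toy activation vectors. Everything below (the synthetic activations, the plain gradient-descent logistic probe) is illustrative, not the paper's implementation:

```python
import math
import random

def train_probe(pairs, dim, epochs=200, lr=0.5):
    """Fit a logistic probe on activation *differences* of contrasting pairs.

    pairs: list of (act_pos, act_neg) activation vectors; the probe learns a
    direction w such that w . (act_pos - act_neg) > 0 for preferred text.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for pos, neg in pairs:
            diff = [p - n for p, n in zip(pos, neg)]
            z = sum(wi * di for wi, di in zip(w, diff))
            pred = 1.0 / (1.0 + math.exp(-z))
            grad = 1.0 - pred  # push the difference toward label y = 1
            w = [wi + lr * grad * di for wi, di in zip(w, diff)]
    return w

def probe_preference(w, act_a, act_b):
    """Return 'A' if the probe scores activation a above activation b."""
    score = sum(wi * (a - b) for wi, a, b in zip(w, act_a, act_b))
    return "A" if score > 0 else "B"

random.seed(0)
# Toy "activations": preferred-text vectors shifted along a latent axis.
base = [[random.gauss(0, 1) for _ in range(4)] for _ in range(20)]
pairs = [([x + 1.0 for x in v], v) for v in base]
w = train_probe(pairs, dim=4)
print(probe_preference(w, [1.2, 0.1, 0.3, -0.2], [0.1, 0.0, 0.2, -0.3]))
```

The point of contrasting pairs is that subtracting the two activations cancels content shared by both prompts, leaving (ideally) only the latent preference direction for the probe to find.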
arXiv Detail & Related papers (2025-03-22T12:35:25Z) - Agent-centric Information Access [21.876205078570507]
Large language models (LLMs) are becoming more specialized, each trained on proprietary data and excelling in specific domains.
This paper introduces a framework for agent-centric information access, where LLMs function as knowledge agents that are dynamically ranked and queried based on their demonstrated expertise.
We propose a scalable evaluation framework that leverages retrieval-augmented generation and clustering techniques to construct and assess thousands of specialized models, with the potential to scale toward millions.
arXiv Detail & Related papers (2025-02-26T16:56:19Z) - MM-RLHF: The Next Step Forward in Multimodal LLM Alignment [59.536850459059856]
We introduce MM-RLHF, a dataset containing $\mathbf{120k}$ fine-grained, human-annotated preference comparison pairs.
We propose several key innovations to improve the quality of reward models and the efficiency of alignment algorithms.
Our approach is rigorously evaluated across $\mathbf{10}$ distinct dimensions and $\mathbf{27}$ benchmarks.
arXiv Detail & Related papers (2025-02-14T18:59:51Z) - Using Large Language Models for Expert Prior Elicitation in Predictive Modelling [53.54623137152208]
This study proposes the use of large language models (LLMs) to elicit expert prior distributions for predictive models. Our findings show that LLM-elicited prior parameter distributions significantly reduce predictive error compared to uninformative priors in low-data settings. Prior elicitation also consistently outperforms and proves more reliable than in-context learning at a lower cost.
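The claimed benefit, that an informative elicited prior reduces error when data is scarce, is easy to illustrate with a beta-binomial toy. The "LLM-elicited" prior here is just a hand-picked Beta(8, 2) standing in for whatever distribution an elicitation procedure would actually return:

```python
def beta_posterior_mean(alpha, beta, successes, trials):
    """Posterior mean of a Bernoulli rate under a Beta(alpha, beta) prior."""
    return (alpha + successes) / (alpha + beta + trials)

true_rate = 0.8
successes, trials = 3, 4  # tiny, low-data sample

uninformative = beta_posterior_mean(1, 1, successes, trials)  # flat Beta(1,1)
elicited = beta_posterior_mean(8, 2, successes, trials)       # "expert" prior

print(round(uninformative, 3), round(elicited, 3))
# With so few trials, the informative prior pulls the estimate
# closer to the true rate than the flat prior does.
assert abs(elicited - true_rate) < abs(uninformative - true_rate)
```

With more data the likelihood dominates and both priors converge to the same estimate, which matches the abstract's framing that the advantage is specific to low-data settings.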
arXiv Detail & Related papers (2024-11-26T10:13:39Z) - Learning to Predict Usage Options of Product Reviews with LLM-Generated Labels [14.006486214852444]
We propose a method of using LLMs as few-shot learners for annotating data in a complex natural language task.
Learning a custom model offers individual control over energy efficiency and privacy measures.
We find that the quality of the resulting data exceeds the level attained by third-party vendor services.
arXiv Detail & Related papers (2024-10-16T11:34:33Z) - Large Language Models, and LLM-Based Agents, Should Be Used to Enhance the Digital Public Sphere [6.171497648710294]
We argue that large language model-based recommenders can displace today's attention-allocation machinery. They would ingest open-web content, infer a user's goals, and present information that matches their reflective preferences.
arXiv Detail & Related papers (2024-10-15T23:51:04Z) - LLM-based Weak Supervision Framework for Query Intent Classification in Video Search [6.519428288229856]
We introduce a novel approach that leverages large language models (LLMs) through weak supervision to automatically annotate a vast collection of user search queries.
By incorporating domain knowledge via Chain of Thought and In-Context Learning, our approach leverages the labeled data to train low-latency models optimized for real-time inference.
arXiv Detail & Related papers (2024-09-13T15:47:50Z) - SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z) - LLM-Select: Feature Selection with Large Language Models [64.5099482021597]
Large language models (LLMs) are capable of selecting the most predictive features, with performance rivaling the standard tools of data science.
Our findings suggest that LLMs may be useful not only for selecting the best features for training but also for deciding which features to collect in the first place.
arXiv Detail & Related papers (2024-07-02T22:23:40Z) - DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery.
Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering.
Our benchmark thus illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z) - DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation [83.30006900263744]
Data analysis is a crucial process for generating in-depth studies and conclusive insights.
We propose to automatically generate high-quality answer annotations leveraging the code-generation capabilities of LLMs.
Our DACO-RL algorithm is evaluated by human annotators to produce more helpful answers than the SFT model in 57.72% of cases.
arXiv Detail & Related papers (2024-03-04T22:47:58Z) - Conversational Factor Information Retrieval Model (ConFIRM) [2.855224352436985]
Conversational Factor Information Retrieval Model (ConFIRM) is a novel approach to fine-tuning large language models (LLMs) for domain-specific retrieval tasks.
We demonstrate ConFIRM's effectiveness through a case study in the finance sector, fine-tuning a Llama-2-7b model using personality-aligned data.
The resulting model achieved 91% accuracy in classifying financial queries, with an average inference time of 0.61 seconds on an NVIDIA A100 GPU.
arXiv Detail & Related papers (2023-10-06T12:31:05Z) - A Survey on Large Language Models for Recommendation [77.91673633328148]
Large Language Models (LLMs) have emerged as powerful tools in the field of Natural Language Processing (NLP).
This survey presents a taxonomy that categorizes these models into two major paradigms: Discriminative LLM for Recommendation (DLLM4Rec) and Generative LLM for Recommendation (GLLM4Rec).
arXiv Detail & Related papers (2023-05-31T13:51:26Z) - Assessing Hidden Risks of LLMs: An Empirical Study on Robustness, Consistency, and Credibility [37.682136465784254]
We conduct over a million queries to mainstream large language models (LLMs), including ChatGPT, LLaMA, and OPT.
We find that ChatGPT is still capable of yielding the correct answer even when the input is polluted at an extreme level.
We propose a novel index associated with a dataset that roughly decides the feasibility of using such data for LLM-involved evaluation.
arXiv Detail & Related papers (2023-05-15T15:44:51Z) - Learning to Rank in the Position Based Model with Bandit Feedback [3.9121134770873742]
We propose novel extensions of two well-known algorithms, LinUCB and Linear Thompson Sampling, to the ranking use case.
To account for the biases in a production environment, we employ the position-based click model.
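The position-based click model factorizes a click probability into an examination term that depends only on rank and an attractiveness term that depends only on the item. A hedged sketch with made-up examination probabilities, using inverse propensity scoring to recover item attractiveness from position-biased clicks (the LinUCB/Thompson sampling bandit machinery is omitted):

```python
import random

# Assumed examination probabilities by position (illustrative, not learned).
EXAMINATION = [0.9, 0.6, 0.3]

def simulate_clicks(ranking, attractiveness, rng):
    """Position-based model: an item is clicked iff examined AND attractive."""
    return [
        rng.random() < EXAMINATION[pos] and rng.random() < attractiveness[item]
        for pos, item in enumerate(ranking)
    ]

def ips_attractiveness(click_logs):
    """Debias logged clicks with inverse propensity scoring over positions."""
    totals, counts = {}, {}
    for ranking, clicks in click_logs:
        for pos, (item, clicked) in enumerate(zip(ranking, clicks)):
            totals[item] = totals.get(item, 0.0) + clicked / EXAMINATION[pos]
            counts[item] = counts.get(item, 0) + 1
    return {item: totals[item] / counts[item] for item in totals}

rng = random.Random(42)
true_attr = {"a": 0.8, "b": 0.5, "c": 0.2}
logs = []
for _ in range(5000):
    ranking = list(true_attr)
    rng.shuffle(ranking)  # randomized rankings make the IPS estimate unbiased
    logs.append((ranking, simulate_clicks(ranking, true_attr, rng)))

est = ips_attractiveness(logs)
print({k: round(v, 2) for k, v in est.items()})
```

Dividing each click by its position's examination probability upweights clicks earned at poorly examined ranks, so the recovered scores reflect the items themselves rather than where they happened to be shown.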
arXiv Detail & Related papers (2020-04-27T19:12:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.