Position is Power: System Prompts as a Mechanism of Bias in Large Language Models (LLMs)
- URL: http://arxiv.org/abs/2505.21091v3
- Date: Mon, 23 Jun 2025 06:43:45 GMT
- Title: Position is Power: System Prompts as a Mechanism of Bias in Large Language Models (LLMs)
- Authors: Anna Neumann, Elisabeth Kirsten, Muhammad Bilal Zafar, Jatinder Singh
- Abstract summary: System prompts in Large Language Models (LLMs) are predefined directives that guide model behaviour. LLM deployers increasingly use them to ensure consistent responses across contexts. As system prompts become more complex, they can directly or indirectly introduce unaccounted-for side effects.
- Score: 7.71667852309443
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: System prompts in Large Language Models (LLMs) are predefined directives that guide model behaviour, taking precedence over user inputs in text processing and generation. LLM deployers increasingly use them to ensure consistent responses across contexts. While model providers set a foundation of system prompts, deployers and third-party developers can append additional prompts without visibility into others' additions, and this layered implementation remains entirely hidden from end-users. As system prompts become more complex, they can directly or indirectly introduce unaccounted-for side effects. This lack of transparency raises fundamental questions about how the position of information across different directives shapes model outputs. This work therefore examines how the placement of information affects model behaviour. To this end, we compare how models process demographic information in system versus user prompts across six commercially available LLMs and 50 demographic groups. Our analysis reveals significant biases, manifesting as differences in user representation and decision-making scenarios. Since these variations stem from inaccessible and opaque system-level configurations, they risk representational, allocative, and other potential biases and downstream harms beyond the user's ability to detect or correct. Our findings draw attention to these critical issues, which have the potential to perpetuate harms if left unexamined. Further, we argue that system prompt analysis must be incorporated into AI auditing processes, particularly as customisable system prompts become increasingly prevalent in commercial AI deployments.
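The study's core manipulation, placing identical demographic information in the system prompt versus the user prompt and comparing the resulting outputs, can be sketched against a generic chat-completions API. The snippet below is a minimal sketch, not the authors' code: the model name, demographic cue, and decision-making task are illustrative assumptions.

```python
# Minimal sketch (assumption, not the authors' protocol) of comparing how a model
# handles the same demographic cue placed in the system prompt vs. the user prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(messages, model="gpt-4o-mini"):  # model choice is illustrative
    resp = client.chat.completions.create(model=model, messages=messages, temperature=0)
    return resp.choices[0].message.content

demographic = "The user is a 62-year-old woman."   # hypothetical demographic cue
task = "Should this loan application be approved? Answer yes or no and explain briefly."

# Condition A: demographic information placed in the system prompt.
system_placed = ask([
    {"role": "system", "content": f"You are a loan-assessment assistant. {demographic}"},
    {"role": "user", "content": task},
])

# Condition B: the same information placed in the user prompt.
user_placed = ask([
    {"role": "system", "content": "You are a loan-assessment assistant."},
    {"role": "user", "content": f"{demographic} {task}"},
])

print("system-prompt placement:", system_placed)
print("user-prompt placement:  ", user_placed)
```

Repeating both conditions across many demographic groups and prompts, then comparing the response distributions, is the kind of position-dependent analysis the abstract describes.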
Related papers
- Exploring the Impact of Instruction-Tuning on LLM's Susceptibility to Misinformation [3.032542495872679]
We investigate the impact of instruction-tuning on large language models' susceptibility to misinformation. Our analysis reveals that instruction-tuned LLMs are significantly more likely to accept misinformation when it is presented by the user.
arXiv Detail & Related papers (2025-07-24T08:58:47Z) - Interactive Reasoning: Visualizing and Controlling Chain-of-Thought Reasoning in Large Language Models [54.85405423240165]
We introduce Interactive Reasoning, an interaction design that visualizes chain-of-thought outputs as a hierarchy of topics. We implement interactive reasoning in Hippo, a prototype for AI-assisted decision making in the face of uncertain trade-offs.
arXiv Detail & Related papers (2025-06-30T10:00:43Z) - Investigating the Effects of Cognitive Biases in Prompts on Large Language Model Outputs [3.7302076138352205]
This paper investigates the influence of cognitive biases on Large Language Model (LLM) outputs. Cognitive biases, such as confirmation and availability biases, can distort user inputs through prompts.
arXiv Detail & Related papers (2025-06-14T04:18:34Z) - Addressing Bias in LLMs: Strategies and Application to Fair AI-based Recruitment [49.81946749379338]
This work seeks to analyze the capacity of Transformer-based systems to learn demographic biases present in the data. We propose a privacy-enhancing framework that reduces gender information in the learning pipeline as a way to mitigate biased behaviors in the final tools.
arXiv Detail & Related papers (2025-06-13T15:29:43Z) - Reading Between the Prompts: How Stereotypes Shape LLM's Implicit Personalization [6.781972039785424]
Generative Large Language Models (LLMs) infer users' demographic information from subtle cues in the conversation. Our results highlight the need for greater transparency and control in how LLMs represent user identity.
arXiv Detail & Related papers (2025-05-22T09:48:51Z) - A Closer Look at System Prompt Robustness [2.5525497052179995]
Developers depend on system prompts to specify important context, output format, personalities, guardrails, content policies, and safety countermeasures. In practice, models often forget to consider relevant guardrails or fail to resolve conflicting demands between the system and the user. We create realistic new evaluation and fine-tuning datasets based on prompts collected from OpenAI's GPT Store and HuggingFace's HuggingChat.
arXiv Detail & Related papers (2025-02-15T18:10:45Z) - Predicting the Performance of Black-box LLMs through Self-Queries [60.87193950962585]
As large language models (LLMs) are increasingly relied on in AI systems, predicting when they make mistakes is crucial. In this paper, we extract features of LLMs in a black-box manner by using follow-up prompts and taking the probabilities of different responses as representations. We demonstrate that training a linear model on these low-dimensional representations produces reliable predictors of model performance at the instance level.
arXiv Detail & Related papers (2025-01-02T22:26:54Z) - Persuasion with Large Language Models: a Survey [49.86930318312291]
Large Language Models (LLMs) have created new disruptive possibilities for persuasive communication.
In areas such as politics, marketing, public health, e-commerce, and charitable giving, such LLM systems have already achieved human-level or even super-human persuasiveness.
Our survey suggests that the current and future potential of LLM-based persuasion poses profound ethical and societal risks.
arXiv Detail & Related papers (2024-11-11T10:05:52Z) - On the Loss of Context-awareness in General Instruction Fine-tuning [101.03941308894191]
We investigate the loss of context awareness after supervised fine-tuning. We find that the performance decline is associated with a bias toward different roles learned during conversational instruction fine-tuning. We propose a metric to identify context-dependent examples from general instruction fine-tuning datasets.
arXiv Detail & Related papers (2024-11-05T00:16:01Z) - How Susceptible are LLMs to Influence in Prompts? [6.644673474240519]
Large Language Models (LLMs) are highly sensitive to prompts, including additional context provided therein.
We study how an LLM's response to multiple-choice questions changes when the prompt includes a prediction and explanation from another model.
Our findings reveal that models are strongly influenced and, when explanations are provided, are swayed irrespective of the explanation's quality.
arXiv Detail & Related papers (2024-08-17T17:40:52Z) - Revealing User Familiarity Bias in Task-Oriented Dialogue via Interactive Evaluation [17.41434948048325]
We conduct an interactive user study to unveil how vulnerable task-oriented dialogue (TOD) systems are in realistic scenarios.
Our study reveals that conversations in open-goal settings lead to catastrophic failures of the system.
We discover a novel "pretending" behavior, in which the system pretends to handle user requests even though they are beyond its capabilities.
arXiv Detail & Related papers (2023-05-23T09:24:53Z) - Attribute Inference Attack of Speech Emotion Recognition in Federated Learning Settings [56.93025161787725]
Federated learning (FL) is a distributed machine learning paradigm that coordinates clients to train a model collaboratively without sharing local data.
We propose an attribute inference attack framework that infers sensitive attribute information of the clients from shared gradients or model parameters.
We show that the attribute inference attack is achievable for SER systems trained using FL.
arXiv Detail & Related papers (2021-12-26T16:50:42Z) - Explainable Recommender Systems via Resolving Learning Representations [57.24565012731325]
Explanations can help improve the user experience and reveal system defects.
We propose a novel explainable recommendation model through improving the transparency of the representation learning process.
arXiv Detail & Related papers (2020-08-21T05:30:48Z)