Related papers: Mitigating the Threshold Priming Effect in Large Language Model-Based Relevance Judgments via Personality Infusing

URL: http://arxiv.org/abs/2512.00390v1
Date: Sat, 29 Nov 2025 08:37:51 GMT
Title: Mitigating the Threshold Priming Effect in Large Language Model-Based Relevance Judgments via Personality Infusing
Authors: Nuo Chen, Hanpei Fang, Jiqun Liu, Wilson Wei, Tetsuya Sakai, Xiao-Ming Wu
Abstract summary: We investigate how Big Five personality profiles in LLMs influence priming in relevance labeling. Our results show that certain profiles, such as High Openness and Low Neuroticism, consistently reduce priming susceptibility. The most effective personality in mitigating priming may vary across models and task types.
Score: 25.77984485421331
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent research has explored LLMs as scalable tools for relevance labeling, but studies indicate they are susceptible to priming effects, where prior relevance judgments influence later ones. Although psychological theories link personality traits to such biases, it is unclear whether simulated personalities in LLMs exhibit similar effects. We investigate how Big Five personality profiles in LLMs influence priming in relevance labeling, using multiple LLMs on TREC 2021 and 2022 Deep Learning Track datasets. Our results show that certain profiles, such as High Openness and Low Neuroticism, consistently reduce priming susceptibility. Additionally, the most effective personality in mitigating priming may vary across models and task types. Based on these findings, we propose personality prompting as a method to mitigate threshold priming, connecting psychological evidence with LLM-based evaluation practices.
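The personality prompting idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the prompt wording, the function names (`personality_preamble`, `build_prompt`, `priming_gap`), and the gap metric are all hypothetical, chosen only to show how a Big Five profile might be prepended to a batch relevance-judgment prompt and how susceptibility to earlier judgments might be compared across profiles.

```python
from statistics import mean

BIG_FIVE = ("Openness", "Conscientiousness", "Extraversion",
            "Agreeableness", "Neuroticism")


def personality_preamble(profile):
    """Render a hypothetical persona instruction from a {trait: "High"|"Low"} mapping."""
    unknown = set(profile) - set(BIG_FIVE)
    if unknown:
        raise ValueError(f"not Big Five traits: {sorted(unknown)}")
    parts = [f"{level} {trait}" for trait, level in profile.items()]
    return "You are an assessor whose personality is: " + ", ".join(parts) + "."


def build_prompt(profile, query, prior_judgments, passage):
    """Assemble a batch-style prompt: persona, earlier judgments, then the passage to rate."""
    lines = [personality_preamble(profile), f"Query: {query}"]
    for text, score in prior_judgments:
        lines.append(f"Passage: {text}\nRelevance: {score}")
    lines.append(f"Passage: {passage}\nRelevance:")
    return "\n".join(lines)


def priming_gap(scores_after_high, scores_after_low):
    """Difference in mean scores given to the same passages after high- vs.
    low-relevance prior judgments. Illustrative metric only (an assumption,
    not taken from the paper); a gap near zero suggests little priming."""
    return mean(scores_after_high) - mean(scores_after_low)
```

Under these assumptions, one would build the same prompts with and without a persona, collect the model's scores for each condition, and compare `priming_gap` values across profiles and models.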

Related papers

MindShift: Analyzing Language Models' Reactions to Psychological Prompts [6.696296750931842]
Large language models (LLMs) hold the potential to absorb and reflect personality traits and attitudes specified by users. Our study introduces MindShift, a benchmark for evaluating LLMs' psychological adaptability.
arXiv Detail & Related papers (2025-12-09T21:56:54Z)
Evaluating the Simulation of Human Personality-Driven Susceptibility to Misinformation with LLMs [0.18416014644193066]
Large language models (LLMs) make it possible to generate synthetic behavioural data at scale. We evaluate the capacity of LLM agents, conditioned on Big-Five profiles, to reproduce personality-based variation in susceptibility to misinformation.
arXiv Detail & Related papers (2025-06-30T08:16:07Z)
If an LLM Were a Character, Would It Know Its Own Story? Evaluating Lifelong Learning in LLMs [55.8331366739144]
We introduce LIFESTATE-BENCH, a benchmark designed to assess lifelong learning in large language models (LLMs). Our fact checking evaluation probes models' self-awareness, episodic memory retrieval, and relationship tracking, across both parametric and non-parametric approaches.
arXiv Detail & Related papers (2025-03-30T16:50:57Z)
Preference Leakage: A Contamination Problem in LLM-as-a-judge [69.96778498636071]
Large Language Models (LLMs) as judges and LLM-based data synthesis have emerged as two fundamental LLM-driven data annotation methods. In this work, we expose preference leakage, a contamination problem in LLM-as-a-judge caused by the relatedness between the synthetic data generators and LLM-based evaluators.
arXiv Detail & Related papers (2025-02-03T17:13:03Z)
Neuron-based Personality Trait Induction in Large Language Models [115.08894603023712]
Large language models (LLMs) have become increasingly proficient at simulating various personality traits. We present a neuron-based approach for personality trait induction in LLMs.
arXiv Detail & Related papers (2024-10-16T07:47:45Z)
AI Can Be Cognitively Biased: An Exploratory Study on Threshold Priming in LLM-Based Batch Relevance Assessment [37.985947029716016]
Large language models (LLMs) have shown advanced understanding capabilities but may inherit human biases from their training data. We investigated whether LLMs are influenced by the threshold priming effect in relevance judgments.
arXiv Detail & Related papers (2024-09-24T12:23:15Z)
Evaluating Large Language Models with Psychometrics [59.821829073478376]
This paper offers a comprehensive benchmark for quantifying psychological constructs of Large Language Models (LLMs). Our work identifies five key psychological constructs (personality, values, emotional intelligence, theory of mind, and self-efficacy) assessed through a suite of 13 datasets. We uncover significant discrepancies between LLMs' self-reported traits and their response patterns in real-world scenarios, revealing complexities in their behaviors.
arXiv Detail & Related papers (2024-06-25T16:09:08Z)
Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics [29.325576963215163]
Large Language Models (LLMs) have led to their adaptation in various domains as conversational agents. We introduce TRAIT, a new benchmark consisting of 8K multi-choice questions designed to assess the personality of LLMs. LLMs exhibit distinct and consistent personality, which is highly influenced by their training data.
arXiv Detail & Related papers (2024-06-20T19:50:56Z)
LLMvsSmall Model? Large Language Model Based Text Augmentation Enhanced Personality Detection Model [58.887561071010985]
Personality detection aims to detect the personality traits underlying a person's social media posts. Most existing methods learn post features directly by fine-tuning pre-trained language models. We propose a large language model (LLM) based text augmentation enhanced personality detection model.
arXiv Detail & Related papers (2024-03-12T12:10:18Z)
Eliciting Personality Traits in Large Language Models [0.0]
Large Language Models (LLMs) are increasingly being utilized by both candidates and employers in the recruitment context. This study seeks to obtain a better understanding of such models by examining their output variations based on different input prompts.
arXiv Detail & Related papers (2024-02-13T10:09:00Z)
Do LLMs exhibit human-like response biases? A case study in survey design [66.1850490474361]
We investigate the extent to which large language models (LLMs) reflect human response biases, if at all. We design a dataset and framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires. Our comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior.
arXiv Detail & Related papers (2023-11-07T15:40:43Z)
UPLex: Fine-Grained Personality Control in Large Language Models via Unsupervised Lexical Modulation [52.043831554626685]
Personality is a crucial factor that shapes human communication patterns, thereby regulating the personalities of large language models (LLMs). We propose UPLex, a method that uses an Unsupervisedly-Built personalized lexicon (UPL) during the decoding phase to manipulate an LLM's personality traits. UPLex can be constructed from a newly built situational judgment test dataset in an unsupervised fashion, and used to modulate the personality expression of LLMs.
arXiv Detail & Related papers (2023-10-25T12:16:33Z)
Revisiting the Reliability of Psychological Scales on Large Language Models [62.57981196992073]
This study aims to determine the reliability of applying personality assessments to Large Language Models. Analysis of 2,500 settings per model, including GPT-3.5, GPT-4, Gemini-Pro, and LLaMA-3.1, reveals that various LLMs show consistency in responses to the Big Five Inventory.
arXiv Detail & Related papers (2023-05-31T15:03:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.