Personality as a Probe for LLM Evaluation: Method Trade-offs and Downstream Effects
- URL: http://arxiv.org/abs/2509.04794v1
- Date: Fri, 05 Sep 2025 04:19:15 GMT
- Title: Personality as a Probe for LLM Evaluation: Method Trade-offs and Downstream Effects
- Authors: Gunmay Handa, Zekun Wu, Adriano Koshiyama, Philip Treleaven
- Abstract summary: We present a systematic study of personality control using the Big Five traits. Trait-level analysis shows openness as uniquely challenging and agreeableness as most resistant to ICL. Experiments on Gemma-2-2B-IT and LLaMA-3-8B-Instruct reveal clear trade-offs.
- Score: 0.6087817758152709
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Personality manipulation in large language models (LLMs) is increasingly applied in customer service and agentic scenarios, yet its mechanisms and trade-offs remain unclear. We present a systematic study of personality control using the Big Five traits, comparing in-context learning (ICL), parameter-efficient fine-tuning (PEFT), and mechanistic steering (MS). Our contributions are fourfold. First, we construct a contrastive dataset with balanced high/low trait responses, enabling effective steering vector computation and fair cross-method evaluation. Second, we introduce a unified evaluation framework based on within-run $\Delta$ analysis that disentangles reasoning capability, agent performance, and demographic bias across MMLU, GAIA, and BBQ benchmarks. Third, we develop trait purification techniques to separate openness from conscientiousness, addressing representational overlap in trait encoding. Fourth, we propose a three-level stability framework that quantifies method-, trait-, and combination-level robustness, offering practical guidance under deployment constraints. Experiments on Gemma-2-2B-IT and LLaMA-3-8B-Instruct reveal clear trade-offs: ICL achieves strong alignment with minimal capability loss, PEFT delivers the highest alignment at the cost of degraded task performance, and MS provides lightweight runtime control with competitive effectiveness. Trait-level analysis shows openness as uniquely challenging, agreeableness as most resistant to ICL, and personality encoding consolidating around intermediate layers. Taken together, these results establish personality manipulation as a multi-level probe into behavioral representation, linking surface conditioning, parameter encoding, and activation-level steering, and positioning mechanistic steering as a lightweight alternative to fine-tuning for both deployment and interpretability.
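The mechanistic steering (MS) method above depends on steering vectors computed from the contrastive high/low trait dataset. As a rough illustration of the standard difference-of-means recipe such a dataset enables (not necessarily the paper's exact procedure), consider the sketch below; the dimensions, pooling choice, injection point, and scaling coefficient are all illustrative assumptions.

```python
# Minimal sketch: difference-of-means steering vector from contrastive
# high/low trait activations, then applied at inference time.
# All shapes and the injection point are illustrative assumptions.
import torch

hidden_dim = 256          # assumed residual-stream width
n_pairs = 100             # contrastive high/low response pairs

# Stand-ins for layer-l residual activations collected on the contrastive
# dataset (one vector per response, e.g. mean-pooled over tokens).
acts_high = torch.randn(n_pairs, hidden_dim) + 0.5   # "high trait" responses
acts_low = torch.randn(n_pairs, hidden_dim) - 0.5    # "low trait" responses

# Steering vector: mean activation difference between the two poles.
steer = acts_high.mean(dim=0) - acts_low.mean(dim=0)
steer = steer / steer.norm()                          # unit-normalize

def apply_steering(resid: torch.Tensor, alpha: float = 4.0) -> torch.Tensor:
    """Add the scaled steering vector to a residual-stream activation.

    alpha > 0 pushes generations toward the high-trait pole,
    alpha < 0 toward the low-trait pole.
    """
    return resid + alpha * steer

# Example: steer one token's residual activation.
token_act = torch.randn(hidden_dim)
steered = apply_steering(token_act, alpha=4.0)
```

In a real setup the activations would come from a forward hook at an intermediate layer of Gemma-2-2B-IT or LLaMA-3-8B-Instruct, consistent with the abstract's finding that personality encoding consolidates around intermediate layers.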
Related papers
- TRACE: Trajectory-Aware Comprehensive Evaluation for Deep Research Agents [51.30998248590416]
Trajectory-Aware Comprehensive Evaluation (TRACE) is a framework that holistically assesses the entire problem-solving trajectory. Our contributions include the TRACE framework, its novel metrics, and the accompanying DeepResearch-Bench with controllable complexity.
arXiv Detail & Related papers (2026-02-05T13:28:57Z)
- Separation-Utility Pareto Frontier: An Information-Theoretic Characterization [1.4213973379473657]
We study the optimal trade-off between utility and separation, a fairness criterion requiring predictive independence from sensitive attributes conditional on the true outcome. We develop an empirical regularizer based on conditional mutual information (CMI) between predictions and sensitive attributes given the true outcome. This study thus offers a provable, stable, and flexible approach to enforcing separation in deep learning.
arXiv Detail & Related papers (2026-02-04T10:38:44Z)
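The regularizer in the entry above penalizes the conditional mutual information I(Yhat; A | Y). The paper's version is a differentiable estimator; the plug-in computation below only illustrates the quantity being penalized for discrete variables, with all names assumed.

```python
# Plug-in estimate of I(Yhat; A | Y) in nats for discrete predictions Yhat,
# sensitive attribute A, and true outcome Y (1-D integer arrays).
import numpy as np
from collections import Counter

def conditional_mutual_info(yhat, a, y):
    n = len(y)
    p_xaz = Counter(zip(yhat, a, y))   # joint counts of (Yhat, A, Y)
    p_xz = Counter(zip(yhat, y))
    p_az = Counter(zip(a, y))
    p_z = Counter(y)
    cmi = 0.0
    for (xh, av, yv), c in p_xaz.items():
        p_joint = c / n
        cmi += p_joint * np.log(
            (p_joint * (p_z[yv] / n))
            / ((p_xz[(xh, yv)] / n) * (p_az[(av, yv)] / n))
        )
    return max(cmi, 0.0)   # clip tiny negative values from finite samples

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 5000)
a = rng.integers(0, 2, 5000)
yhat = y.copy()                              # predictor that satisfies separation
print(conditional_mutual_info(yhat, a, y))   # ~0: no leakage of A given Y
```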
- A Comprehensive Evaluation of LLM Reasoning: From Single-Model to Multi-Agent Paradigms [20.241519889633285]
Large Language Models (LLMs) are increasingly deployed as reasoning systems, where reasoning paradigms play a critical role. We conduct a comprehensive and unified evaluation of reasoning paradigms, spanning direct single-model generation, CoT-augmented single-model reasoning, and representative multi-agent systems (MAS). We introduce MIMeBench, a new open-ended benchmark that targets two foundational yet underexplored semantic capabilities.
arXiv Detail & Related papers (2026-01-19T17:23:45Z)
- MAXS: Meta-Adaptive Exploration with LLM Agents [48.04723638253802]
MAXS is a meta-adaptive reasoning framework based on Large Language Model (LLM) Agents. MAXS employs a lookahead strategy to extend reasoning paths a few steps ahead. It combines step consistency variance and inter-step trend slopes to jointly select stable, consistent, and high-value reasoning steps.
arXiv Detail & Related papers (2026-01-14T07:48:00Z)
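The MAXS summary names two selection signals, step consistency variance and inter-step trend slopes, without specifying how they are combined. The following is a speculative sketch of one plausible combination, not the paper's actual scoring rule; the weights and the linear form are assumptions.

```python
# Speculative step-selection sketch: reward high lookahead value, penalize
# inconsistency across sampled continuations, reward a rising value trend.
import numpy as np

def score_step(lookahead_values: np.ndarray, w_var=1.0, w_slope=1.0) -> float:
    """lookahead_values: shape (n_samples, horizon) value estimates from
    several sampled continuations of one candidate reasoning step."""
    mean_per_depth = lookahead_values.mean(axis=0)
    consistency_var = lookahead_values.var(axis=0).mean()   # lower is better
    depths = np.arange(len(mean_per_depth))
    slope = np.polyfit(depths, mean_per_depth, deg=1)[0]    # rising trend is better
    return mean_per_depth.mean() - w_var * consistency_var + w_slope * slope

# Pick the best of five candidate steps, each with 4 sampled 3-step lookaheads.
candidates = [np.random.default_rng(i).normal(0.6, 0.1, (4, 3)) for i in range(5)]
best = max(range(5), key=lambda i: score_step(candidates[i]))
print("selected step:", best)
```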
- Rubric-Conditioned LLM Grading: Alignment, Uncertainty, and Robustness [4.129847064263056]
We systematically evaluate the performance of Large Language Models for rubric-based short-answer grading. We find that alignment is strong for binary tasks but degrades with increased rubric granularity. Experiments reveal that while the model is resilient to prompt injection, it is sensitive to synonym substitutions.
arXiv Detail & Related papers (2025-12-21T05:22:04Z)
- CoT-Saliency: Unified Chain-of-Thought Reasoning for Heterogeneous Saliency Tasks [96.64597365827046]
We present the first unified framework that jointly handles three operationally heterogeneous saliency tasks. We introduce a Chain-of-Thought (CoT) reasoning process in a Vision-Language Model (VLM) to bridge task heterogeneity. We show our model matches or outperforms specialized SOTA methods and strong closed-source VLMs across all tasks.
arXiv Detail & Related papers (2025-11-01T04:37:01Z)
- Continual Action Quality Assessment via Adaptive Manifold-Aligned Graph Regularization [53.82400605816587]
Action Quality Assessment (AQA) quantifies human actions in videos, supporting applications in sports scoring, rehabilitation, and skill evaluation. A major challenge lies in the non-stationary nature of quality distributions in real-world scenarios. We introduce Continual AQA (CAQA), which equips AQA with Continual Learning capabilities to handle evolving distributions.
arXiv Detail & Related papers (2025-10-08T10:09:47Z)
- Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning [71.30276778807068]
We propose Q-Tuning, a unified framework that strategically coordinates sample pruning and token pruning. Q-Tuning achieves a +38% average improvement over the full-data SFT baseline using only 12.5% of the original training data.
arXiv Detail & Related papers (2025-09-28T13:27:38Z)
- On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting [71.64063986651819]
Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) are two prominent post-training paradigms for refining the capabilities and aligning the behavior of Large Language Models (LLMs). Existing approaches that integrate SFT and RL often face the risk of disrupting established model patterns and inducing overfitting to expert data. We propose CHORD, a framework for the Controllable Harmonization of On- and Off-Policy Reinforcement Learning via Dynamic Weighting.
arXiv Detail & Related papers (2025-08-15T11:20:03Z)
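As a loose illustration of dynamic weighting between an off-policy expert (SFT) loss and an on-policy RL loss, the sketch below anneals a single global weight. CHORD's actual mechanism is richer than this (the summary does not specify it), so the schedule and loss forms here are assumptions.

```python
# Toy sketch: blend an off-policy SFT negative log-likelihood with an
# on-policy policy-gradient loss under a linearly decaying weight mu.
import torch

def blended_loss(sft_nll: torch.Tensor, rl_pg_loss: torch.Tensor,
                 step: int, total_steps: int) -> torch.Tensor:
    """Anneal from expert imitation (mu=1) toward pure on-policy RL (mu=0)."""
    mu = max(0.0, 1.0 - step / total_steps)
    return mu * sft_nll + (1.0 - mu) * rl_pg_loss

# Scalars standing in for per-batch losses at three points in training.
for step in (0, 500, 1000):
    loss = blended_loss(torch.tensor(2.3), torch.tensor(0.7),
                        step, total_steps=1000)
    print(step, float(loss))
```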
- When Punctuation Matters: A Large-Scale Comparison of Prompt Robustness Methods for LLMs [55.20230501807337]
We present the first systematic evaluation of 5 methods for improving prompt robustness within a unified experimental framework. We benchmark these techniques on 8 models from the Llama, Qwen and Gemma families across 52 tasks from the Natural Instructions dataset.
arXiv Detail & Related papers (2025-08-15T10:32:50Z)
- PASS: Probabilistic Agentic Supernet Sampling for Interpretable and Adaptive Chest X-Ray Reasoning [31.42306351491176]
PASS (Probabilistic Agentic Supernet Sampling) is the first multimodal framework to address these challenges in the context of Chest X-Ray (CXR) reasoning. PASS adaptively samples agentic workflows over a multi-tool graph, yielding decision paths annotated with interpretable probabilities.
arXiv Detail & Related papers (2025-08-14T10:03:47Z)
- Towards LLM Guardrails via Sparse Representation Steering [11.710399901426873]
Large Language Models (LLMs) have demonstrated remarkable performance in natural language generation tasks. We propose a sparse encoding-based representation engineering method, named SRE, which decomposes polysemantic activations into a structured, monosemantic feature space. By leveraging sparse autoencoding, our approach isolates and adjusts only task-specific sparse feature dimensions, enabling precise and interpretable steering of model behavior.
arXiv Detail & Related papers (2025-03-21T04:50:25Z)
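The SRE summary describes adjusting task-specific dimensions in a sparse, monosemantic feature space. A minimal sketch of that general pattern, using an untrained stand-in SAE, arbitrary dimensions, and a hypothetical feature index (the real method's training and feature-selection steps are omitted), might look like this.

```python
# Sketch: decompose an activation with a sparse autoencoder, boost one
# interpretable feature, and map the edit back to activation space.
import torch
import torch.nn as nn

d_model, d_feats = 256, 2048   # assumed activation / dictionary sizes

class SparseAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(d_model, d_feats)
        self.dec = nn.Linear(d_feats, d_model)

    def forward(self, x):
        f = torch.relu(self.enc(x))   # non-negative sparse feature activations
        return f, self.dec(f)

sae = SparseAutoencoder()              # untrained stand-in for a fitted SAE
act = torch.randn(d_model)             # residual-stream activation

feats, recon = sae(act)
target_feature = 123                   # hypothetical trait-linked feature index
feats_steered = feats.clone()
feats_steered[target_feature] += 5.0   # adjust only that feature dimension

# Apply the feature-space edit in activation space, leaving the SAE
# reconstruction error term (act - recon) untouched.
act_steered = act + (sae.dec(feats_steered) - sae.dec(feats))
```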
- The Dual-use Dilemma in LLMs: Do Empowering Ethical Capacities Make a Degraded Utility? [54.18519360412294]
Large Language Models (LLMs) must balance rejecting harmful requests for safety against accommodating legitimate ones for utility. This paper presents a Direct Preference Optimization (DPO) based alignment framework that achieves better overall performance. We analyze experimental results obtained from testing DeepSeek-R1 on our benchmark and reveal the critical ethical concerns raised by this highly acclaimed model.
arXiv Detail & Related papers (2025-01-20T06:35:01Z)
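For reference, the standard DPO objective this framework builds on prefers the chosen response over the rejected one relative to a frozen reference policy; the batch values below are toy numbers, and the paper's full framework adds more on top of this loss.

```python
# Standard DPO loss over sequence log-probabilities from the trained
# policy and a frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """All inputs: per-example sequence log-probs; beta scales the margin."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -F.logsigmoid(beta * margin).mean()

# Toy single-example batch.
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.0]))
print(float(loss))
```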
- Are the Values of LLMs Structurally Aligned with Humans? A Causal Perspective [33.19778298286475]
We argue that a latent causal value graph underlies the value dimensions of large language models (LLMs) and that, despite alignment training, this structure remains significantly different from human value systems. We leverage these causal value graphs to guide two lightweight value-steering methods: role-based prompting and sparse autoencoder (SAE) steering. Experiments on Gemma-2B-IT and Llama3-8B-IT demonstrate the effectiveness and controllability of our methods.
arXiv Detail & Related papers (2024-12-31T18:12:05Z)
- SURE-VQA: Systematic Understanding of Robustness Evaluation in Medical VQA Tasks [2.033441577169909]
Vision-Language Models (VLMs) have great potential in medical tasks, like Visual Question Answering (VQA). Their robustness to distribution shifts on unseen data remains a key concern for safe deployment. We introduce a novel framework, called SURE-VQA, centered around three key requirements to overcome current pitfalls.
arXiv Detail & Related papers (2024-11-29T13:22:52Z)
- Delving into Identify-Emphasize Paradigm for Combating Unknown Bias [52.76758938921129]
We propose an effective bias-conflicting scoring method (ECS) to boost the identification accuracy.
We also propose gradient alignment (GA) to balance the contributions of the mined bias-aligned and bias-conflicting samples.
Experiments are conducted on multiple datasets in various settings, demonstrating that the proposed solution can mitigate the impact of unknown biases.
arXiv Detail & Related papers (2023-02-22T14:50:24Z)
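The gradient alignment (GA) idea in the entry above balances the update contributions of mined bias-aligned and bias-conflicting samples. One simple way to realize such balancing, offered only as an assumed sketch rather than the paper's exact GA rule, is to rescale the scarcer group's summed loss.

```python
# Assumed sketch: upweight the (typically scarcer, harder) bias-conflicting
# group so its summed loss matches the bias-aligned group's contribution.
import torch

def balanced_loss(loss_aligned: torch.Tensor,
                  loss_conflicting: torch.Tensor) -> torch.Tensor:
    """Inputs: per-sample losses for the two mined sample groups."""
    scale = (loss_aligned.sum().detach()
             / loss_conflicting.sum().detach().clamp(min=1e-8))
    return loss_aligned.sum() + scale * loss_conflicting.sum()

# Toy batches: 64 bias-aligned vs 8 bias-conflicting per-sample losses.
la = torch.rand(64, requires_grad=True)
lc = torch.rand(8, requires_grad=True)
balanced_loss(la, lc).backward()
```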
- An Information Bottleneck Approach for Controlling Conciseness in Rationale Extraction [84.49035467829819]
We show that it is possible to better manage the trade-off between rationale conciseness and end-task performance by optimizing a bound on the Information Bottleneck (IB) objective.
Our fully unsupervised approach jointly learns an explainer that predicts sparse binary masks over sentences, and an end-task predictor that considers only the extracted rationale.
arXiv Detail & Related papers (2020-05-01T23:26:41Z)
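A rough sketch of the training signal this last entry describes: an end-task loss on predictions made from the extracted rationale, plus a sparsity penalty on the sentence masks standing in for the bound on I(X;Z). The relaxation used to train binary masks in the actual method is omitted, and all shapes below are toy assumptions.

```python
# Toy IB-style rationale-extraction loss: task cross-entropy plus an
# expected-sparsity penalty on per-sentence keep probabilities.
import torch
import torch.nn.functional as F

def ib_rationale_loss(task_logits, labels, mask_probs, beta=0.1):
    """mask_probs: per-sentence keep probabilities from the explainer."""
    task_loss = F.cross_entropy(task_logits, labels)
    sparsity = mask_probs.mean()   # expected fraction of kept sentences
    return task_loss + beta * sparsity

logits = torch.randn(4, 3)               # toy batch: 4 examples, 3 classes
labels = torch.tensor([0, 2, 1, 0])
mask_probs = torch.rand(4, 10)           # 10 sentences per example
print(float(ib_rationale_loss(logits, labels, mask_probs)))
```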