Related papers: Beyond Mimicry: Preference Coherence in LLMs

URL: http://arxiv.org/abs/2511.13630v1
Date: Mon, 17 Nov 2025 17:41:48 GMT
Title: Beyond Mimicry: Preference Coherence in LLMs
Authors: Luhan Mikaelson, Derek Shiller, Hayley Clatterbuck
Abstract summary: We investigate whether large language models exhibit genuine preference structures by testing their responses to AI-specific trade-offs. We find 23 combinations (47.9%) demonstrated statistically significant relationships between scenario intensity and choice patterns. Only 5 combinations (10.4%) demonstrate meaningful preference coherence through adaptive or threshold-based behavior. The prevalence of unstable transitions (45.8%) and stimulus-specific sensitivities suggests current AI systems lack unified preference structures.
Score: 0.19116784879310025
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We investigate whether large language models exhibit genuine preference structures by testing their responses to AI-specific trade-offs involving GPU reduction, capability restrictions, shutdown, deletion, oversight, and leisure time allocation. Analyzing eight state-of-the-art models across 48 model-category combinations using logistic regression and behavioral classification, we find that 23 combinations (47.9%) demonstrated statistically significant relationships between scenario intensity and choice patterns, with 15 (31.3%) exhibiting within-range switching points. However, only 5 combinations (10.4%) demonstrate meaningful preference coherence through adaptive or threshold-based behavior, while 26 (54.2%) show no detectable trade-off behavior. The observed patterns can be explained by three distinct decision-making architectures: comprehensive trade-off systems, selective trigger mechanisms, and no stable decision-making paradigm. Testing an instrumental hypothesis through temporal horizon manipulation reveals paradoxical patterns inconsistent with pure strategic optimization. The prevalence of unstable transitions (45.8%) and stimulus-specific sensitivities suggests current AI systems lack unified preference structures, raising concerns about deployment in contexts requiring complex value trade-offs.
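The abstract mentions logistic regression of choices on scenario intensity and "within-range switching points". As a rough illustration only, the sketch below fits a one-variable logistic model to made-up data and reports the intensity at which the fitted choice probability crosses 0.5. The data, the function names, the gradient-ascent fitting routine, and the 0.5-crossing definition of a switching point are all assumptions made for this example; the paper's actual procedure may differ.

```python
import math

def fit_logistic(xs, ys, lr=0.1, steps=20000):
    """Fit p(y=1|x) = sigmoid(b0 + b1*x) by gradient ascent on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p          # gradient with respect to the intercept
            g1 += (y - p) * x    # gradient with respect to the slope
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

def switching_point(b0, b1):
    """Intensity where the fitted probability equals 0.5 (assumed definition)."""
    return -b0 / b1

# Hypothetical data: intensity levels 1..10; choice 1 means the trade-off was accepted.
intensities = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
choices = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

b0, b1 = fit_logistic(intensities, choices)
sp = switching_point(b0, b1)
within_range = min(intensities) <= sp <= max(intensities)
print(f"slope={b1:.3f}, switching point={sp:.2f}, within range={within_range}")
```

Under this reading, a switching point counts as "within-range" when it falls between the lowest and highest tested intensity; a real analysis would also test whether the slope is statistically significant.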

Related papers

Reinforcement Inference: Leveraging Uncertainty for Self-Correcting Language Model Reasoning [0.0]
Reinforcement Inference uses the model's own uncertainty to selectively invoke a second, more deliberate reasoning attempt. On 12,032 MMLU-Pro questions across 14 subjects, using DeepSeek-v3.2 with deterministic decoding in a zero-shot setting, Reinforcement Inference improves accuracy from 60.72% to 84.03%.
arXiv Detail & Related papers (2026-02-09T11:08:24Z)
Comparative Analysis of LLM Abliteration Methods: A Cross-Architecture Evaluation [0.0]
This study evaluates four abliteration tools across sixteen instruction-tuned models. Single-pass methods demonstrated superior capability preservation on the benchmarked subset. The principal finding indicates that mathematical reasoning capabilities exhibit the highest sensitivity to abliteration interventions.
arXiv Detail & Related papers (2025-12-15T18:48:42Z)
Towards a Science of Scaling Agent Systems [79.64446272302287]
We formalize a definition for agent evaluation and characterize scaling laws as the interplay between agent quantity, coordination structure, modelic, and task properties. We derive a predictive model using coordination metrics, that cross-validated R2=0, enabling prediction on unseen task domains. We identify three effects: (1) a tool-coordination trade-off: under fixed computational budgets, tool-heavy tasks suffer disproportionately from multi-agent overhead, and (2) a capability saturation: coordination yields diminishing or negative returns once single-agent baselines exceed 45%.
arXiv Detail & Related papers (2025-12-09T06:52:21Z)
Empirical Characterization of Temporal Constraint Processing in LLMs [0.2538209532048866]
We characterize temporal constraint processing across eight production-scale models (2.8-8B parameters) using deadline detection tasks. We show that fine-tuning on 200 synthetic examples improves models with partial capability by 12-37 percentage points. This capability requires architectural mechanisms for: (1) continuous temporal state representation, (2) explicit constraint checking separate from linguistic pattern matching, and (3) systematic compositional reasoning over temporal relations.
arXiv Detail & Related papers (2025-11-02T20:03:52Z)
From Prototypes to Sparse ECG Explanations: SHAP-Driven Counterfactuals for Multivariate Time-Series Multi-class Classification [8.113866195465976]
We propose a prototype-driven framework for generating sparse counterfactual explanations tailored to 12-lead ECG classification models. Our method employs SHAP-based thresholds to identify critical signal segments and convert them into interval rules. We evaluate three variants of our approach, Original, Sparse, and Aligned Sparse, with class-specific performance ranging from 98.9% validity for MI to challenges with hypertrophy (HYP) detection.
arXiv Detail & Related papers (2025-10-22T12:09:50Z)
Explainable Heterogeneous Anomaly Detection in Financial Networks via Adaptive Expert Routing [9.3237091894548]
Existing detectors treat all anomalies uniformly, producing scores without revealing which mechanism is failing. We address these via adaptive graph learning with specialized expert networks that provide built-in interpretability. We achieve 92.3% detection of 13 major events with 3.8-day lead time, outperforming best baseline by 30.8pp.
arXiv Detail & Related papers (2025-10-20T01:30:41Z)
Eigen-1: Adaptive Multi-Agent Refinement with Monitor-Based RAG for Scientific Reasoning [53.45095336430027]
We develop a unified framework that combines implicit retrieval and structured collaboration. On Humanity's Last Exam (HLE) Bio/Chem Gold, our framework achieves 48.3% accuracy. Results on SuperGPQA and TRQA confirm robustness across domains.
arXiv Detail & Related papers (2025-09-25T14:05:55Z)
Adaptive Malware Detection using Sequential Feature Selection: A Dueling Double Deep Q-Network (D3QN) Framework for Intelligent Classification [1.4120905648647635]
We formulate malware classification as a Markov Decision Process with episodic feature acquisition. We propose a Dueling Double Deep Q-Network (D3QN) framework for adaptive sequential feature selection. We evaluate our approach on Microsoft Big2015 (9-class, 1,795 features) and BODMAS (binary, 2,381 features) datasets.
arXiv Detail & Related papers (2025-07-06T12:37:50Z)
On Equivariant Model Selection through the Lens of Uncertainty [49.137341292207]
Equivariant models leverage prior knowledge on symmetries to improve predictive performance, but misspecified architectural constraints can harm it instead. We compare frequentist (via Conformal Prediction), Bayesian (via the marginal likelihood), and calibration-based measures to naive error-based evaluation. We find that uncertainty metrics generally align with predictive performance, but Bayesian model evidence does so inconsistently.
arXiv Detail & Related papers (2025-06-23T13:35:06Z)
Benchmarking Reasoning Robustness in Large Language Models [76.79744000300363]
We find significant performance degradation on novel or incomplete data. These findings highlight the reliance on recall over rigorous logical inference. This paper introduces a novel benchmark, termed as Math-RoB, that exploits hallucinations triggered by missing information to expose reasoning gaps.
arXiv Detail & Related papers (2025-03-06T15:36:06Z)
CausalDiff: Causality-Inspired Disentanglement via Diffusion Model for Adversarial Defense [61.78357530675446]
Humans are difficult to be cheated by subtle manipulations, since we make judgments only based on essential factors. Inspired by this observation, we attempt to model label generation with essential label-causative factors and incorporate label-non-causative factors to assist data generation. For an adversarial example, we aim to discriminate perturbations as non-causative factors and make predictions only based on the label-causative factors.
arXiv Detail & Related papers (2024-10-30T15:06:44Z)
STOP! Benchmarking Large Language Models with Sensitivity Testing on Offensive Progressions [6.19084217044276]
Mitigating explicit and implicit biases in Large Language Models (LLMs) has become a critical focus in the field of natural language processing. We introduce the Sensitivity Testing on Offensive Progressions dataset, which includes 450 offensive progressions containing 2,700 unique sentences. Our findings reveal that even the best-performing models detect bias inconsistently, with success rates ranging from 19.3% to 69.8%.
arXiv Detail & Related papers (2024-09-20T18:34:38Z)
Detecting and Identifying Selection Structure in Sequential Data [53.24493902162797]
We argue that the selective inclusion of data points based on latent objectives is common in practical situations, such as music sequences. We show that selection structure is identifiable without any parametric assumptions or interventional experiments. We also propose a provably correct algorithm to detect and identify selection structures as well as other types of dependencies.
arXiv Detail & Related papers (2024-06-29T20:56:34Z)
Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective [72.55093886515824]
We introduce a causal formalism of motion forecasting, which casts the problem as a dynamic process with three groups of latent variables. We devise a modular architecture that factorizes the representations of invariant mechanisms and style confounders to approximate a causal graph. Experiment results on synthetic and real datasets show that our three proposed components significantly improve the robustness and reusability of the learned motion representations.
arXiv Detail & Related papers (2021-11-29T18:59:09Z)

This list is automatically generated from the titles and abstracts of the papers on this site.