Related papers: Reading Between the Tokens: Improving Preference Predictions through Mechanistic Forecasting

URL: http://arxiv.org/abs/2602.02882v1
Date: Mon, 02 Feb 2026 22:39:06 GMT
Title: Reading Between the Tokens: Improving Preference Predictions through Mechanistic Forecasting
Authors: Sarah Ball, Simeon Allmendinger, Niklas Kühl, Frauke Kreuter
Abstract summary: We investigate how demographic and ideological information activates latent party-encoding components within large language models. We find that leveraging this internal knowledge via mechanistic forecasting can improve prediction accuracy.
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Large language models are increasingly used to predict human preferences in both scientific and business endeavors, yet current approaches rely exclusively on analyzing model outputs without considering the underlying mechanisms. Using election forecasting as a test case, we introduce mechanistic forecasting, a method showing that probing internal model representations offers a fundamentally different, and sometimes more effective, approach to preference prediction. Examining over 24 million configurations across 7 models, 6 national elections, multiple persona attributes, and prompt variations, we systematically analyze how demographic and ideological information activates latent party-encoding components within the respective models. We find that leveraging this internal knowledge via mechanistic forecasting (as opposed to relying solely on surface-level predictions) can improve prediction accuracy. The effects vary across demographic versus opinion-based attributes, political parties, national contexts, and models. Our findings demonstrate that the latent representational structure of LLMs contains systematic, exploitable information about human preferences, establishing a new path for using language models in social science prediction tasks.
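The abstract describes mechanistic forecasting only at a high level. As a rough illustration of what probing internal model representations can mean, the sketch below fits a linear (logistic-regression) probe on hidden-state vectors and averages its probabilities into a vote-share estimate. This is an assumption-laden sketch, not the authors' pipeline: the activations are synthetic stand-ins in which one dimension carries a party signal, and the names `train_linear_probe`, `probe_prob`, and `synthetic_personas` are hypothetical. In practice the vectors would presumably be hidden states extracted from an LLM for each persona prompt.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(activations, labels, lr=0.05, epochs=50):
    """Fit a logistic-regression probe p(party | h) = sigmoid(w.h + b) by SGD."""
    dim = len(activations[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for h, y in zip(activations, labels):
            p = sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b)
            grad = p - y  # gradient of the log-loss with respect to the logit
            w = [wi - lr * grad * hi for wi, hi in zip(w, h)]
            b -= lr * grad
    return w, b


def probe_prob(w, b, h):
    # Probe's probability that hidden state h encodes support for party 1.
    return sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b)


def synthetic_personas(n, dim, rng):
    # Hypothetical stand-in for LLM hidden states: dimension 0 carries a
    # "party" signal, the remaining dimensions are pure noise.
    data, labels = [], []
    for _ in range(n):
        y = 1 if rng.random() < 0.4 else 0
        h = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        h[0] += 2.0 if y else -2.0
        data.append(h)
        labels.append(y)
    return data, labels


rng = random.Random(0)
train_x, train_y = synthetic_personas(400, 8, rng)
test_x, test_y = synthetic_personas(200, 8, rng)
w, b = train_linear_probe(train_x, train_y)

probs = [probe_prob(w, b, h) for h in test_x]
accuracy = sum((p >= 0.5) == bool(y) for p, y in zip(probs, test_y)) / len(test_y)
# Aggregate forecast: mean predicted probability of supporting party 1.
share = sum(probs) / len(probs)
print(f"probe accuracy={accuracy:.2f}, forecast share={share:.2f}")
```

Averaging probe probabilities instead of counting hard labels is one design choice among several; it carries some of the probe's uncertainty into the aggregate forecast.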

Related papers

Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors
We show that the most robust features for correctness prediction are those that play a distinctive causal role in the model's behavior. We propose two methods that leverage causal mechanisms to predict the correctness of model outputs.
arXiv (2025-05-17T00:31:39Z)
Identifying and Mitigating Social Bias Knowledge in Language Models
We propose a novel debiasing approach, Fairness Stamp (FAST), which enables fine-grained calibration of individual social biases. FAST surpasses state-of-the-art baselines with superior debiasing performance. This highlights the potential of fine-grained debiasing strategies to achieve fairness in large language models.
arXiv (2024-08-07T17:14:58Z)
LLM Processes: Numerical Predictive Distributions Conditioned on Natural Language
Our goal is to build a regression model that can process numerical data and make probabilistic predictions at arbitrary locations. We start by exploring strategies for eliciting explicit, coherent numerical predictive distributions from Large Language Models. We demonstrate the ability to usefully incorporate text into numerical predictions, improving predictive performance and giving quantitative structure that reflects qualitative descriptions.
arXiv (2024-05-21T15:13:12Z)
Ecosystem-level Analysis of Deployed Machine Learning Reveals Homogeneous Outcomes
We study the societal impact of machine learning by considering the collection of models that are deployed in a given context. We find that deployed machine learning is prone to systemic failure: some users are misclassified by every available model. These examples demonstrate that ecosystem-level analysis has unique strengths for characterizing the societal impact of machine learning.
arXiv (2023-07-12T01:11:52Z)
Explaining Hate Speech Classification with Model Agnostic Methods
The research goal of this paper is to bridge the gap between hate speech prediction and the explanations the system generates to support its decision. This is achieved by first predicting the classification of a text and then applying a post hoc, model-agnostic, surrogate interpretability approach.
arXiv (2023-05-30T19:52:56Z)
What Should I Know? Using Meta-gradient Descent for Predictive Feature Discovery in a Single Stream of Experience
Computational reinforcement learning seeks to construct an agent's perception of the world through predictions of future sensations. An open challenge in this line of work is determining which of the infinitely many predictions the agent could make might best support decision-making. We introduce a meta-gradient descent process by which an agent learns 1) what predictions to make, 2) the estimates for its chosen predictions, and 3) how to use those estimates to generate policies that maximize future reward.
arXiv (2022-06-13T21:31:06Z)
Building Interpretable Models for Business Process Prediction using Shared and Specialised Attention Mechanisms
We address the "black-box" problem in predictive process analytics by building interpretable models. We propose two types of attention: event attention, which captures the impact of specific process events on a prediction, and attribute attention, which reveals which attribute(s) of an event influenced the prediction.
arXiv (2021-09-03T10:17:05Z)
Test-time Collective Prediction
Multiple parties in machine learning want to jointly make predictions on future test points. Agents wish to benefit from the collective expertise of the full set of agents, but may not be willing to release their data or model parameters. We explore a decentralized mechanism to make collective predictions at test time, leveraging each agent's pre-trained model.
arXiv (2021-06-22T18:29:58Z)
Forethought and Hindsight in Credit Assignment
We work to understand the gains and peculiarities of planning employed as forethought via forward models or as hindsight operating with backward models. We investigate the best use of models in planning, primarily focusing on the selection of states in which predictions should be (re)-evaluated.
arXiv (2020-10-26T16:00:47Z)
Introduction to Rare-Event Predictive Modeling for Inferential Statisticians -- A Hands-On Application in the Prediction of Breakthrough Patents
We introduce a machine learning (ML) approach to quantitative analysis geared towards optimizing predictive performance. We discuss the potential synergies between the two fields against the backdrop of their seemingly incompatible targets. We provide a hands-on introduction to predictive modeling for a quantitative social science audience while aiming to demystify computer science jargon.
arXiv (2020-03-30T13:06:25Z)
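The systemic-failure notion in the "Ecosystem-level Analysis of Deployed Machine Learning" entry can be made concrete with a small computation: count the users misclassified by every deployed model and compare that rate with what independent model errors would predict. This is a minimal sketch on made-up toy labels; the helper names `systemic_failure_rate` and `independent_baseline`, and the use of an independence baseline as the point of comparison, are illustrative assumptions rather than that paper's code.

```python
from math import prod


def systemic_failure_rate(predictions, labels):
    """Fraction of users misclassified by every model.

    predictions[m][u] is model m's prediction for user u; labels[u] is the truth.
    """
    n_users = len(labels)
    failed_by_all = sum(
        all(model[u] != labels[u] for model in predictions) for u in range(n_users)
    )
    return failed_by_all / n_users


def independent_baseline(predictions, labels):
    """Systemic failure rate expected if models erred independently of each other."""
    n_users = len(labels)
    error_rates = [
        sum(model[u] != labels[u] for u in range(n_users)) / n_users
        for model in predictions
    ]
    return prod(error_rates)


# Hypothetical toy data: 3 models, 8 users, binary decisions.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = [
    [0, 0, 1, 1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1, 0, 1, 0],
    [0, 1, 1, 1, 1, 0, 1, 0],
]
observed = systemic_failure_rate(predictions, labels)
baseline = independent_baseline(predictions, labels)
print(observed, baseline)  # prints: 0.25 0.03515625
```

Here users 0 and 4 are failed by all three models, so the observed rate (0.25) far exceeds the independence baseline (0.25 × 0.375 × 0.375), which is the kind of gap the entry describes as homogeneous outcomes.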
