Related papers: Validating Political Position Predictions of Arguments

URL: http://arxiv.org/abs/2602.18351v1
Date: Fri, 20 Feb 2026 17:03:44 GMT
Title: Validating Political Position Predictions of Arguments
Authors: Jordan Robinson, Angus R. Williams, Katie Atkinson, Anthony G. Cohn
Abstract summary: We construct a large-scale knowledge base of political position predictions using 22 language models. Pointwise evaluation shows moderate human-model agreement, reflecting intrinsic subjectivity. Pairwise validation reveals substantially stronger alignment between human- and model-derived rankings.
Score: 3.3571381688392488
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Real-world knowledge representation often requires capturing subjective, continuous attributes -- such as political positions -- that conflict with pairwise validation, the widely accepted gold standard for human evaluation. We address this challenge through a dual-scale validation framework applied to political stance prediction in argumentative discourse, combining pointwise and pairwise human annotation. Using 22 language models, we construct a large-scale knowledge base of political position predictions for 23,228 arguments drawn from 30 debates that appeared on the UK political television programme Question Time. Pointwise evaluation shows moderate human-model agreement (Krippendorff's $α=0.578$), reflecting intrinsic subjectivity, while pairwise validation reveals substantially stronger alignment between human- and model-derived rankings ($α=0.86$ for the best model). This work contributes: (i) a practical validation methodology for subjective continuous knowledge that balances scalability with reliability; (ii) a validated structured argumentation knowledge base enabling graph-based reasoning and retrieval-augmented generation in political domains; and (iii) evidence that ordinal structure can be extracted from pointwise language model predictions on inherently subjective real-world discourse, advancing knowledge representation capabilities for domains where traditional symbolic or categorical approaches are insufficient.
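The contrast between pointwise and pairwise agreement in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline and it does not compute Krippendorff's α; it uses a plain concordance rate over item pairs as a simpler stand-in. The function names, the argument identifiers, and the scores on a left (-1) to right (+1) scale are all hypothetical, and both score sets are assumed to cover the same items.

```python
from itertools import combinations


def pairwise_orderings(scores):
    """Map each item pair (a, b) to +1 if a scores higher than b, -1 if lower, 0 if tied."""
    prefs = {}
    for a, b in combinations(sorted(scores), 2):
        diff = scores[a] - scores[b]
        prefs[(a, b)] = (diff > 0) - (diff < 0)
    return prefs


def pairwise_agreement(model_scores, human_scores):
    """Fraction of pairs, untied in both score sets, that are ordered the same way."""
    model = pairwise_orderings(model_scores)
    human = pairwise_orderings(human_scores)
    compared = [p for p in model if model[p] != 0 and human[p] != 0]
    if not compared:
        return float("nan")
    same = sum(model[p] == human[p] for p in compared)
    return same / len(compared)


# Hypothetical pointwise scores (assumption: -1 = left, +1 = right); not data from the paper.
model_scores = {"arg1": -0.6, "arg2": 0.1, "arg3": 0.7, "arg4": -0.2}
human_scores = {"arg1": -0.3, "arg2": 0.4, "arg3": 0.5, "arg4": 0.0}

agreement = pairwise_agreement(model_scores, human_scores)
print(agreement)
```

In this toy data the model and human scores differ on every argument, yet every pair is ordered the same way, which is the kind of ordinal structure the abstract describes recovering from pointwise predictions.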

Related papers

Dissecting Subjectivity and the "Ground Truth" Illusion in Data Annotation [23.545262620377887]
In machine learning, "ground truth" refers to the assumed correct labels used to train and evaluate models. This systematic literature review analyzes research published between 2020 and 2025 across seven premier venues.
arXiv Detail & Related papers (2026-02-11T19:45:17Z)
Beyond Consensus: Perspectivist Modeling and Evaluation of Annotator Disagreement in NLP [25.097081181685613]
Annotator disagreement is widespread in NLP, particularly for subjective and ambiguous tasks such as toxicity detection and stance analysis. We first present a domain-agnostic taxonomy of the sources of disagreement spanning data, task, and annotator factors. We then synthesize modeling approaches using a common framework defined by prediction targets and pooling structure.
arXiv Detail & Related papers (2026-01-14T01:26:29Z)
On the Fallacy of Global Token Perplexity in Spoken Language Model Evaluation [88.77441715819366]
Generative spoken language models pretrained on large-scale raw audio can continue a speech prompt with appropriate content. We propose a variety of likelihood- and generative-based evaluation methods that serve in place of naive global token perplexity.
arXiv Detail & Related papers (2026-01-09T22:01:56Z)
Trustworthy Alignment of Retrieval-Augmented Large Language Models via Reinforcement Learning [84.94709351266557]
We focus on the trustworthiness of language models with respect to retrieval augmentation. We deem that retrieval-augmented language models have the inherent capability of supplying responses according to both contextual and parametric knowledge. Inspired by aligning language models with human preference, we take the first step towards aligning retrieval-augmented language models to a state in which they respond relying merely on the external evidence.
arXiv Detail & Related papers (2024-10-22T09:25:21Z)
Political Bias in LLMs: Unaligned Moral Values in Agent-centric Simulations [0.0]
We investigate how personalized language models align with human responses on the Moral Foundation Theory Questionnaire. We adapt open-source generative language models to different political personas and repeatedly survey these models to generate synthetic data sets. Our analysis reveals that models produce inconsistent results across multiple repetitions, yielding high response variance.
arXiv Detail & Related papers (2024-08-21T08:20:41Z)
Regularized Conventions: Equilibrium Computation as a Model of Pragmatic Reasoning [72.21876989058858]
We present a model of pragmatic language understanding, where utterances are produced and understood by searching for regularized equilibria of signaling games. In this model speakers and listeners search for contextually appropriate utterance--meaning mappings that are both close to game-theoretically optimal conventions and close to a shared, "default" semantics.
arXiv Detail & Related papers (2023-11-16T09:42:36Z)
A Unifying Framework for Learning Argumentation Semantics [47.84663434179473]
We present a novel framework, which uses an Inductive Logic Programming approach to learn the acceptability semantics for several abstract and structured argumentation frameworks in an interpretable way. Our framework outperforms existing argumentation solvers, thus opening up new future research directions in the area of formal argumentation and human-machine dialogues.
arXiv Detail & Related papers (2023-10-18T20:18:05Z)
Modeling Appropriate Language in Argumentation [34.90028129715041]
We operationalize appropriate language in argumentation for the first time. We derive a new taxonomy of 14 dimensions that determine inappropriate language in online discussions.
arXiv Detail & Related papers (2023-05-24T09:17:05Z)
Are Neural Topic Models Broken? [81.15470302729638]
We study the relationship between automated and human evaluation of topic models. We find that neural topic models fare worse in both respects compared to an established classical method.
arXiv Detail & Related papers (2022-10-28T14:38:50Z)
A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes. We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
On the Use of Linguistic Features for the Evaluation of Generative Dialogue Systems [17.749995931459136]
We propose that a metric based on linguistic features may be able to maintain good correlation with human judgment and be interpretable. To support this proposition, we measure and analyze various linguistic features on dialogues produced by multiple dialogue models. We find that the features' behaviour is consistent with the known properties of the models tested, and is similar across domains.
arXiv Detail & Related papers (2021-04-13T16:28:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.