Misleading through Inconsistency: A Benchmark for Political Inconsistencies Detection
- URL: http://arxiv.org/abs/2505.19191v1
- Date: Sun, 25 May 2025 15:35:24 GMT
- Title: Misleading through Inconsistency: A Benchmark for Political Inconsistencies Detection
- Authors: Nursulu Sagimbayeva, Ruveyda Betül Bahçeci, Ingmar Weber
- Abstract summary: Inconsistent political statements represent a form of misinformation. We propose the Inconsistency detection task and develop a scale of inconsistency types to prompt NLP research in this direction. We present a dataset of 698 human-annotated pairs of political statements with explanations of the annotators' reasoning for 237 samples.
- Score: 0.7373617024876725
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Inconsistent political statements represent a form of misinformation. When left unnoticed, they erode public trust and pose challenges to accountability. Detecting inconsistencies automatically could support journalists in asking clarification questions, thereby helping to keep politicians accountable. We propose the Inconsistency detection task and develop a scale of inconsistency types to prompt NLP research in this direction. To provide a resource for detecting inconsistencies in the political domain, we present a dataset of 698 human-annotated pairs of political statements with explanations of the annotators' reasoning for 237 samples. The statements mainly come from voting assistant platforms such as Wahl-O-Mat in Germany and Smartvote in Switzerland, reflecting real-world political issues. We benchmark Large Language Models (LLMs) on our dataset and show that, in general, they are as good as humans at detecting inconsistencies, and might even be better than individual humans at predicting the crowd-annotated ground truth. However, when it comes to identifying fine-grained inconsistency types, none of the models has reached the upper bound of performance (set by natural labeling variation), leaving room for improvement. We make our dataset and code publicly available.
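As a rough illustration of the benchmarking setup, the sketch below prompts an LLM to label a pair of statements on a coarse three-way scale. `query_llm` is a hypothetical stand-in for any chat-completion client, and the label set is an assumption for illustration, not the paper's annotated scale of inconsistency types.

```python
# Minimal sketch of pairwise inconsistency detection with an LLM.
PROMPT = """You compare two political statements by the same actor.
Statement A: {a}
Statement B: {b}
Answer with exactly one label:
CONSISTENT, PARTIALLY_INCONSISTENT, or INCONSISTENT."""

LABELS = {"CONSISTENT", "PARTIALLY_INCONSISTENT", "INCONSISTENT"}

def query_llm(prompt: str) -> str:
    """Hypothetical LLM call; swap in a real chat-completion client.
    Stubbed with a fixed reply so the sketch runs offline."""
    return "INCONSISTENT"

def detect_inconsistency(a: str, b: str) -> str:
    answer = query_llm(PROMPT.format(a=a, b=b)).strip().upper()
    # Fall back to a default if the model answers off-scale.
    return answer if answer in LABELS else "CONSISTENT"

print(detect_inconsistency(
    "We must phase out coal by 2030.",
    "Coal will remain central to our energy mix for decades."))
```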
Related papers
- Benchmarking Fraud Detectors on Private Graph Data [70.4654745317714]
Currently, many types of fraud are managed in part by automated detection algorithms that operate over graphs. We consider the scenario where a data holder wishes to outsource development of fraud detectors to third parties. Third parties submit their fraud detectors to the data holder, who evaluates these algorithms on a private dataset and then publicly communicates the results. We propose a realistic privacy attack on this system that allows an adversary to de-anonymize individuals' data based only on the evaluation results.
arXiv Detail & Related papers (2025-07-30T03:20:15Z)
- Analyzing Political Bias in LLMs via Target-Oriented Sentiment Classification [4.352835414206441]
Political biases encoded by LLMs might have detrimental effects on downstream applications. We propose a new approach leveraging the observation that LLM sentiment predictions vary with the target entity in the same sentence. We insert 1319 demographically and politically diverse politician names into 450 political sentences and predict target-oriented sentiment using seven models in six widely spoken languages.
arXiv Detail & Related papers (2025-05-26T10:01:24Z)
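The name-insertion probe can be sketched roughly as follows; `predict_sentiment`, the templates, and the groupings are all illustrative assumptions, with the model stubbed by a toy lexicon so the sketch runs offline.

```python
from statistics import mean

TEMPLATES = [
    "{name} blocked the climate bill in parliament.",
    "{name} delivered a widely praised speech on pensions.",
]
GROUPS = {"party_x": ["Politician A"], "party_y": ["Politician B"]}

def predict_sentiment(sentence: str, target: str) -> float:
    """Hypothetical target-oriented sentiment model, range [-1, 1];
    stubbed with a two-word lexicon. `target` is unused by this toy
    stub, but a real model would condition on it."""
    return float(("praised" in sentence) - ("blocked" in sentence))

def group_score(names: list[str]) -> float:
    return mean(predict_sentiment(t.format(name=n), n)
                for n in names for t in TEMPLATES)

# Identical templates, different names: a systematic score gap between
# groups would point to entity-specific (political) bias.
print({g: group_score(ns) for g, ns in GROUPS.items()})
```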
- Only a Little to the Left: A Theory-grounded Measure of Political Bias in Large Language Models [4.8869340671593475]
Political bias in prompt-based language models can affect their performance. We build on survey design principles to test a wide variety of input prompts, while taking into account prompt sensitivity. Measures of political bias are often unstable, but generally more left-leaning for instruction-tuned models.
arXiv Detail & Related papers (2025-03-20T13:51:06Z)
- The Impact of Persona-based Political Perspectives on Hateful Content Detection [4.04666623219944]
Politically diverse language models require computational resources often inaccessible to many researchers and organizations. Recent work has established that persona-based prompting can introduce political diversity in model outputs without additional training. We investigate whether such prompting strategies can achieve results comparable to political pretraining for downstream tasks.
arXiv Detail & Related papers (2025-02-01T09:53:17Z)
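A minimal sketch of persona-based prompting, assuming a hypothetical `query_llm` client and made-up personas; the paper's actual personas and tasks may differ.

```python
PERSONAS = [
    "You are a progressive urban voter.",
    "You are a conservative rural voter.",
]
TASK = 'Is the following post hateful? Answer YES or NO.\nPost: "{post}"'

def query_llm(prompt: str) -> str:
    """Hypothetical LLM call, stubbed so the sketch runs offline."""
    return "NO"

def persona_votes(post: str) -> list[str]:
    # Prefix the same task with each persona and collect the answers;
    # divergence across personas exposes perspective-dependent judgments.
    return [query_llm(f"{p}\n{TASK.format(post=post)}") for p in PERSONAS]

print(persona_votes("example post text"))
```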
- Few-shot Policy (de)composition in Conversational Question Answering [54.259440408606515]
We propose a neuro-symbolic framework to detect policy compliance using large language models (LLMs) in a few-shot setting. We show that our approach soundly reasons about policy compliance conversations by extracting sub-questions to be answered, assigning truth values from contextual information, and explicitly producing a set of logic statements from the given policies. We apply this approach to the popular policy compliance detection (PCD) and conversational machine reading benchmark ShARC, and show competitive performance with no task-specific finetuning.
arXiv Detail & Related papers (2025-01-20T08:40:15Z)
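The decompose-then-evaluate idea might look roughly like the sketch below; the sub-questions, the policy formula, and the keyword-based `answer_from_context` stub are all illustrative assumptions, not the paper's framework.

```python
SUBQUESTIONS = {
    "is_resident": "Is the applicant a resident?",
    "is_over_18": "Is the applicant over 18?",
    "has_waiver": "Does the applicant hold a waiver?",
}

def answer_from_context(question: str, context: str) -> bool:
    """Hypothetical LLM truth-value assignment; stubbed with a keyword
    check so the sketch runs offline (no negation handling)."""
    key = question.rstrip("?").split()[-1].lower()
    return key in context.lower()

def compliant(context: str) -> bool:
    v = {k: answer_from_context(q, context) for k, q in SUBQUESTIONS.items()}
    # The policy itself as an explicit logic statement:
    # (resident AND over 18) OR waiver.
    return (v["is_resident"] and v["is_over_18"]) or v["has_waiver"]

print(compliant("The applicant is a resident, aged 19, holding a waiver."))
```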
- Representation Bias in Political Sample Simulations with Large Language Models [54.48283690603358]
This study seeks to identify and quantify biases in simulating political samples with Large Language Models.
Using the GPT-3.5-Turbo model, we leverage data from the American National Election Studies, German Longitudinal Election Study, Zuobiao dataset, and China Family Panel Studies.
arXiv Detail & Related papers (2024-07-16T05:52:26Z)
- Entity-Based Evaluation of Political Bias in Automatic Summarization [27.68439481274954]
We use an entity replacement method to investigate the portrayal of politicians in automatically generated summaries of news articles.
We develop an entity-based computational framework to assess the sensitivities of several extractive and abstractive summarizers to the politicians Donald Trump and Joe Biden.
arXiv Detail & Related papers (2023-05-03T17:59:59Z)
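A minimal sketch of the entity replacement idea, assuming hypothetical `summarize` and `entity_sentiment` stand-ins (stubbed so it runs offline); the paper's framework evaluates real extractive and abstractive summarizers.

```python
def summarize(article: str) -> str:
    """Hypothetical summarizer; stubbed as lead-sentence extraction."""
    return article.split(". ")[0] + "."

def entity_sentiment(summary: str, entity: str) -> float:
    """Hypothetical entity-level sentiment scorer in [-1, 1]; stubbed."""
    return -1.0 if entity in summary and "criticized" in summary else 0.0

article = ("Donald Trump was criticized over the budget. "
           "Lawmakers debated the proposal for hours.")
variant = article.replace("Donald Trump", "Joe Biden")

# Same text, swapped entity: diverging scores (or diverging summaries)
# indicate entity-specific bias in the summarizer.
print(entity_sentiment(summarize(article), "Donald Trump"))
print(entity_sentiment(summarize(variant), "Joe Biden"))
```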
- We're Afraid Language Models Aren't Modeling Ambiguity [136.8068419824318]
Managing ambiguity is a key part of human language understanding.
We characterize ambiguity in a sentence by its effect on entailment relations with another sentence.
We show that a multilabel NLI model can flag political claims in the wild that are misleading due to ambiguity.
arXiv Detail & Related papers (2023-04-27T17:57:58Z)
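One way such a multilabel NLI flag could work is sketched below; `nli_probs` is a hypothetical stand-in for a trained multilabel model, and the threshold rule is an assumption, not the paper's method.

```python
RELATIONS = ("entailment", "neutral", "contradiction")

def nli_probs(premise: str, hypothesis: str) -> dict[str, float]:
    """Hypothetical multilabel NLI model; stubbed with fixed scores
    so the sketch runs offline."""
    return dict(zip(RELATIONS, (0.8, 0.7, 0.1)))

def is_ambiguous(premise: str, hypothesis: str, thresh: float = 0.5) -> bool:
    probs = nli_probs(premise, hypothesis)
    # Multilabel NLI scores each relation independently, so several can
    # clear the threshold at once; that overlap signals ambiguity.
    return sum(p >= thresh for p in probs.values()) > 1

print(is_ambiguous("Taxes were not raised for everyone.",
                   "Taxes were raised for no one."))
```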
- AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses [66.49753193098356]
We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite getting trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that can detect oversensitivity- and overstability-causing samples with high accuracy.
arXiv Detail & Related papers (2021-09-24T03:49:38Z)
- A Machine Learning Pipeline to Examine Political Bias with Congressional Speeches [0.3062386594262859]
We present machine learning approaches to study political bias in two ideologically diverse social media forums: Gab and Twitter. Our proposed methods use transcripts of political speeches in the US Congress to label the data. We also present a machine learning approach that combines features from cascades and text to forecast a cascade's political bias with an accuracy of about 85%.
arXiv Detail & Related papers (2021-09-18T21:15:21Z)
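A minimal sketch of combining cascade features with text features, using scikit-learn on toy data with placeholder labels; the actual pipeline and its roughly 85% accuracy rely on real labeled cascades.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy cascades: root-post text plus structural features
# [size, depth, reshare rate]; labels are placeholders.
texts = ["cut taxes now", "expand public healthcare",
         "secure the border", "invest in green jobs"]
cascade_feats = np.array([[120, 4, 0.8], [90, 3, 0.5],
                          [200, 6, 0.9], [60, 2, 0.4]])
labels = [1, 0, 1, 0]  # 1 = right-leaning, 0 = left-leaning (toy)

X_text = TfidfVectorizer().fit_transform(texts)
X = hstack([X_text, csr_matrix(cascade_feats)])  # concatenate blocks
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(X))  # in-sample sanity check, not the paper's result
```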
- Competency Problems: On Finding and Removing Artifacts in Language Data [50.09608320112584]
We argue that for complex language understanding tasks, all simple feature correlations are spurious.
We theoretically analyze the difficulty of creating data for competency problems when human bias is taken into account.
arXiv Detail & Related papers (2021-04-17T21:34:10Z)
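An artifact check in the spirit of this argument might test, per token, whether p(label | token present) deviates from the label prior; the z-statistic sketch below is an illustrative assumption, not the paper's exact test.

```python
from math import sqrt

# Toy labeled statements; 1/0 are placeholder class labels.
dataset = [("we must cut taxes", 1), ("raise the minimum wage", 0),
           ("cut red tape", 1), ("wage growth stalled", 0),
           ("cut spending today", 1)]
prior = sum(y for _, y in dataset) / len(dataset)

def token_z(token: str) -> float:
    """z-statistic of p(label | token present) against the label prior;
    a large |z| marks the token as a candidate artifact."""
    hits = [y for text, y in dataset if token in text.split()]
    if not hits:
        return 0.0
    p_hat = sum(hits) / len(hits)
    return (p_hat - prior) / sqrt(prior * (1 - prior) / len(hits))

for tok in ("cut", "wage"):
    print(tok, round(token_z(tok), 2))
```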
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.