Generalization Gaps in Political Fake News Detection: An Empirical Study on the LIAR Dataset
- URL: http://arxiv.org/abs/2512.18533v1
- Date: Sat, 20 Dec 2025 23:08:18 GMT
- Title: Generalization Gaps in Political Fake News Detection: An Empirical Study on the LIAR Dataset
- Authors: S Mahmudul Hasan, Shaily Roy, Akib Jawad Nafis
- Abstract summary: We present a diagnostic evaluation of nine machine learning algorithms on the LIAR benchmark. We uncover a hard "Performance Ceiling": fine-grained classification does not exceed a weighted F1-score of 0.32 across models. We also diagnose a massive "Generalization Gap" in tree-based ensembles, which achieve more than 99% training accuracy but collapse to approximately 25% on test data.
- Score: 0.764671395172401
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The proliferation of linguistically subtle political disinformation poses a significant challenge to automated fact-checking systems. Despite increasing emphasis on complex neural architectures, the empirical limits of text-only linguistic modeling remain underexplored. We present a systematic diagnostic evaluation of nine machine learning algorithms on the LIAR benchmark. By isolating lexical features (Bag-of-Words, TF-IDF) and semantic embeddings (GloVe), we uncover a hard "Performance Ceiling", with fine-grained classification not exceeding a Weighted F1-score of 0.32 across models. Crucially, a simple linear SVM (Accuracy: 0.624) matches the performance of pre-trained Transformers such as RoBERTa (Accuracy: 0.620), suggesting that model capacity is not the primary bottleneck. We further diagnose a massive "Generalization Gap" in tree-based ensembles, which achieve more than 99% training accuracy but collapse to approximately 25% on test data, indicating reliance on lexical memorization rather than semantic inference. Synthetic data augmentation via SMOTE yields no meaningful gains, confirming that the limitation is semantic (feature ambiguity) rather than distributional. These findings indicate that for political fact-checking, increasing model complexity without incorporating external knowledge yields diminishing returns.
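As a toy illustration of this diagnostic (not the authors' code), the sketch below builds TF-IDF vectors by hand and uses a nearest-centroid classifier as a stand-in for the linear SVM, then contrasts train and test accuracy; all statements and labels are invented placeholders:

```python
import math
from collections import Counter

# Toy illustration of the paper's diagnostic: hand-rolled TF-IDF features
# plus a nearest-centroid classifier standing in for the linear SVM.
# All statements and labels below are invented placeholders, not LIAR data.
train_docs = [
    ("the senator voted against the bill twice", "false"),
    ("unemployment fell to a record low last year", "true"),
    ("the governor never signed the tax increase", "false"),
    ("the state budget grew by three percent", "true"),
]
test_docs = [
    ("the senator voted against the tax increase", "false"),
    ("unemployment grew by three percent last year", "true"),
]

def fit_idf(docs):
    """Document frequencies and corpus size from the training texts only."""
    df = Counter()
    for d in docs:
        df.update(set(d.split()))
    return df, len(docs)

def vectorize(doc, df, n):
    """Sparse TF-IDF vector as a dict; unseen tokens get the maximum IDF."""
    toks = doc.split()
    tf = Counter(toks)
    return {t: (c / len(toks)) * math.log((1 + n) / (1 + df[t]))
            for t, c in tf.items()}

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vs):
    acc = Counter()
    for v in vs:
        acc.update(v)
    return {t: s / len(vs) for t, s in acc.items()}

texts = [t for t, _ in train_docs]
labels = [y for _, y in train_docs]
df, n = fit_idf(texts)
vecs = [vectorize(t, df, n) for t in texts]
centroids = {y: centroid([v for v, l in zip(vecs, labels) if l == y])
             for y in set(labels)}

def predict(doc):
    v = vectorize(doc, df, n)  # IDF weights are frozen at training time
    return max(centroids, key=lambda y: cosine(v, centroids[y]))

train_acc = sum(predict(t) == y for t, y in train_docs) / len(train_docs)
test_acc = sum(predict(t) == y for t, y in test_docs) / len(test_docs)
print(f"train accuracy: {train_acc:.2f}  test accuracy: {test_acc:.2f}")
```

On the real LIAR splits, it is this train/test contrast that exposes lexical memorization: near-perfect training accuracy alongside test accuracy near chance.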
Related papers
- Predictive Coding and Information Bottleneck for Hallucination Detection in Large Language Models [0.8552050317027305]
Hallucinations in Large Language Models (LLMs) remain a critical barrier to high-stakes deployment. We introduce [Model Name], a hybrid detection framework that combines neuroscience-inspired signal design with supervised machine learning.
arXiv Detail & Related papers (2026-01-22T05:00:21Z) - AI Generated Text Detection [0.0]
This paper presents an evaluation of AI text detection methods, including both traditional machine learning models and transformer-based architectures. We utilize two datasets, HC3 and DAIGT v2, to build a unified benchmark and apply a topic-based data split to prevent information leakage. Results indicate that contextual modeling is significantly superior to lexical features and highlight the importance of mitigating topic memorization.
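A topic-based split reduces to a deterministic group-wise partition; the sketch below shows the idea, with invented placeholder examples and topic tags:

```python
import hashlib

# Sketch of a topic-based split: every example sharing a topic tag lands on
# the same side, so test performance cannot come from memorized topic
# vocabulary. Examples and topics below are invented placeholders.
examples = [
    ("essay about climate models", "climate", "ai"),
    ("climate report summary", "climate", "human"),
    ("notes on transformer training", "ml", "ai"),
    ("machine learning lecture transcript", "ml", "human"),
    ("recipe for sourdough bread", "cooking", "human"),
    ("generated sourdough baking tips", "cooking", "ai"),
]

def side(topic, test_frac=0.34):
    """Deterministic topic-to-split assignment via a stable hash."""
    bucket = int(hashlib.md5(topic.encode()).hexdigest(), 16) % 100
    return "test" if bucket < test_frac * 100 else "train"

train = [e for e in examples if side(e[1]) == "train"]
test = [e for e in examples if side(e[1]) == "test"]

# The invariant a topic-based split guarantees: no topic on both sides.
assert {t for _, t, _ in train}.isdisjoint({t for _, t, _ in test})
print(len(train), "train /", len(test), "test examples")
```

Hashing the group key rather than shuffling examples keeps the partition reproducible across runs and dataset versions.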
arXiv Detail & Related papers (2026-01-07T11:18:10Z) - Self-Training the Neurochaos Learning Algorithm [0.0]
This study introduces a hybrid semi-supervised learning architecture that integrates Neurochaos Learning (NL) with a threshold-based Self-Training (ST) method to overcome this constraint. The proposed Self-Training Neurochaos Learning (NL+ST) architecture consistently attains superior performance gains relative to standalone ST models.
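Threshold-based self-training itself is a generic loop; a minimal sketch follows, with a toy one-dimensional model standing in for the Neurochaos feature pipeline (all data, names, and the confidence heuristic are illustrative):

```python
# Generic threshold-based self-training loop, the ST half of the hybrid.
# The Neurochaos feature extractor is abstracted behind a toy 1-d model.
class MeanDistanceModel:
    """Toy classifier: assign the class whose training mean is closest."""
    def fit(self, labeled):
        sums, counts = {}, {}
        for x, y in labeled:
            sums[y] = sums.get(y, 0.0) + x
            counts[y] = counts.get(y, 0) + 1
        self.means = {y: sums[y] / counts[y] for y in sums}

    def predict(self, x):
        dists = sorted((abs(x - m), y) for y, m in self.means.items())
        (d0, label), (d1, _) = dists[0], dists[1]
        margin = d1 - d0  # ad-hoc confidence: relative margin between classes
        conf = margin / (margin + d0) if margin + d0 > 0 else 1.0
        return label, conf

def self_train(model, labeled, unlabeled, threshold=0.6, rounds=5):
    """Grow the labeled pool with high-confidence pseudo-labels."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        model.fit(labeled)
        confident, remaining = [], []
        for x in unlabeled:
            label, conf = model.predict(x)
            (confident if conf >= threshold else remaining).append((x, label))
        if not confident:
            break  # nothing cleared the threshold; stop early
        labeled += confident
        unlabeled = [x for x, _ in remaining]
    return model, labeled

labeled = [(0.1, "a"), (0.2, "a"), (0.9, "b"), (1.0, "b")]
unlabeled = [0.15, 0.95, 0.5]
model, grown = self_train(MeanDistanceModel(), labeled, unlabeled)
print(f"labeled pool grew from {len(labeled)} to {len(grown)}")
```

The ambiguous point (0.5) never clears the threshold and is correctly left unlabeled, which is exactly the failure mode the threshold guards against.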
arXiv Detail & Related papers (2026-01-03T10:24:01Z) - The Necessity of Imperfection: Reversing Model Collapse via Simulating Cognitive Boundedness [0.284279467589473]
This paper proposes a paradigm shift: instead of imitating the surface properties of data, we simulate the cognitive processes that generate human text. We introduce the Prompt-driven Cognitive Computing Framework (PMCSF) that reverse-engineers unstructured text into structured cognitive vectors. Our findings demonstrate that modelling human cognitive limitations -- not copying surface data -- enables synthetic data with genuine functional gain.
arXiv Detail & Related papers (2025-12-01T07:09:38Z) - A Theoretically Grounded Hybrid Ensemble for Reliable Detection of LLM-Generated Text [0.0]
We propose a theoretically grounded hybrid ensemble that fuses three complementary detection paradigms. The core novelty lies in an optimized weighted voting framework, where ensemble weights are learned on the probability simplex to maximize F1-score. Our system achieves 94.2% accuracy and an AUC of 0.978, with a 35% relative reduction in false positives on academic text.
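The weight-learning step can be approximated with a coarse grid over the probability simplex; the detector probabilities and gold labels below are invented, and the authors' actual optimizer may differ:

```python
# Coarse grid search for ensemble weights on the probability simplex
# {w1 + w2 + w3 = 1, wi >= 0}, maximizing F1 on a validation set.
# Detector outputs (three probabilities per example) and gold labels
# are invented placeholders.
probs = [
    (0.9, 0.7, 0.6, 1), (0.2, 0.4, 0.3, 0), (0.8, 0.9, 0.5, 1),
    (0.3, 0.2, 0.6, 0), (0.7, 0.6, 0.8, 1), (0.4, 0.1, 0.2, 0),
]
golds = [g for *_, g in probs]

def f1(preds, golds):
    tp = sum(1 for p, g in zip(preds, golds) if p and g)
    fp = sum(1 for p, g in zip(preds, golds) if p and not g)
    fn = sum(1 for p, g in zip(preds, golds) if not p and g)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

best_w, best_f1 = None, -1.0
for i in range(11):          # w1 in {0.0, 0.1, ..., 1.0}
    for j in range(11 - i):  # w2 such that w1 + w2 <= 1
        w = (i / 10, j / 10, (10 - i - j) / 10)
        preds = [w[0] * a + w[1] * b + w[2] * c >= 0.5 for a, b, c, _ in probs]
        score = f1(preds, golds)
        if score > best_f1:
            best_w, best_f1 = w, score
print(f"best weights {best_w}, validation F1 {best_f1:.3f}")
```

A grid with step 0.1 has only 66 points, so exhaustive search is cheap; a real system would use a finer grid or a constrained optimizer over the same simplex.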
arXiv Detail & Related papers (2025-11-27T06:42:56Z) - AutoMalDesc: Large-Scale Script Analysis for Cyber Threat Research [81.04845910798387]
Generating natural language explanations for threat detections remains an open problem in cybersecurity research. We present AutoMalDesc, an automated static analysis summarization framework that operates independently at scale. We publish our complete dataset of more than 100K script samples, including annotated seed (0.9K) datasets, along with our methodology and evaluation framework.
arXiv Detail & Related papers (2025-11-17T13:05:25Z) - Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning [77.120955854093]
We show that data diversity can be a strong predictor of generalization in language models. We introduce G-Vendi, a metric that quantifies diversity via the entropy of model-induced gradients. We present Prismatic Synthesis, a framework for generating diverse synthetic data.
arXiv Detail & Related papers (2025-05-26T16:05:10Z) - EVolutionary Independent DEtermiNistiC Explanation [5.127310126394387]
This paper introduces the Evolutionary Independent Deterministic Explanation (EVIDENCE) theory. EVIDENCE offers a deterministic, model-independent method for extracting significant signals from black-box models. Practical applications of EVIDENCE include improving diagnostic accuracy in healthcare and enhancing audio signal analysis.
arXiv Detail & Related papers (2025-01-20T12:05:14Z) - Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve the model alignment of different task scenarios.
We implement UAL in a simple fashion -- adaptively setting the label-smoothing value during training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
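A minimal sketch of the adaptive label-smoothing rule as described above (our reading of the idea, not the authors' implementation; the class count and maximum smoothing strength are illustrative):

```python
# Per-sample adaptive label smoothing: higher-uncertainty samples get a
# softer target distribution. K and eps_max are illustrative assumptions.
K = 3  # number of classes

def smoothed_target(label, uncertainty, eps_max=0.3):
    """One-hot target softened in proportion to uncertainty in [0, 1]."""
    eps = eps_max * uncertainty
    return [1 - eps + eps / K if k == label else eps / K for k in range(K)]

confident = smoothed_target(0, uncertainty=0.0)  # stays one-hot
uncertain = smoothed_target(0, uncertainty=1.0)  # heavily smoothed
print(confident, uncertain)
```

Both targets still sum to one, so they drop into a standard cross-entropy loss unchanged; only the sharpness of the supervision varies per sample.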
arXiv Detail & Related papers (2024-06-07T11:37:45Z) - Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction [49.15931834209624]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world. We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique. By further elaborating the robustness metric, a model is judged robust if its performance remains consistently accurate across the cliques.
arXiv Detail & Related papers (2023-05-23T12:05:09Z) - A comprehensive comparative evaluation and analysis of Distributional Semantic Models [61.41800660636555]
We perform a comprehensive evaluation of type distributional vectors, either produced by static DSMs or obtained by averaging the contextualized vectors generated by BERT.
The results show that the alleged superiority of predict-based models is more apparent than real, and surely not ubiquitous.
We borrow from cognitive neuroscience the methodology of Representational Similarity Analysis (RSA) to inspect the semantic spaces generated by distributional models.
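In miniature, RSA reduces to correlating the pairwise-similarity structure of two representational spaces; the 3-d vectors below are invented stand-ins for DSM embeddings:

```python
import math
from itertools import combinations

# Toy RSA sketch: compare two embedding spaces by correlating their
# pairwise-similarity structures. Vectors are invented 3-d placeholders.
space_a = {"cat": [1.0, 0.1, 0.0], "dog": [0.9, 0.2, 0.1], "car": [0.0, 1.0, 0.9]}
space_b = {"cat": [0.8, 0.0, 0.2], "dog": [0.7, 0.1, 0.3], "car": [0.1, 0.9, 0.8]}

def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def sim_vector(space):
    """Flattened upper triangle of the representational similarity matrix."""
    words = sorted(space)
    return [cos(space[w1], space[w2]) for w1, w2 in combinations(words, 2)]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

rsa_score = pearson(sim_vector(space_a), sim_vector(space_b))
print(f"RSA agreement: {rsa_score:.3f}")
```

Because RSA compares similarity structure rather than raw coordinates, it works across spaces of different dimensionality and scale, which is what makes it usable across heterogeneous DSMs.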
arXiv Detail & Related papers (2021-05-20T15:18:06Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site. This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.