When retrieval outperforms generation: Dense evidence retrieval for scalable fake news detection
- URL: http://arxiv.org/abs/2511.04643v1
- Date: Thu, 06 Nov 2025 18:35:45 GMT
- Title: When retrieval outperforms generation: Dense evidence retrieval for scalable fake news detection
- Authors: Alamgir Munir Qazi, John P. McCrae, Jamal Abdul Nasir
- Abstract summary: DeReC is a lightweight framework that demonstrates how general-purpose text embeddings can effectively replace autoregressive LLM-based approaches in fact verification tasks. By combining dense retrieval with specialized classification, our system achieves better accuracy while being significantly more efficient. Our results demonstrate that carefully engineered retrieval-based systems can match or exceed LLM performance in specialized tasks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The proliferation of misinformation necessitates robust yet computationally efficient fact verification systems. While current state-of-the-art approaches leverage Large Language Models (LLMs) for generating explanatory rationales, these methods face significant computational barriers and hallucination risks in real-world deployments. We present DeReC (Dense Retrieval Classification), a lightweight framework that demonstrates how general-purpose text embeddings can effectively replace autoregressive LLM-based approaches in fact verification tasks. By combining dense retrieval with specialized classification, our system achieves better accuracy while being significantly more efficient. DeReC outperforms explanation-generating LLMs in efficiency, reducing runtime by 95% on RAWFC (23 minutes 36 seconds compared to 454 minutes 12 seconds) and by 92% on LIAR-RAW (134 minutes 14 seconds compared to 1692 minutes 23 seconds), showcasing its effectiveness across varying dataset sizes. On the RAWFC dataset, DeReC achieves an F1 score of 65.58%, surpassing the state-of-the-art method L-Defense (61.20%). Our results demonstrate that carefully engineered retrieval-based systems can match or exceed LLM performance in specialized tasks while being significantly more practical for real-world deployment.
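The pipeline the abstract describes, dense retrieval over an evidence corpus followed by a specialized classifier instead of an explanation-generating LLM, can be sketched roughly as follows. This is an illustrative sketch, not the paper's implementation: the embedding function is a hashed bag-of-words placeholder standing in for a general-purpose text encoder, and the corpus, claim, and parameters are invented.

```python
import math
import zlib

def embed(text, dim=64):
    """Stand-in for a general-purpose text embedding model: a hashed
    bag-of-words vector, normalized to unit length. A real system would
    use a sentence encoder; this keeps the sketch self-contained."""
    v = [0.0] * dim
    for tok in text.lower().split():
        v[zlib.crc32(tok.encode()) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v

def retrieve_top_k(claim_vec, evidence_vecs, k=3):
    # On unit-normalized vectors, cosine similarity is a plain dot product.
    sims = [sum(c * e for c, e in zip(claim_vec, vec)) for vec in evidence_vecs]
    ranked = sorted(range(len(sims)), key=sims.__getitem__, reverse=True)
    return ranked[:k]

# Toy evidence corpus; a real system would index fact-check articles.
corpus = [
    "The city reported record rainfall in June.",
    "Officials confirmed the vaccine trial results.",
    "The senator denied making the statement.",
]
evidence_vecs = [embed(doc) for doc in corpus]

claim = "record rainfall hit the city in June"
top = retrieve_top_k(embed(claim), evidence_vecs, k=2)
evidence = [corpus[i] for i in top]
# The claim plus retrieved evidence would then feed a lightweight
# veracity classifier, with no explanation-generating LLM in the loop.
```

Because the expensive step is a batched embedding pass plus a similarity search, rather than autoregressive decoding, this design is where the reported order-of-magnitude runtime reductions come from.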
Related papers
- Efficient Jailbreak Mitigation Using Semantic Linear Classification in a Multi-Staged Pipeline [1.2802720336459552]
Prompt injection and jailbreaking attacks pose persistent security challenges to large language model (LLM)-based systems. We present an efficient and systematically evaluated defense architecture that mitigates these threats through a lightweight, multi-stage pipeline.
arXiv Detail & Related papers (2025-12-22T04:00:35Z) - LLM Optimization Unlocks Real-Time Pairwise Reranking [6.0141312590967635]
Pairwise Reranking Prompting (PRP) has emerged as a promising plug-and-play approach due to its usability and effectiveness. This paper presents a focused study on pairwise reranking, demonstrating that carefully applied optimization methods can significantly mitigate these issues. We achieve a remarkable latency reduction of up to 166 times, from 61.36 seconds to 0.37 seconds per query, with an insignificant drop in performance measured by Recall@k.
arXiv Detail & Related papers (2025-11-10T19:04:41Z) - Advanced Multi-Architecture Deep Learning Framework for BIRADS-Based Mammographic Image Retrieval: Comprehensive Performance Analysis with Super-Ensemble Optimization [0.0]
Mammographic image retrieval systems require exact BIRADS categorical matching across five distinct classes. Current medical image retrieval studies suffer from methodological limitations.
arXiv Detail & Related papers (2025-08-06T18:05:18Z) - Phishing Detection in the Gen-AI Era: Quantized LLMs vs Classical Models [1.4999444543328293]
Phishing attacks are becoming increasingly sophisticated, underscoring the need for detection systems that strike a balance between high accuracy and computational efficiency. This paper presents a comparative evaluation of traditional Machine Learning (ML), Deep Learning (DL), and quantized small-parameter Large Language Models (LLMs) for phishing detection. We show that while LLMs currently underperform compared to ML and DL methods in terms of raw accuracy, they exhibit strong potential for identifying subtle, context-based phishing cues.
arXiv Detail & Related papers (2025-07-10T04:01:52Z) - CiteFix: Enhancing RAG Accuracy Through Post-Processing Citation Correction [0.2548904650574671]
Retrieval Augmented Generation (RAG) has emerged as a powerful application of Large Language Models (LLMs). This research contributes to enhancing the reliability and trustworthiness of AI-generated content in information retrieval and summarization tasks.
arXiv Detail & Related papers (2025-04-22T06:41:25Z) - LLM2: Let Large Language Models Harness System 2 Reasoning [65.89293674479907]
Large language models (LLMs) have exhibited impressive capabilities across a myriad of tasks, yet they occasionally yield undesirable outputs. We introduce LLM2, a novel framework that combines an LLM with a process-based verifier. The LLM is responsible for generating plausible candidates, while the verifier provides timely process-based feedback to distinguish desirable and undesirable outputs.
arXiv Detail & Related papers (2024-12-29T06:32:36Z) - Evaluating the Effectiveness and Efficiency of Demonstration Retrievers in RAG for Coding Tasks [6.34946724864899]
This paper systematically evaluates the efficiency-effectiveness trade-off of retrievers across three coding tasks. We show that while BM25 excels in effectiveness, it suffers in efficiency as the knowledge base grows beyond 1000 entries. In large-scale retrieval, efficiency differences become more pronounced, with approximate dense retrievers offering the greatest gains.
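For context on the BM25 baseline discussed above, a minimal, unoptimized sketch of BM25 scoring is shown below (the standard formulation with parameters k1 and b; the corpus and query here are invented for illustration). Its per-query cost grows linearly with the corpus, which is why approximate dense retrievers pull ahead at scale.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against the query with BM25.
    Illustrative only: a real system would precompute an inverted index
    rather than scan every document per query."""
    n_docs = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency per unique query term.
    df = {t: sum(1 for d in docs_tokens if t in d) for t in set(query_tokens)}
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for t in query_tokens:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            # Term frequency saturation (k1) and length normalization (b).
            score += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            )
        scores.append(score)
    return scores

docs = [
    "sort a list in python".split(),
    "parse json in go".split(),
    "python list comprehension examples".split(),
]
scores = bm25_scores("python list".split(), docs)
best = max(range(len(docs)), key=scores.__getitem__)
```

Note how length normalization favors the shorter of the two matching documents, a behavior that dense retrievers handle implicitly through the embedding geometry.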
arXiv Detail & Related papers (2024-10-12T22:31:01Z) - FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models [50.331708897857574]
We introduce FactorLLM, a novel approach that decomposes well-trained dense FFNs into sparse sub-networks without requiring any further modifications.
FactorLLM achieves performance comparable to the source model, retaining up to 85% of its performance while delivering over a 30% increase in inference speed.
arXiv Detail & Related papers (2024-08-15T16:45:16Z) - Bridging LLMs and KGs without Fine-Tuning: Intermediate Probing Meets Subgraph-Aware Entity Descriptions [49.36683223327633]
Large Language Models (LLMs) encapsulate extensive world knowledge and exhibit powerful context modeling capabilities. We propose a novel framework that synergizes the strengths of LLMs with robust knowledge representation to enable effective and efficient KGC. We achieve a 47% relative improvement over previous methods based on non-fine-tuned LLMs and, to our knowledge, are the first to achieve classification performance comparable to fine-tuned LLMs.
arXiv Detail & Related papers (2024-08-13T10:15:55Z) - Text Quality-Based Pruning for Efficient Training of Language Models [66.66259229732121]
We propose a novel method for numerically evaluating text quality in large unlabelled NLP datasets.
The proposed text quality metric establishes a framework to identify and eliminate low-quality text instances.
Experimental results over multiple models and datasets demonstrate the efficacy of this approach.
arXiv Detail & Related papers (2024-04-26T18:01:25Z) - LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrieval [55.097573036580066]
Experimental results show that LaPraDoR achieves state-of-the-art performance compared with supervised dense retrieval models.
Compared to re-ranking, our lexicon-enhanced approach can be run in milliseconds (22.5x faster) while achieving superior performance.
arXiv Detail & Related papers (2022-03-11T18:53:12Z) - Finding Action Tubes with a Sparse-to-Dense Framework [62.60742627484788]
We propose a framework that generates action tube proposals from video streams with a single forward pass in a sparse-to-dense manner.
We evaluate the efficacy of our model on the UCF101-24, JHMDB-21 and UCFSports benchmark datasets.
arXiv Detail & Related papers (2020-08-30T15:38:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.