BibAgent: An Agentic Framework for Traceable Miscitation Detection in Scientific Literature
- URL: http://arxiv.org/abs/2601.16993v1
- Date: Mon, 12 Jan 2026 16:30:45 GMT
- Title: BibAgent: An Agentic Framework for Traceable Miscitation Detection in Scientific Literature
- Authors: Peiran Li, Fangzhou Lin, Shuo Xing, Xiang Zheng, Xi Hong, Jiashuo Sun, Zhengzhong Tu, Chaoqun Ni
- Abstract summary: BibAgent is a scalable, end-to-end agentic framework for automated citation verification. It integrates retrieval, reasoning, and adaptive evidence aggregation, applying distinct strategies for accessible and paywalled sources. Our results demonstrate that BibAgent outperforms state-of-the-art Large Language Model (LLM) baselines in citation verification accuracy and interpretability.
- Score: 21.872874595027824
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Citations are the bedrock of scientific authority, yet their integrity is compromised by widespread miscitations, ranging from nuanced distortions to fabricated references. Systematic citation verification is currently infeasible: manual review cannot scale to modern publishing volumes, while existing automated tools are restricted to abstract-only analysis or small-scale, domain-specific datasets, in part due to the "paywall barrier" of full-text access. We introduce BibAgent, a scalable, end-to-end agentic framework for automated citation verification. BibAgent integrates retrieval, reasoning, and adaptive evidence aggregation, applying distinct strategies for accessible and paywalled sources. For paywalled references, it leverages a novel Evidence Committee mechanism that infers citation validity via downstream citation consensus. To support systematic evaluation, we contribute a 5-category Miscitation Taxonomy and MisciteBench, a massive cross-disciplinary benchmark comprising 6,350 miscitation samples spanning 254 fields. Our results demonstrate that BibAgent outperforms state-of-the-art Large Language Model (LLM) baselines in citation verification accuracy and interpretability, providing scalable, transparent detection of citation misalignments across the scientific literature.
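The abstract does not spell out how the Evidence Committee aggregates downstream citation consensus. A minimal sketch of the underlying idea, with hypothetical names, weights, and threshold (not the paper's actual implementation), could look like:

```python
from dataclasses import dataclass

@dataclass
class CommitteeVote:
    """Verdict derived from one downstream paper that also cites the reference."""
    paper_id: str
    supports_claim: bool  # does its usage agree with the citing claim being checked?
    weight: float         # hypothetical relevance/retrieval confidence in [0, 1]

def committee_consensus(votes, threshold=0.5):
    """Weighted majority vote over downstream citation evidence.

    Returns (verdict, score); abstains when no evidence is available.
    """
    if not votes:
        return ("unverifiable", 0.0)
    total = sum(v.weight for v in votes)
    support = sum(v.weight for v in votes if v.supports_claim)
    score = support / total
    verdict = "supported" if score >= threshold else "miscited"
    return (verdict, score)

votes = [
    CommitteeVote("p1", True, 0.9),
    CommitteeVote("p2", True, 0.6),
    CommitteeVote("p3", False, 0.5),
]
print(committee_consensus(votes))  # ('supported', 0.75)
```

The weighting and the abstain case are illustrative design choices; the paper's mechanism may combine evidence quite differently.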
Related papers
- CiteAudit: You Cited It, But Did You Read It? A Benchmark for Verifying Scientific References in the LLM Era [51.63024682584688]
Large language models (LLMs) introduce a new risk: fabricated references that appear plausible but correspond to no real publications. We present the first comprehensive benchmark and detection framework for hallucinated citations in scientific writing. Our framework significantly outperforms prior methods in both accuracy and interpretability.
arXiv Detail & Related papers (2026-02-26T19:17:39Z)
- CheckIfExist: Detecting Citation Hallucinations in the Era of AI-Generated Content [0.0]
"CheckIfExist" is an open-source tool designed to provide immediate verification of references against scholarly databases. The proposed tool fills this gap by employing string similarity algorithms to compute multi-dimensional match confidence scores. The system supports both single-reference verification and batch processing of Bib entries through a unified interface.
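The abstract mentions string similarity algorithms combined into multi-dimensional match confidence scores. A small stdlib sketch of that general idea (field names and weights are assumptions, not CheckIfExist's actual scoring) might be:

```python
from difflib import SequenceMatcher

def field_similarity(a, b):
    """Normalized string similarity in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_confidence(reference, candidate, weights=None):
    """Combine per-field similarities into one weighted confidence score.

    `reference` and `candidate` are dicts with 'title', 'authors', 'year' keys
    (a hypothetical schema for illustration).
    """
    weights = weights or {"title": 0.6, "authors": 0.3, "year": 0.1}
    score = 0.0
    for field, w in weights.items():
        score += w * field_similarity(str(reference[field]), str(candidate[field]))
    return score

ref = {"title": "Attention Is All You Need", "authors": "Vaswani et al.", "year": 2017}
hit = {"title": "Attention is all you need", "authors": "Vaswani et al.", "year": 2017}
print(match_confidence(ref, hit))  # 1.0 (only the casing differs)
```

A real verifier would also normalize punctuation and author-name order before comparison; this sketch only shows how per-field scores fold into one confidence value.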
arXiv Detail & Related papers (2026-01-27T20:26:24Z)
- OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment [63.662126457336534]
OpenNovelty is an agentic system for transparent, evidence-based novelty analysis. It grounds all assessments in retrieved real papers, ensuring verifiable judgments. OpenNovelty aims to empower the research community with a scalable tool that promotes fair, consistent, and evidence-backed peer review.
arXiv Detail & Related papers (2026-01-04T15:48:51Z)
- Explainable and Fine-Grained Safeguarding of LLM Multi-Agent Systems via Bi-Level Graph Anomaly Detection [76.91230292971115]
Large language model (LLM)-based multi-agent systems (MAS) have shown strong capabilities in solving complex tasks. XG-Guard is an explainable and fine-grained safeguarding framework for detecting malicious agents in MAS.
arXiv Detail & Related papers (2025-12-21T13:46:36Z)
- SemanticCite: Citation Verification with AI-Powered Full-Text Analysis and Evidence-Based Reasoning [0.0]
We introduce SemanticCite, an AI-powered system that verifies citation accuracy through full-text source analysis. Our approach combines multiple retrieval methods with a four-class classification system that captures nuanced claim-source relationships. We contribute a comprehensive dataset of over 1,000 citations with detailed alignments, functional classifications, semantic annotations, and bibliometric metadata.
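The four-class claim-source scheme is not enumerated in this summary. A toy sketch of what such a grading step could look like, with hypothetical class labels and a crude lexical-overlap stand-in for the paper's actual classifier:

```python
from enum import Enum

class SupportClass(Enum):
    # Hypothetical labels; SemanticCite's actual four classes may differ.
    SUPPORTED = "supported"
    PARTIALLY_SUPPORTED = "partially_supported"
    UNSUPPORTED = "unsupported"
    CONTRADICTED = "contradicted"

def classify_overlap(claim_terms, source_terms):
    """Grade claim-source support by term overlap (illustration only).

    A real system would use semantic analysis; note this toy rule can never
    return CONTRADICTED, since detecting contradiction needs more than overlap.
    """
    claim_terms, source_terms = set(claim_terms), set(source_terms)
    if not claim_terms:
        return SupportClass.UNSUPPORTED
    overlap = len(claim_terms & source_terms) / len(claim_terms)
    if overlap >= 0.8:
        return SupportClass.SUPPORTED
    if overlap >= 0.4:
        return SupportClass.PARTIALLY_SUPPORTED
    return SupportClass.UNSUPPORTED

print(classify_overlap(["dropout", "reduces", "overfitting"],
                       ["dropout", "reduces", "overfitting", "in", "cnns"]))
```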
arXiv Detail & Related papers (2025-11-20T10:05:21Z)
- HySemRAG: A Hybrid Semantic Retrieval-Augmented Generation Framework for Automated Literature Synthesis and Methodological Gap Analysis [55.2480439325792]
HySemRAG is a framework that combines Extract, Transform, Load (ETL) pipelines with Retrieval-Augmented Generation (RAG). The system addresses limitations in existing RAG architectures through a multi-layered approach.
arXiv Detail & Related papers (2025-08-01T20:30:42Z)
- Towards Fine-Grained Citation Evaluation in Generated Text: A Comparative Analysis of Faithfulness Metrics [22.041561519672456]
Large language models (LLMs) often produce unsupported or unverifiable content, known as "hallucinations".
We propose a comparative evaluation framework that assesses how effectively each metric distinguishes citations across three support levels.
Our results show no single metric consistently excels across all evaluations, revealing the complexity of assessing fine-grained support.
arXiv Detail & Related papers (2024-06-21T15:57:24Z)
- ALiiCE: Evaluating Positional Fine-grained Citation Generation [54.19617927314975]
We propose ALiiCE, the first automatic evaluation framework for fine-grained citation generation.
Our framework first parses the sentence claim into atomic claims via dependency analysis and then calculates citation quality at the atomic claim level.
We evaluate the positional fine-grained citation generation performance of several Large Language Models on two long-form QA datasets.
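ALiiCE's pipeline, as summarized above, decomposes a sentence into atomic claims and scores citation quality per claim. The sketch below illustrates the shape of that computation only: it substitutes a naive conjunction split for ALiiCE's actual dependency analysis, and the metric is a hypothetical claim-level recall, not the framework's metric:

```python
import re

def split_atomic_claims(sentence):
    """Naive stand-in for dependency-based decomposition: split a compound
    sentence on semicolons and coordinating 'and'."""
    parts = re.split(r";|,?\s+\band\b\s+", sentence)
    return [p.strip().rstrip(".") for p in parts if p.strip()]

def claim_level_citation_recall(claims, supported):
    """Fraction of atomic claims backed by at least one verified citation."""
    if not claims:
        return 0.0
    return sum(1 for c in claims if c in supported) / len(claims)

claims = split_atomic_claims(
    "The model improves accuracy [1], and it reduces latency."
)
print(claims)  # ['The model improves accuracy [1]', 'it reduces latency']
```

Scoring at the atomic-claim level is what lets a single citation marker be credited only to the clause it actually supports, rather than to the whole sentence.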
arXiv Detail & Related papers (2024-06-19T09:16:14Z)
- Deep Graph Learning for Anomalous Citation Detection [55.81334139806342]
We propose a novel deep graph learning model, namely GLAD (Graph Learning for Anomaly Detection), to identify anomalies in citation networks.
Within the GLAD framework, we propose an algorithm called CPU (Citation PUrpose) to discover the purpose of citation based on citation texts.
arXiv Detail & Related papers (2022-02-23T09:05:28Z)
- Deep forecasting of translational impact in medical research [1.8130872753848115]
We develop a suite of representational and discriminative mathematical models of multi-scale publication data.
We show that citations are only moderately predictive of translational impact as judged by inclusion in patents, guidelines, or policy documents.
We argue that content-based models of impact are superior in performance to conventional, citation-based measures.
arXiv Detail & Related papers (2021-10-17T19:29:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.