SciNetBench: A Relation-Aware Benchmark for Scientific Literature Retrieval Agents
- URL: http://arxiv.org/abs/2601.03260v1
- Date: Tue, 16 Dec 2025 02:53:02 GMT
- Title: SciNetBench: A Relation-Aware Benchmark for Scientific Literature Retrieval Agents
- Authors: Chenyang Shao, Yong Li, Fengli Xu
- Abstract summary: We propose SciNetBench, the first Scientific Network Relation-aware Benchmark for literature retrieval agents. Our benchmark systematically evaluates three levels of relations: ego-centric retrieval of papers with novel knowledge structures, pair-wise identification of scholarly relationships, and path-wise reconstruction of scientific evolutionary trajectories. We find that retrieval agents' accuracy on relation-aware retrieval tasks often falls below 20%, revealing a core shortcoming of current retrieval paradigms.
- Score: 12.057215000080705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid development of AI agents has spurred the development of advanced research tools, such as Deep Research. Achieving this requires a nuanced understanding of the relations within scientific literature, which surpasses the scope of keyword-based or embedding-based retrieval. Existing retrieval agents mainly focus on content-level similarities and are unable to decode critical relational dynamics, such as identifying corroborating or conflicting studies or tracing technological lineages, all of which are essential for a comprehensive literature review. Consequently, this fundamental limitation often results in a fragmented knowledge structure, misleading sentiment interpretation, and inadequate modeling of collective scientific progress. To investigate relation-aware retrieval more deeply, we propose SciNetBench, the first Scientific Network Relation-aware Benchmark for literature retrieval agents. Constructed from a corpus of over 18 million AI papers, our benchmark systematically evaluates three levels of relations: ego-centric retrieval of papers with novel knowledge structures, pair-wise identification of scholarly relationships, and path-wise reconstruction of scientific evolutionary trajectories. Through extensive evaluation of three categories of retrieval agents, we find that their accuracy on relation-aware retrieval tasks often falls below 20%, revealing a core shortcoming of current retrieval paradigms. Notably, further experiments on literature review tasks demonstrate that providing agents with relational ground truth leads to a substantial 23.4% improvement in review quality, validating the critical importance of relation-aware retrieval. We publicly release our benchmark at https://anonymous.4open.science/r/SciNetBench/ to support future research on advanced retrieval systems.
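The three task levels described in the abstract (ego-centric retrieval, pair-wise relation identification, and path-wise trajectory reconstruction) can be pictured as a simple data schema with a per-level scoring function. The sketch below is purely illustrative: the class names, relation labels, and scoring logic are assumptions for exposition, not SciNetBench's actual data format or evaluation code.

```python
# Hypothetical sketch of the three relation-aware task levels and a
# pair-wise scoring function. Names and labels are illustrative only.
from dataclasses import dataclass, field


@dataclass
class RelationTasks:
    # Ego-centric: query paper -> set of related papers to retrieve.
    ego: dict[str, set[str]] = field(default_factory=dict)
    # Pair-wise: (paper_a, paper_b) -> relation label, e.g. "corroborates".
    pairwise: dict[tuple[str, str], str] = field(default_factory=dict)
    # Path-wise: ordered chains tracing a line of technical evolution.
    paths: list[list[str]] = field(default_factory=list)


def pairwise_accuracy(gold: RelationTasks,
                      predicted: dict[tuple[str, str], str]) -> float:
    """Fraction of gold pairs whose relation label the agent got right."""
    if not gold.pairwise:
        return 0.0
    hits = sum(1 for pair, label in gold.pairwise.items()
               if predicted.get(pair) == label)
    return hits / len(gold.pairwise)


# Example: two gold pairs; the agent labels one correctly.
gold = RelationTasks(pairwise={
    ("paperA", "paperB"): "corroborates",
    ("paperA", "paperC"): "conflicts",
})
preds = {("paperA", "paperB"): "corroborates",
         ("paperA", "paperC"): "extends"}
print(pairwise_accuracy(gold, preds))  # 0.5
```

Ego-centric and path-wise levels would score set overlap and sequence reconstruction respectively; the pair-wise case is shown because it most directly captures the "corroborating vs. conflicting" distinction the abstract highlights.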
Related papers
- AgentIR: Reasoning-Aware Retrieval for Deep Research Agents [76.29382561831105]
Deep Research agents generate explicit natural language reasoning before each search call. Reasoning-Aware Retrieval embeds the agent's reasoning trace alongside its query. DR-Synth generates Deep Research retriever training data from standard QA datasets. AgentIR-4B achieves 68% accuracy with the open-weight agent Tongyi-DeepResearch.
arXiv Detail & Related papers (2026-03-04T18:47:26Z) - FlyAOC: Evaluating Agentic Ontology Curation of Drosophila Scientific Knowledge Bases [10.00386797940562]
We present FlyBench to evaluate AI agents on end-to-end agentic curation from scientific literature. Given only a gene symbol, agents must search and read from a corpus of 16,898 full-text papers to produce structured annotations. The benchmark includes 7,397 expert-curated annotations across 100 genes drawn from FlyBase.
arXiv Detail & Related papers (2026-02-09T20:12:38Z) - Intelligent Scientific Literature Explorer using Machine Learning (ISLE) [0.797970449705065]
This paper presents an integrated system for scientific literature exploration that combines large-scale data acquisition, hybrid retrieval, semantic topic modeling, and heterogeneous knowledge graph construction. The proposed framework contributes a foundation for AI-assisted scientific discovery.
arXiv Detail & Related papers (2025-12-14T16:54:24Z) - SciRAG: Adaptive, Citation-Aware, and Outline-Guided Retrieval and Synthesis for Scientific Literature [52.36039386997026]
We introduce SciRAG, an open-source framework for scientific literature exploration. We introduce three key innovations: (1) adaptive retrieval that flexibly alternates between sequential and parallel evidence gathering; (2) citation-aware symbolic reasoning that leverages citation graphs to organize and filter documents; and (3) outline-guided synthesis that plans, critiques, and refines answers to ensure coherence and transparent attribution.
arXiv Detail & Related papers (2025-11-18T11:09:19Z) - ReplicationBench: Can AI Agents Replicate Astrophysics Research Papers? [29.17900668495058]
We introduce ReplicationBench, an evaluation framework for frontier AI agents. It tests whether agents can replicate entire research papers drawn from the astrophysics literature. ReplicationBench establishes the first benchmark of paper-scale, expert-validated astrophysics research tasks.
arXiv Detail & Related papers (2025-10-28T16:21:19Z) - WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents [72.28593628378991]
WebResearcher is an iterative deep-research paradigm that reformulates deep research as a Markov Decision Process. WebResearcher achieves state-of-the-art performance, even surpassing frontier proprietary systems.
arXiv Detail & Related papers (2025-09-16T17:57:17Z) - From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents [96.65646344634524]
Large Language Models (LLMs), endowed with reasoning and agentic capabilities, are ushering in a new paradigm termed Agentic Deep Research. We trace the evolution from static web search to interactive, agent-based systems that plan, explore, and learn. We demonstrate that Agentic Deep Research not only significantly outperforms existing approaches, but is also poised to become the dominant paradigm for future information seeking.
arXiv Detail & Related papers (2025-06-23T17:27:19Z) - A Literature Review of Literature Reviews in Pattern Analysis and Machine Intelligence [51.26815896167173]
We present a comprehensive tertiary analysis of PAMI reviews along three complementary dimensions. Our analyses reveal distinctive organizational patterns as well as persistent gaps in current review practices. Finally, our evaluation of state-of-the-art AI-generated reviews indicates encouraging advances in coherence and organization.
arXiv Detail & Related papers (2024-02-20T11:28:50Z) - Evaluating BERT-based Scientific Relation Classifiers for Scholarly Knowledge Graph Construction on Digital Library Collections [5.8962650619804755]
Inferring semantic relations between related scientific concepts is a crucial step.
BERT-based pre-trained models have been popularly explored for automatic relation classification.
Existing methods are primarily evaluated on clean texts.
To address these limitations, we started by creating OCR-noisy texts.
arXiv Detail & Related papers (2023-05-03T17:32:16Z) - Retrieval Augmentation for Commonsense Reasoning: A Unified Approach [64.63071051375289]
We propose a unified framework of retrieval-augmented commonsense reasoning (called RACo). Our proposed RACo significantly outperforms other knowledge-enhanced methods.
arXiv Detail & Related papers (2022-10-23T23:49:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences.