Scientific Paper Retrieval with LLM-Guided Semantic-Based Ranking
- URL: http://arxiv.org/abs/2505.21815v1
- Date: Tue, 27 May 2025 22:49:18 GMT
- Title: Scientific Paper Retrieval with LLM-Guided Semantic-Based Ranking
- Authors: Yunyi Zhang, Ruozhen Yang, Siqi Jiao, SeongKu Kang, Jiawei Han
- Abstract summary: SemRank is an effective and efficient paper retrieval framework. It combines query understanding with a concept-based semantic index. Experiments show that SemRank consistently improves the performance of various base retrievers.
- Score: 32.40639079110799
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scientific paper retrieval is essential for supporting literature discovery and research. While dense retrieval methods demonstrate effectiveness in general-purpose tasks, they often fail to capture fine-grained scientific concepts that are essential for accurate understanding of scientific queries. Recent studies also use large language models (LLMs) for query understanding; however, these methods often lack grounding in corpus-specific knowledge and may generate unreliable or unfaithful content. To overcome these limitations, we propose SemRank, an effective and efficient paper retrieval framework that combines LLM-guided query understanding with a concept-based semantic index. Each paper is indexed using multi-granular scientific concepts, including general research topics and detailed key phrases. At query time, an LLM identifies core concepts derived from the corpus to explicitly capture the query's information need. These identified concepts enable precise semantic matching, significantly enhancing retrieval accuracy. Experiments show that SemRank consistently improves the performance of various base retrievers, surpasses strong existing LLM-based baselines, and remains highly efficient.
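The pipeline the abstract describes (index papers by multi-granular concepts, identify the query's core concepts, then match on concept overlap) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `identify_query_concepts` is a hypothetical keyword-lookup stand-in for SemRank's LLM-guided query understanding, and all paper IDs, concepts, and scores are made up for the example.

```python
# Illustrative sketch of concept-based re-ranking in the spirit of SemRank.
# A real system would use an LLM to extract corpus-grounded query concepts;
# here a keyword lookup keeps the example self-contained and runnable.

def build_concept_index(papers):
    """Map each paper id to its set of multi-granular concepts
    (research topics and key phrases)."""
    return {pid: set(concepts) for pid, concepts in papers.items()}

def identify_query_concepts(query, corpus_concepts):
    """Stand-in for LLM-guided query understanding: keep only concepts
    that are grounded in the corpus and share a word with the query."""
    tokens = set(query.lower().split())
    return {c for c in corpus_concepts if set(c.split()) & tokens}

def rerank(base_scores, concept_index, query_concepts, alpha=0.5):
    """Combine each paper's base retriever score with its concept overlap."""
    ranked = [(pid, base + alpha * len(concept_index[pid] & query_concepts))
              for pid, base in base_scores.items()]
    return sorted(ranked, key=lambda x: x[1], reverse=True)

papers = {
    "p1": ["dense retrieval", "semantic matching"],
    "p2": ["graph neural networks"],
}
index = build_concept_index(papers)
corpus_concepts = set().union(*index.values())
q_concepts = identify_query_concepts("semantic matching for retrieval",
                                     corpus_concepts)
# "p1" is promoted above "p2" by concept overlap despite a lower base score.
print(rerank({"p1": 0.4, "p2": 0.6}, index, q_concepts))
```

The key design point, per the abstract, is that query concepts are drawn from the corpus index rather than generated freely, which grounds the LLM's output and avoids unfaithful expansions.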
Related papers
- Can AI Validate Science? Benchmarking LLMs for Accurate Scientific Claim $\rightarrow$ Evidence Reasoning [6.043212666944194]
We present CLAIM-BENCH, a benchmark for evaluating large language models' capabilities in scientific claim-evidence extraction and validation. We show that closed-source models like GPT-4 and Claude consistently outperform open-source counterparts in precision and recall. Strategically designed three-pass and one-by-one prompting approaches significantly improve LLMs' abilities to accurately link dispersed evidence with claims.
arXiv Detail & Related papers (2025-06-09T21:04:39Z) - Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers [74.17516978246152]
Large language models (LLMs) have been widely integrated into information retrieval to advance traditional techniques. We propose EXSEARCH, an agentic search framework, where the LLM learns to retrieve useful information as the reasoning unfolds. Experiments on four knowledge-intensive benchmarks show that EXSEARCH substantially outperforms baselines.
arXiv Detail & Related papers (2025-05-26T15:27:55Z) - Science Hierarchography: Hierarchical Organization of Science Literature [20.182213614072836]
We motivate SCIENCE HIERARCHOGRAPHY, the goal of organizing scientific literature into a high-quality hierarchical structure. We develop a range of algorithms to achieve the goals of SCIENCE HIERARCHOGRAPHY. Results show that this structured approach enhances interpretability, supports trend discovery, and offers an alternative pathway for exploring scientific literature beyond traditional search methods.
arXiv Detail & Related papers (2025-04-18T17:59:29Z) - Can LLMs Generate Tabular Summaries of Science Papers? Rethinking the Evaluation Protocol [83.90769864167301]
Literature review tables are essential for summarizing and comparing collections of scientific papers. We explore the task of generating tables that best fulfill a user's informational needs given a collection of scientific papers. Our contributions focus on three key challenges encountered in real-world use: (i) user prompts are often under-specified; (ii) retrieved candidate papers frequently contain irrelevant content; and (iii) task evaluation should move beyond shallow text similarity techniques.
arXiv Detail & Related papers (2025-04-14T14:52:28Z) - Agentic AI for Improving Precision in Identifying Contributions to Sustainable Development Goals [0.0]
This study investigates the application of autoregressive Large Language Models (LLMs) as evaluation agents to identify relevant scholarly contributions to United Nations' (UN) targets in scholarly publications.
We demonstrate that small, locally-hosted LLMs can differentiate semantically relevant contributions to SDG targets from documents retrieved due to incidental keyword matches, addressing the limitations of traditional methods.
arXiv Detail & Related papers (2024-11-26T17:06:30Z) - SciPIP: An LLM-based Scientific Paper Idea Proposer [30.670219064905677]
We introduce SciPIP, an innovative framework designed to enhance the proposal of scientific ideas through improvements in both literature retrieval and idea generation. Our experiments, conducted across various domains such as natural language processing and computer vision, demonstrate SciPIP's capability to generate a multitude of innovative and useful ideas.
arXiv Detail & Related papers (2024-10-30T16:18:22Z) - Taxonomy-guided Semantic Indexing for Academic Paper Search [51.07749719327668]
TaxoIndex is a semantic index framework for academic paper search.
It organizes key concepts from papers as a semantic index guided by an academic taxonomy.
It can be flexibly employed to enhance existing dense retrievers.
arXiv Detail & Related papers (2024-10-25T00:00:17Z) - Text-Video Retrieval with Global-Local Semantic Consistent Learning [122.15339128463715]
We propose a simple yet effective method, Global-Local Semantic Consistent Learning (GLSCL).
GLSCL capitalizes on latent shared semantics across modalities for text-video retrieval.
Our method achieves performance comparable to SOTA while being nearly 220 times faster in terms of computational cost.
arXiv Detail & Related papers (2024-05-21T11:59:36Z) - Rethinking Scientific Summarization Evaluation: Grounding Explainable Metrics on Facet-aware Benchmark [42.131133762827375]
This paper presents conceptual and experimental analyses of scientific summarization. We introduce the Facet-aware Metric (FM), employing LLMs for advanced semantic matching to evaluate summaries. Our findings confirm that FM offers a more logical approach to evaluating scientific summaries.
arXiv Detail & Related papers (2024-02-22T07:58:29Z) - Synergistic Interplay between Search and Large Language Models for Information Retrieval [141.18083677333848]
InteR allows retrieval models (RMs) to expand knowledge in queries using LLM-generated knowledge collections.
InteR achieves overall superior zero-shot retrieval performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-05-12T11:58:15Z) - Retrieval Augmentation for Commonsense Reasoning: A Unified Approach [64.63071051375289]
We propose a unified framework of retrieval-augmented commonsense reasoning (called RACo).
Our proposed RACo significantly outperforms other knowledge-enhanced methods.
arXiv Detail & Related papers (2022-10-23T23:49:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.