PaperRegister: Boosting Flexible-grained Paper Search via Hierarchical Register Indexing
- URL: http://arxiv.org/abs/2508.11116v1
- Date: Thu, 14 Aug 2025 23:43:46 GMT
- Title: PaperRegister: Boosting Flexible-grained Paper Search via Hierarchical Register Indexing
- Authors: Zhuoqun Li, Xuanang Chen, Hongyu Lin, Yaojie Lu, Xianpei Han, Le Sun,
- Abstract summary: As research deepens, paper search requirements may become more flexible.<n>PaperRegister transforms traditional abstract-based index into hierarchical index tree for paper search.<n> Experiments on paper search tasks across a range of granularity demonstrate that PaperRegister achieves the state-of-the-art performance.
- Score: 46.14402767316296
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Paper search is an important activity for researchers, typically involving using a query with description of a topic to find relevant papers. As research deepens, paper search requirements may become more flexible, sometimes involving specific details such as module configuration rather than being limited to coarse-grained topics. However, previous paper search systems are unable to meet these flexible-grained requirements, as these systems mainly collect paper abstracts to construct index of corpus, which lack detailed information to support retrieval by finer-grained queries. In this work, we propose PaperRegister, consisted of offline hierarchical indexing and online adaptive retrieval, transforming traditional abstract-based index into hierarchical index tree for paper search, thereby supporting queries at flexible granularity. Experiments on paper search tasks across a range of granularity demonstrate that PaperRegister achieves the state-of-the-art performance, and particularly excels in fine-grained scenarios, highlighting the good potential as an effective solution for flexible-grained paper search in real-world applications. Code for this work is in https://github.com/Li-Z-Q/PaperRegister.
Related papers
- Revisiting Text Ranking in Deep Research [24.324221566628125]
Black-box web search APIs hinder systematic analysis of search components.<n>We reproduce a selection of key findings and best practices for IR text ranking methods in the deep research setting.
arXiv Detail & Related papers (2026-02-25T00:18:07Z) - BookRAG: A Hierarchical Structure-aware Index-based Approach for Retrieval-Augmented Generation on Complex Documents [11.158307125677375]
Retrieval-Augmented Generation (RAG) queries highly relevant information from external complex documents.<n>We introduce BookRAG, a novel RAG approach targeted for documents with a hierarchical structure.<n>BookRAG achieves state-of-the-art performance, significantly outperforming baselines in both retrieval recall and QA accuracy.
arXiv Detail & Related papers (2025-12-03T03:40:49Z) - PRISM: Fine-Grained Paper-to-Paper Retrieval with Multi-Aspect-Aware Query Optimization [61.783280234747394]
PRISM is a document-to-document retrieval method that introduces multiple, fine-grained representations for both the query and candidate papers.<n>We present SciFullBench, a novel benchmark in which the complete and segmented context of full papers for both queries and candidates is available.<n>Experiments show that PRISM improves performance by an average of 4.3% over existing retrieval baselines.
arXiv Detail & Related papers (2025-07-14T08:41:53Z) - ManuSearch: Democratizing Deep Search in Large Language Models with a Transparent and Open Multi-Agent Framework [73.91207117772291]
ManuSearch is a transparent and modular multi-agent framework designed to democratize deep search for large language models (LLMs)<n>ManuSearch decomposes the search and reasoning process into three collaborative agents: (1) a solution planning agent that iteratively formulates sub-queries, (2) an Internet search agent that retrieves relevant documents via real-time web search, and (3) a structured webpage reading agent that extracts key evidence from raw web content.
arXiv Detail & Related papers (2025-05-23T17:02:02Z) - ReTreever: Tree-based Coarse-to-Fine Representations for Retrieval [64.44265315244579]
We propose a tree-based method for organizing and representing reference documents at various granular levels.<n>Our method, called ReTreever, jointly learns a routing function per internal node of a binary tree such that query and reference documents are assigned to similar tree branches.<n>Our evaluations show that ReTreever generally preserves full representation accuracy.
arXiv Detail & Related papers (2025-02-11T21:35:13Z) - Ranking Narrative Query Graphs for Biomedical Document Retrieval (Technical Report) [7.527096697768715]
This paper extends our existing graph-based discovery system for the biomedical domain.<n>It contributes effective graph-based unsupervised ranking methods, a new query relaxation paradigm, and ontological rewriting.
arXiv Detail & Related papers (2024-12-06T12:49:28Z) - PseudoSeer: a Search Engine for Pseudocode [18.726136894285403]
A novel pseudocode search engine is designed to facilitate efficient retrieval and search of academic papers containing pseudocode.
By leveraging snippets, the system enables users to search across various facets of a paper, such as the title, abstract, author information, and code snippets.
A weighted BM25-based ranking algorithm is used by the search engine, and factors considered when prioritizing search results are described.
arXiv Detail & Related papers (2024-11-19T16:58:03Z) - Taxonomy-guided Semantic Indexing for Academic Paper Search [51.07749719327668]
TaxoIndex is a semantic index framework for academic paper search.
It organizes key concepts from papers as a semantic index guided by an academic taxonomy.
It can be flexibly employed to enhance existing dense retrievers.
arXiv Detail & Related papers (2024-10-25T00:00:17Z) - Generative Retrieval as Multi-Vector Dense Retrieval [71.75503049199897]
Generative retrieval generates identifiers of relevant documents in an end-to-end manner.
Prior work has demonstrated that generative retrieval with atomic identifiers is equivalent to single-vector dense retrieval.
We show that generative retrieval and multi-vector dense retrieval share the same framework for measuring the relevance to a query of a document.
arXiv Detail & Related papers (2024-03-31T13:29:43Z) - Extracting Variable-Depth Logical Document Hierarchy from Long
Documents: Method, Evaluation, and Application [21.270184491603864]
We develop a framework, namely Hierarchy Extraction from Long Document (HELD), where we "sequentially" insert each physical object at the proper on of the current tree.
Experiments based on thousands of long documents from Chinese, English financial market and English scientific publication.
We show that logical document hierarchy can be employed to significantly improve the performance of the downstream passage retrieval task.
arXiv Detail & Related papers (2021-05-14T06:26:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.