Taxonomy-guided Semantic Indexing for Academic Paper Search
- URL: http://arxiv.org/abs/2410.19218v1
- Date: Fri, 25 Oct 2024 00:00:17 GMT
- Title: Taxonomy-guided Semantic Indexing for Academic Paper Search
- Authors: SeongKu Kang, Yunyi Zhang, Pengcheng Jiang, Dongha Lee, Jiawei Han, Hwanjo Yu,
- Abstract summary: TaxoIndex is a semantic index framework for academic paper search.
It organizes key concepts from papers as a semantic index guided by an academic taxonomy.
It can be flexibly employed to enhance existing dense retrievers.
- Score: 51.07749719327668
- License:
- Abstract: Academic paper search is an essential task for efficient literature discovery and scientific advancement. While dense retrieval has advanced various ad-hoc searches, it often struggles to match the underlying academic concepts between queries and documents, which is critical for paper search. To enable effective academic concept matching for paper search, we propose Taxonomy-guided Semantic Indexing (TaxoIndex) framework. TaxoIndex extracts key concepts from papers and organizes them as a semantic index guided by an academic taxonomy, and then leverages this index as foundational knowledge to identify academic concepts and link queries and documents. As a plug-and-play framework, TaxoIndex can be flexibly employed to enhance existing dense retrievers. Extensive experiments show that TaxoIndex brings significant improvements, even with highly limited training data, and greatly enhances interpretability.
Related papers
- Conversational Exploratory Search of Scholarly Publications Using Knowledge Graphs [3.3916160303055567]
We develop a conversational search system for exploring scholarly publications using a knowledge graph.
To assess the system's effectiveness, we employed various performance metrics and conducted a human evaluation with 40 participants.
arXiv Detail & Related papers (2024-10-01T06:16:07Z) - VectorSearch: Enhancing Document Retrieval with Semantic Embeddings and
Optimized Search [1.0411820336052784]
We propose VectorSearch, which leverages advanced algorithms, embeddings, and indexing techniques for refined retrieval.
By utilizing innovative multi-vector search operations and encoding searches with advanced language models, our approach significantly improves retrieval accuracy.
Experiments on real-world datasets show that VectorSearch outperforms baseline metrics.
arXiv Detail & Related papers (2024-09-25T21:58:08Z) - Knowledge Navigator: LLM-guided Browsing Framework for Exploratory Search in Scientific Literature [48.572336666741194]
We present Knowledge Navigator, a system designed to enhance exploratory search abilities.
It organizes retrieved documents into a navigable, two-level hierarchy of named and descriptive scientific topics and subtopics.
arXiv Detail & Related papers (2024-08-28T14:48:37Z) - Khmer Semantic Search Engine (KSE): Digital Information Access and Document Retrieval [0.0]
Despite the daily generation of significant Khmer content, Cambodians struggle to find necessary documents.
Even Google does not deliver high accuracy for Khmer content.
This research proposes the first Khmer Semantic Search Engine (KSE), designed to enhance traditional Khmer search methods.
arXiv Detail & Related papers (2024-06-13T16:58:02Z) - Improving Retrieval in Theme-specific Applications using a Corpus
Topical Taxonomy [52.426623750562335]
We introduce ToTER (Topical taxonomy Enhanced Retrieval) framework.
ToTER identifies the central topics of queries and documents with the guidance of the taxonomy, and exploits their topical relatedness to supplement missing contexts.
As a plug-and-play framework, ToTER can be flexibly employed to enhance various PLM-based retrievers.
arXiv Detail & Related papers (2024-03-07T02:34:54Z) - DiscoverPath: A Knowledge Refinement and Retrieval System for
Interdisciplinarity on Biomedical Research [96.10765714077208]
Traditional keyword-based search engines fall short in assisting users who may not be familiar with specific terminologies.
We present a knowledge graph-based paper search engine for biomedical research to enhance the user experience.
The system, dubbed DiscoverPath, employs Named Entity Recognition (NER) and part-of-speech (POS) tagging to extract terminologies and relationships from article abstracts to create a KG.
arXiv Detail & Related papers (2023-09-04T20:52:33Z) - TaxoCom: Topic Taxonomy Completion with Hierarchical Discovery of Novel
Topic Clusters [57.59286394188025]
We propose a novel framework for topic taxonomy completion, named TaxoCom.
TaxoCom discovers novel sub-topic clusters of terms and documents.
Our comprehensive experiments on two real-world datasets demonstrate that TaxoCom not only generates the high-quality topic taxonomy in terms of term coherency and topic coverage.
arXiv Detail & Related papers (2022-01-18T07:07:38Z) - Effective Distributed Representations for Academic Expert Search [1.9815631757151737]
We study how different distributed representations of academic papers (i.e. embeddings) impact academic expert retrieval.
In particular, we explore the impact of the use of contextualized embeddings on search performance.
We observe that using contextual embeddings produced by a transformer model trained for sentence similarity tasks produces the most effective paper representations.
arXiv Detail & Related papers (2020-10-16T09:43:18Z) - Octet: Online Catalog Taxonomy Enrichment with Self-Supervision [67.26804972901952]
We present a self-supervised end-to-end framework, Octet for Online Catalog EnrichmenT.
We propose to train a sequence labeling model for term extraction and employ graph neural networks (GNNs) to capture the taxonomy structure.
Octet enriches an online catalog in production to 2 times larger in the open-world evaluation.
arXiv Detail & Related papers (2020-06-18T04:53:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.