Improving Retrieval in Theme-specific Applications using a Corpus Topical Taxonomy
- URL: http://arxiv.org/abs/2403.04160v1
- Date: Thu, 7 Mar 2024 02:34:54 GMT
- Title: Improving Retrieval in Theme-specific Applications using a Corpus Topical Taxonomy
- Authors: SeongKu Kang, Shivam Agarwal, Bowen Jin, Dongha Lee, Hwanjo Yu, and Jiawei Han
- Abstract summary: We introduce the ToTER (Topical Taxonomy Enhanced Retrieval) framework.
ToTER identifies the central topics of queries and documents with the guidance of the taxonomy, and exploits their topical relatedness to supplement missing contexts.
As a plug-and-play framework, ToTER can be flexibly employed to enhance various PLM-based retrievers.
- Score: 52.426623750562335
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Document retrieval has greatly benefited from the advancements of large-scale
pre-trained language models (PLMs). However, their effectiveness is often
limited in theme-specific applications for specialized areas or industries, due
to unique terminologies, incomplete contexts of user queries, and specialized
search intents. To capture the theme-specific information and improve
retrieval, we propose to use a corpus topical taxonomy, which outlines the
latent topic structure of the corpus while reflecting user-interested aspects.
We introduce the ToTER (Topical Taxonomy Enhanced Retrieval) framework, which
identifies the central topics of queries and documents with the guidance of the
taxonomy, and exploits their topical relatedness to supplement missing
contexts. As a plug-and-play framework, ToTER can be flexibly employed to
enhance various PLM-based retrievers. Through extensive quantitative, ablative,
and exploratory experiments on two real-world datasets, we ascertain the
benefits of using topical taxonomy for retrieval in theme-specific applications
and demonstrate the effectiveness of ToTER.
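The abstract describes ToTER only at a high level: identify the central topics of queries and documents with the help of the taxonomy, then use their topical relatedness to supplement the context a PLM-based retriever sees. The sketch below is a simplified illustration of that idea under stated assumptions, not ToTER's actual method. The toy taxonomy, the keyword-matching topic assignment, the cosine relatedness, and the linear interpolation controlled by `weight` are all hypothetical choices made for illustration; the base scores stand in for whatever PLM-based retriever is being enhanced.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy taxonomy: each topic node is described by a few keywords.
# ToTER's real taxonomy and its topic-identification model are NOT reproduced here.
TAXONOMY: Dict[str, List[str]] = {
    "retrieval/dense": ["embedding", "dense", "vector"],
    "retrieval/sparse": ["bm25", "term", "lexical"],
    "taxonomy/topics": ["taxonomy", "topic", "hierarchy"],
}


def topic_distribution(text: str) -> Dict[str, float]:
    """Assign text a normalized distribution over taxonomy topics by keyword hits."""
    tokens = text.lower().split()
    counts = {topic: sum(tokens.count(kw) for kw in kws) for topic, kws in TAXONOMY.items()}
    total = sum(counts.values())
    if total == 0:
        # No keyword matched: return an all-zero distribution.
        return {topic: 0.0 for topic in TAXONOMY}
    return {topic: c / total for topic, c in counts.items()}


def topical_relatedness(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Cosine similarity between two topic distributions (0.0 if either is all zeros)."""
    dot = sum(p[t] * q[t] for t in TAXONOMY)
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm > 0 else 0.0


def rerank(
    query: str, candidates: Dict[str, Tuple[str, float]], weight: float = 0.5
) -> List[Tuple[str, float]]:
    """Add weighted query-document topical relatedness to a base retriever score.

    candidates maps doc_id -> (doc_text, base_score); base_score is a stand-in
    for the output of any PLM-based retriever, which is left unchanged.
    """
    q_topics = topic_distribution(query)
    scored = []
    for doc_id, (text, base_score) in candidates.items():
        rel = topical_relatedness(q_topics, topic_distribution(text))
        scored.append((doc_id, base_score + weight * rel))
    # Highest combined score first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "d1": ("bm25 term weighting for lexical search", 0.62),
        "d2": ("a topic taxonomy with a hierarchy of topic nodes", 0.60),
    }
    print(rerank("taxonomy of topic hierarchy", docs))
```

In this toy run, d1 has the higher base score, but d2 shares the query's topic, so the topical term moves it to the top. Because the base scores are only combined with, never modified by, the topical signal, this style of interpolation loosely mirrors the plug-and-play property the abstract claims.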
Related papers
- Iterative Utility Judgment Framework via LLMs Inspired by Relevance in Philosophy [66.95501113584541]
Utility and topical relevance are critical measures in information retrieval.
We propose an Iterative utiliTy judgmEnt fraMework to promote each step of the cycle of Retrieval-Augmented Generation.
Date: 2024-06-17T07:52:42Z
- Dense X Retrieval: What Retrieval Granularity Should We Use? [59.359325855708974]
An often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g., document, passage, or sentence.
We introduce a novel retrieval unit, proposition, for dense retrieval. Propositions are defined as atomic expressions within text, each encapsulating a distinct factoid.
Our results reveal that proposition-based retrieval significantly outperforms traditional passage or sentence-based methods in dense retrieval.
Date: 2023-12-11T18:57:35Z
- Augmented Embeddings for Custom Retrievals [13.773007276544913]
We introduce Adapted Dense Retrieval, a mechanism to transform embeddings to enable improved task-specific, heterogeneous and strict retrieval.
Adapted Dense Retrieval works by learning a low-rank residual adaptation of the pretrained black-box embedding.
Date: 2023-10-09T03:29:35Z
- DiscoverPath: A Knowledge Refinement and Retrieval System for Interdisciplinarity on Biomedical Research [96.10765714077208]
Traditional keyword-based search engines fall short in assisting users who may not be familiar with specific terminologies.
We present a knowledge graph-based paper search engine for biomedical research to enhance the user experience.
The system, dubbed DiscoverPath, employs Named Entity Recognition (NER) and part-of-speech (POS) tagging to extract terminologies and relationships from article abstracts to create a KG.
Date: 2023-09-04T20:52:33Z
- Query-Specific Knowledge Graphs for Complex Finance Topics [6.599344783327053]
We focus on the CODEC dataset, where domain experts create challenging questions.
We show that state-of-the-art ranking systems have headroom for improvement.
We demonstrate that entity and document relevance are positively correlated.
Date: 2022-11-08T10:21:13Z
- Topic Taxonomy Expansion via Hierarchy-Aware Topic Phrase Generation [58.3921103230647]
We propose a novel framework for topic taxonomy expansion, named TopicExpan.
TopicExpan directly generates topic-related terms belonging to new topics.
Experimental results on two real-world text corpora show that TopicExpan significantly outperforms other baseline methods in terms of the quality of output.
Date: 2022-10-18T22:38:49Z
- TaxoCom: Topic Taxonomy Completion with Hierarchical Discovery of Novel Topic Clusters [57.59286394188025]
We propose a novel framework for topic taxonomy completion, named TaxoCom.
TaxoCom discovers novel sub-topic clusters of terms and documents.
Our comprehensive experiments on two real-world datasets demonstrate that TaxoCom generates a high-quality topic taxonomy in terms of term coherency and topic coverage.
Date: 2022-01-18T07:07:38Z
- Aspect-Oriented Summarization through Query-Focused Extraction [23.62412515574206]
Real users' needs often align more closely with aspects (broad topics in a dataset that the user is interested in) than with specific queries.
We benchmark extractive query-focused training schemes, and propose a contrastive augmentation approach to train the model.
We evaluate on two aspect-oriented datasets and find that this approach yields focused summaries that are better than those from a generic summarization system.
Date: 2021-10-15T18:06:21Z
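The Augmented Embeddings for Custom Retrievals entry summarizes Adapted Dense Retrieval as learning a low-rank residual adaptation of a pretrained black-box embedding. A minimal sketch of applying such an adapter, assuming the adaptation takes the form e + U(De) with a down-projection D of shape r x d and an up-projection U of shape d x r, is shown below; the matrix names, shapes, and values are hypothetical, and training the matrices is omitted entirely.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def adapt(embedding: Vector, down: Matrix, up: Matrix) -> Vector:
    """Return embedding + up @ (down @ embedding).

    down is (r x d) and up is (d x r) with r much smaller than d, so the
    learned correction has rank at most r, while the black-box embedding
    itself is never modified (only a residual is added to it).
    """
    low_rank = matvec(down, embedding)  # project d -> r
    residual = matvec(up, low_rank)  # project r -> d
    return [e + r for e, r in zip(embedding, residual)]


if __name__ == "__main__":
    # d = 3, r = 1: a rank-1 correction added to a frozen embedding.
    print(adapt([1.0, 0.0, 2.0], [[0.5, 0.0, 0.5]], [[0.0], [1.0], [0.0]]))
```

Keeping the correction residual means a zero up-projection returns the original embedding unchanged, so the adapter can only add task-specific adjustments on top of the pretrained representation.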
This list is automatically generated from the titles and abstracts of the papers on this site.