Related papers: Improving Retrieval in Theme-specific Applications using a Corpus Topical Taxonomy

Improving Retrieval in Theme-specific Applications using a Corpus Topical Taxonomy

URL: http://arxiv.org/abs/2403.04160v1
Date: Thu, 7 Mar 2024 02:34:54 GMT
Title: Improving Retrieval in Theme-specific Applications using a Corpus Topical Taxonomy
Authors: SeongKu Kang, Shivam Agarwal, Bowen Jin, Dongha Lee, Hwanjo Yu, and Jiawei Han
Abstract summary: We introduce ToTER (Topical taxonomy Enhanced Retrieval) framework. ToTER identifies the central topics of queries and documents with the guidance of the taxonomy, and exploits their topical relatedness to supplement missing contexts. As a plug-and-play framework, ToTER can be flexibly employed to enhance various PLM-based retrievers.
Score: 52.426623750562335
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Document retrieval has greatly benefited from the advancements of large-scale pre-trained language models (PLMs). However, their effectiveness is often limited in theme-specific applications for specialized areas or industries, due to unique terminologies, incomplete contexts of user queries, and specialized search intents. To capture the theme-specific information and improve retrieval, we propose to use a corpus topical taxonomy, which outlines the latent topic structure of the corpus while reflecting user-interested aspects. We introduce ToTER (Topical Taxonomy Enhanced Retrieval) framework, which identifies the central topics of queries and documents with the guidance of the taxonomy, and exploits their topical relatedness to supplement missing contexts. As a plug-and-play framework, ToTER can be flexibly employed to enhance various PLM-based retrievers. Through extensive quantitative, ablative, and exploratory experiments on two real-world datasets, we ascertain the benefits of using topical taxonomy for retrieval in theme-specific applications and demonstrate the effectiveness of ToTER.

Related papers

Towards Context-aware Reasoning-enhanced Generative Searching in E-commerce [61.03081096959132]
We propose a context-aware reasoning-enhanced generative search framework for better textbfunderstanding the complicated context.<n>Our approach achieves superior performance compared with strong baselines, validating its effectiveness for search-based recommendation.
arXiv Detail & Related papers (2025-10-19T16:46:11Z)
Question-Driven Analysis and Synthesis: Building Interpretable Thematic Trees with LLMs for Text Clustering and Controllable Generation [1.3750624267664158]
We introduce Recursive Thematic Partitioning (RTP) to interactively build a binary tree.<n>Each node in the tree is a natural language question that semantically partitions the data, resulting in a fully interpretable taxonomy.<n>We show that RTP's question-driven hierarchy is more interpretable than the keyword-based topics from a strong baseline like BERTopic.
arXiv Detail & Related papers (2025-09-26T11:27:22Z)
Taxonomy-guided Semantic Indexing for Academic Paper Search [51.07749719327668]
TaxoIndex is a semantic index framework for academic paper search. It organizes key concepts from papers as a semantic index guided by an academic taxonomy. It can be flexibly employed to enhance existing dense retrievers.
arXiv Detail & Related papers (2024-10-25T00:00:17Z)
Knowledge-Aware Query Expansion with Large Language Models for Textual and Relational Retrieval [49.42043077545341]
We propose a knowledge-aware query expansion framework, augmenting LLMs with structured document relations from knowledge graph (KG) We leverage document texts as rich KG node representations and use document-based relation filtering for our Knowledge-Aware Retrieval (KAR)
arXiv Detail & Related papers (2024-10-17T17:03:23Z)
Beyond Relevant Documents: A Knowledge-Intensive Approach for Query-Focused Summarization using Large Language Models [27.90653125902507]
We propose a knowledge-intensive approach that reframes query-focused summarization as a knowledge-intensive task setup. The retrieval module efficiently retrieves potentially relevant documents from a large-scale knowledge corpus. The summarization controller seamlessly integrates a powerful large language model (LLM)-based summarizer with a carefully tailored prompt.
arXiv Detail & Related papers (2024-08-19T18:54:20Z)
Iterative Utility Judgment Framework via LLMs Inspired by Relevance in Philosophy [66.95501113584541]
Utility and topical relevance are critical measures in information retrieval. We propose an Iterative utiliTy judgmEnt fraMework to promote each step of the cycle of Retrieval-Augmented Generation.
arXiv Detail & Related papers (2024-06-17T07:52:42Z)
Augmented Embeddings for Custom Retrievals [13.773007276544913]
We introduce Adapted Dense Retrieval, a mechanism to transform embeddings to enable improved task-specific, heterogeneous and strict retrieval. Dense Retrieval works by learning a low-rank residual adaptation of the pretrained black-box embedding.
arXiv Detail & Related papers (2023-10-09T03:29:35Z)
DiscoverPath: A Knowledge Refinement and Retrieval System for Interdisciplinarity on Biomedical Research [96.10765714077208]
Traditional keyword-based search engines fall short in assisting users who may not be familiar with specific terminologies. We present a knowledge graph-based paper search engine for biomedical research to enhance the user experience. The system, dubbed DiscoverPath, employs Named Entity Recognition (NER) and part-of-speech (POS) tagging to extract terminologies and relationships from article abstracts to create a KG.
arXiv Detail & Related papers (2023-09-04T20:52:33Z)
TaxoCom: Topic Taxonomy Completion with Hierarchical Discovery of Novel Topic Clusters [57.59286394188025]
We propose a novel framework for topic taxonomy completion, named TaxoCom. TaxoCom discovers novel sub-topic clusters of terms and documents. Our comprehensive experiments on two real-world datasets demonstrate that TaxoCom not only generates the high-quality topic taxonomy in terms of term coherency and topic coverage.
arXiv Detail & Related papers (2022-01-18T07:07:38Z)
Aspect-Oriented Summarization through Query-Focused Extraction [23.62412515574206]
Real users' needs often fall more closely into aspects, broad topics in a dataset the user is interested in rather than specific queries. We benchmark extractive query-focused training schemes, and propose a contrastive augmentation approach to train the model. We evaluate on two aspect-oriented datasets and find this approach yields focused summaries, better than those from a generic summarization system.
arXiv Detail & Related papers (2021-10-15T18:06:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.