Related papers: Beyond Questions: Leveraging ColBERT for Keyphrase Search

Beyond Questions: Leveraging ColBERT for Keyphrase Search

URL: http://arxiv.org/abs/2412.03193v1
Date: Wed, 04 Dec 2024 10:27:35 GMT
Title: Beyond Questions: Leveraging ColBERT for Keyphrase Search
Authors: Jorge Gabín, Javier Parapar, Craig Macdonald,
Abstract summary: Keyphrase search has traditionally been the cornerstone of web search.<n>Current dense retrieval models often fail with keyphrase-like queries.<n>This paper introduces a novel model that employs the ColBERT architecture to enhance document ranking for keyphrase queries.
Score: 13.262269510021333
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While question-like queries are gaining popularity and search engines' users increasingly adopt them, keyphrase search has traditionally been the cornerstone of web search. This query type is also prevalent in specialised search tasks such as academic or professional search, where experts rely on keyphrases to articulate their information needs. However, current dense retrieval models often fail with keyphrase-like queries, primarily because they are mostly trained on question-like ones. This paper introduces a novel model that employs the ColBERT architecture to enhance document ranking for keyphrase queries. For that, given the lack of large keyphrase-based retrieval datasets, we first explore how Large Language Models can convert question-like queries into keyphrase format. Then, using those keyphrases, we train a keyphrase-based ColBERT ranker (ColBERTKP_QD) to improve the performance when working with keyphrase queries. Furthermore, to reduce the training costs associated with training the full ColBERT model, we investigate the feasibility of training only a keyphrase query encoder while keeping the document encoder weights static (ColBERTKP_Q). We assess our proposals' ranking performance using both automatically generated and manually annotated keyphrases. Our results reveal the potential of the late interaction architecture when working under the keyphrase search scenario.

Related papers

Improving Retrieval in Sponsored Search by Leveraging Query Context Signals [6.152499434499752]
We propose an approach to enhance query understanding by augmenting queries with rich contextual signals. We use web search titles and snippets to ground queries in real-world information and utilize GPT-4 to generate query rewrites and explanations. Our context-aware approach substantially outperforms context-free models.
arXiv Detail & Related papers (2024-07-19T14:28:53Z)
LIST: Learning to Index Spatio-Textual Data for Embedding based Spatial Keyword Queries [53.843367588870585]
List K-kNN spatial keyword queries (TkQs) return a list of objects based on a ranking function that considers both spatial and textual relevance. There are two key challenges in building an effective and efficient index, i.e., the absence of high-quality labels and the unbalanced results. We develop a novel pseudolabel generation technique to address the two challenges.
arXiv Detail & Related papers (2024-03-12T05:32:33Z)
Keyword Embeddings for Query Suggestion [3.7900158137749322]
This paper proposes two novel models for the keyword suggestion task trained on scientific literature. Our techniques adapt the architecture of Word2Vec and FastText to generate keyword embeddings by leveraging documents' keyword co-occurrence. We evaluate our proposals against the state-of-the-art word and sentence embedding models showing considerable improvements over the baselines for the tasks.
arXiv Detail & Related papers (2023-01-19T11:13:04Z)
UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question Answering Over Knowledge Graph [89.98762327725112]
Multi-hop Question Answering over Knowledge Graph(KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question. We propose UniKGQA, a novel approach for multi-hop KGQA task, by unifying retrieval and reasoning in both model architecture and parameter learning.
arXiv Detail & Related papers (2022-12-02T04:08:09Z)
Retrieval-Augmented Multilingual Keyphrase Generation with Retriever-Generator Iterative Training [66.64843711515341]
Keyphrase generation is the task of automatically predicting keyphrases given a piece of long text. We call attention to a new setting named multilingual keyphrase generation. We propose a retrieval-augmented method for multilingual keyphrase generation to mitigate the data shortage problem in non-English languages.
arXiv Detail & Related papers (2022-05-21T00:45:21Z)
Deep Keyphrase Completion [59.0413813332449]
Keyphrase provides accurate information of document content that is highly compact, concise, full of meanings, and widely used for discourse comprehension, organization, and text retrieval. We propose textitkeyphrase completion (KPC) to generate more keyphrases for document (e.g. scientific publication) taking advantage of document content along with a very limited number of known keyphrases. We name it textitdeep keyphrase completion (DKPC) since it attempts to capture the deep semantic meaning of the document content together with known keyphrases via a deep learning framework
arXiv Detail & Related papers (2021-10-29T07:15:35Z)
Quotient Space-Based Keyword Retrieval in Sponsored Search [7.639289301435027]
Synonymous keyword retrieval has become an important problem for sponsored search. We propose a novel quotient space-based retrieval framework to address this problem. This method has been successfully implemented in Baidu's online sponsored search system.
arXiv Detail & Related papers (2021-05-26T07:27:54Z)
Keyphrase Extraction with Dynamic Graph Convolutional Networks and Diversified Inference [50.768682650658384]
Keyphrase extraction (KE) aims to summarize a set of phrases that accurately express a concept or a topic covered in a given document. Recent Sequence-to-Sequence (Seq2Seq) based generative framework is widely used in KE task, and it has obtained competitive performance on various benchmarks. In this paper, we propose to adopt the Dynamic Graph Convolutional Networks (DGCN) to solve the above two problems simultaneously.
arXiv Detail & Related papers (2020-10-24T08:11:23Z)
Exclusive Hierarchical Decoding for Deep Keyphrase Generation [63.357895318562214]
Keyphrase generation (KG) aims to summarize the main ideas of a document into a set of keyphrases. Previous work in this setting employs a sequential decoding process to generate keyphrases. We propose an exclusive hierarchical decoding framework that includes a hierarchical decoding process and either a soft or a hard exclusion mechanism.
arXiv Detail & Related papers (2020-04-18T02:58:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.