Keyword Embeddings for Query Suggestion
- URL: http://arxiv.org/abs/2301.08006v2
- Date: Mon, 23 Jan 2023 09:12:33 GMT
- Title: Keyword Embeddings for Query Suggestion
- Authors: Jorge Gabín, M. Eduardo Ares and Javier Parapar
- Abstract summary: This paper proposes two novel models for the keyword suggestion task trained on scientific literature.
Our techniques adapt the architecture of Word2Vec and FastText to generate keyword embeddings by leveraging documents' keyword co-occurrence.
We evaluate our proposals against the state-of-the-art word and sentence embedding models showing considerable improvements over the baselines for the tasks.
- Score: 3.7900158137749322
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nowadays, search engine users commonly rely on query suggestions to improve
their initial inputs. Current systems are very good at recommending lexical
adaptations or spelling corrections to users' queries. However, they often
struggle to suggest semantically related keywords given a user's query. The
construction of a detailed query is crucial in some tasks, such as legal
retrieval or academic search. In these scenarios, keyword suggestion methods
are critical to guide the user during the query formulation. This paper
proposes two novel models for the keyword suggestion task trained on scientific
literature. Our techniques adapt the architecture of Word2Vec and FastText to
generate keyword embeddings by leveraging documents' keyword co-occurrence.
Along with these models, we also present a specially tailored negative sampling
approach that exploits how keywords appear in academic publications. We devise
a ranking-based evaluation methodology following both known-item and ad-hoc
search scenarios. Finally, we evaluate our proposals against the
state-of-the-art word and sentence embedding models showing considerable
improvements over the baselines for the tasks.
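The core idea in the abstract, learning keyword vectors from the keywords that co-occur in the same document, can be illustrated with a minimal sketch. This is not the authors' model: it is a plain skip-gram with uniformly drawn negatives (the paper describes a tailored negative sampling approach that is not reproduced here), and the toy corpus `DOCS`, the function names, and all hyperparameters are assumptions made for illustration only.

```python
import math
import random
from itertools import permutations

# Hypothetical toy corpus: each document is reduced to its author keywords.
DOCS = [
    ["information retrieval", "query suggestion", "word embeddings"],
    ["information retrieval", "query suggestion", "search engines"],
    ["word embeddings", "negative sampling", "neural networks"],
    ["neural networks", "negative sampling", "deep learning"],
    ["query suggestion", "search engines", "information retrieval"],
    ["deep learning", "neural networks", "word embeddings"],
]


def sigmoid(x):
    # Clamp the input so math.exp cannot overflow.
    x = max(-30.0, min(30.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def train_keyword_embeddings(docs, dim=16, epochs=200, lr=0.05, negatives=2, seed=0):
    """Skip-gram with negative sampling where the 'context' of a keyword
    is every other keyword attached to the same document."""
    rng = random.Random(seed)
    vocab = sorted({kw for doc in docs for kw in doc})
    index = {kw: i for i, kw in enumerate(vocab)}
    w_in = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in vocab]
    w_out = [[0.0] * dim for _ in vocab]
    # Every ordered pair of keywords from one document is a positive example.
    pairs = [(index[a], index[b]) for doc in docs for a, b in permutations(doc, 2)]

    for _ in range(epochs):
        rng.shuffle(pairs)
        for target, context in pairs:
            samples = [(context, 1.0)]
            # Assumption: negatives are drawn uniformly from the vocabulary.
            while len(samples) < negatives + 1:
                neg = rng.randrange(len(vocab))
                if neg != target and neg != context:
                    samples.append((neg, 0.0))
            grad_in = [0.0] * dim
            for out_id, label in samples:
                score = sum(a * b for a, b in zip(w_in[target], w_out[out_id]))
                g = lr * (label - sigmoid(score))
                for d in range(dim):
                    grad_in[d] += g * w_out[out_id][d]
                    w_out[out_id][d] += g * w_in[target][d]
            for d in range(dim):
                w_in[target][d] += grad_in[d]
    return {kw: w_in[index[kw]] for kw in vocab}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def suggest(embeddings, query, k=3):
    """Rank all other keywords by cosine similarity to the query keyword."""
    q = embeddings[query]
    scored = [(cosine(q, vec), kw) for kw, vec in embeddings.items() if kw != query]
    return [kw for _, kw in sorted(scored, reverse=True)[:k]]


embeddings = train_keyword_embeddings(DOCS)
print(suggest(embeddings, "query suggestion"))
```

Treating a document's keyword list as an unordered set (all ordered pairs, no sliding window) is one simple way to reflect that keyword order carries no meaning. A real setup would train on a large scientific corpus and evaluate the resulting ranking, for example in the known-item and ad-hoc scenarios the abstract mentions.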
Related papers
- Taxonomy-guided Semantic Indexing for Academic Paper Search [51.07749719327668]
TaxoIndex is a semantic index framework for academic paper search.
It organizes key concepts from papers as a semantic index guided by an academic taxonomy.
It can be flexibly employed to enhance existing dense retrievers.
arXiv Detail & Related papers (2024-10-25T00:00:17Z)
- Hybrid Semantic Search: Unveiling User Intent Beyond Keywords [0.0]
This paper addresses the limitations of traditional keyword-based search in understanding user intent.
It introduces a novel hybrid search approach that leverages the strengths of non-semantic search engines, Large Language Models (LLMs), and embedding models.
arXiv Detail & Related papers (2024-08-17T16:04:31Z)
- Query-oriented Data Augmentation for Session Search [71.84678750612754]
We propose query-oriented data augmentation to enrich search logs and empower the modeling.
We generate supplemental training pairs by altering the most important part of a search context.
We develop several strategies to alter the current query, resulting in new training data with varying degrees of difficulty.
arXiv Detail & Related papers (2024-07-04T08:08:33Z)
- ExcluIR: Exclusionary Neural Information Retrieval [74.08276741093317]
We present ExcluIR, a set of resources for exclusionary retrieval.
The evaluation benchmark includes 3,452 high-quality exclusionary queries.
The training set contains 70,293 exclusionary queries, each paired with a positive document and a negative document.
arXiv Detail & Related papers (2024-04-26T09:43:40Z)
- LIST: Learning to Index Spatio-Textual Data for Embedding based Spatial Keyword Queries [53.843367588870585]
Top-k KNN spatial keyword queries (TkQs) return a list of objects based on a ranking function that considers both spatial and textual relevance.
There are two key challenges in building an effective and efficient index, i.e., the absence of high-quality labels and the unbalanced results.
We develop a novel pseudolabel generation technique to address the two challenges.
arXiv Detail & Related papers (2024-03-12T05:32:33Z)
- Typo-Robust Representation Learning for Dense Retrieval [6.148710657178892]
One of the main challenges of dense retrieval in real-world settings is the handling of queries containing misspelled words.
A popular approach for handling misspelled queries is minimizing the representation discrepancy between misspelled queries and their pristine counterparts.
Unlike the existing approaches, which only focus on the alignment between misspelled and pristine queries, our method also improves the contrast between each misspelled query and its surrounding queries.
arXiv Detail & Related papers (2023-06-17T13:48:30Z)
- Efficient Image-Text Retrieval via Keyword-Guided Pre-Screening [53.1711708318581]
Current image-text retrieval methods suffer from $N$-related time complexity.
This paper presents a simple and effective keyword-guided pre-screening framework for image-text retrieval.
arXiv Detail & Related papers (2023-03-14T09:36:42Z)
- End-to-End Open Vocabulary Keyword Search [13.90172596423425]
We propose a model directly optimized for keyword search.
The proposed model outperforms similar end-to-end models on a task where the ratio of positive and negative trials is artificially balanced.
Using our system to rescore the outputs of an LVCSR-based keyword search system leads to significant improvements.
arXiv Detail & Related papers (2021-08-23T18:34:53Z)
- Text Summarization with Latent Queries [60.468323530248945]
We introduce LaQSum, the first unified text summarization system that learns Latent Queries from documents for abstractive summarization with any existing query forms.
Under a deep generative framework, our system jointly optimizes a latent query model and a conditional language model, allowing users to plug-and-play queries of any type at test time.
Our system robustly outperforms strong comparison systems across summarization benchmarks with different query types, document settings, and target domains.
arXiv Detail & Related papers (2021-05-31T21:14:58Z)
- Keyword-Attentive Deep Semantic Matching [1.8416014644193064]
We propose a keyword-attentive approach to improve deep semantic matching.
We first leverage domain tags from a large corpus to generate a domain-enhanced keyword dictionary.
During model training, we propose a new negative sampling approach based on keyword coverage between the input pair.
arXiv Detail & Related papers (2020-03-11T10:18:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.