End-to-End Open Vocabulary Keyword Search
- URL: http://arxiv.org/abs/2108.10357v1
- Date: Mon, 23 Aug 2021 18:34:53 GMT
- Title: End-to-End Open Vocabulary Keyword Search
- Authors: Bolaji Yusuf, Alican Gok, Batuhan Gundogdu, Murat Saraclar
- Abstract summary: We propose a model directly optimized for keyword search.
The proposed model outperforms similar end-to-end models on a task where the ratio of positive and negative trials is artificially balanced.
Using our system to rescore the outputs of an LVCSR-based keyword search system leads to significant improvements.
- Score: 13.90172596423425
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, neural approaches to spoken content retrieval have become popular.
However, they tend to be restricted in their vocabulary or in their ability to
deal with imbalanced test settings. These restrictions limit their
applicability in keyword search, where the set of queries is not known
beforehand, and where the system should return not just whether an utterance
contains a query but the exact location of any such occurrences. In this work,
we propose a model directly optimized for keyword search. The model takes a
query and an utterance as input and returns a sequence of probabilities, one for
each frame of the utterance, that the query occurred in that frame.
Experiments show that the proposed model not only outperforms similar
end-to-end models on a task where the ratio of positive and negative trials is
artificially balanced, but it is also able to deal with the far more
challenging task of keyword search with its inherent imbalance. Furthermore,
using our system to rescore the outputs of an LVCSR-based keyword search system
leads to significant improvements on the latter.
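The abstract describes a model that outputs, for every frame of an utterance, the probability that the query occurred there, and notes that keyword search must also return the location of each occurrence. The Python sketch below shows one illustrative way, under our own assumptions, to turn such frame-level output into located detections and to use it for rescoring hits from an LVCSR-based system. The function names, the `threshold` and `min_frames` parameters, scoring a span by its peak probability, and linear interpolation controlled by `weight` are all hypothetical choices, not the procedure from the paper.

```python
from typing import List, Optional, Tuple

# A detection is (start_frame, end_frame_inclusive, score).
# All names and parameters here are hypothetical illustrations, not the paper's API.
Detection = Tuple[int, int, float]


def detections_from_frame_probs(
    probs: List[float], threshold: float = 0.5, min_frames: int = 1
) -> List[Detection]:
    """Group consecutive frames whose query probability exceeds `threshold`
    into detections, scoring each one by its peak frame probability."""
    detections: List[Detection] = []
    start: Optional[int] = None
    # A -inf sentinel guarantees that a run reaching the last frame is closed.
    for t, p in enumerate(probs + [float("-inf")]):
        if p > threshold and start is None:
            start = t  # a new run of above-threshold frames begins
        elif p <= threshold and start is not None:
            if t - start >= min_frames:  # drop runs that are too short
                detections.append((start, t - 1, max(probs[start:t])))
            start = None
    return detections


def rescore(
    lvcsr_hits: List[Detection], probs: List[float], weight: float = 0.5
) -> List[Detection]:
    """Interpolate each LVCSR hit's score with the end-to-end model's peak
    frame probability over the same span (assumed linear interpolation)."""
    rescored: List[Detection] = []
    for start, end, lvcsr_score in lvcsr_hits:
        span = probs[start:end + 1]
        e2e_score = max(span) if span else 0.0
        new_score = (1.0 - weight) * lvcsr_score + weight * e2e_score
        rescored.append((start, end, new_score))
    return rescored


if __name__ == "__main__":
    frame_probs = [0.1, 0.2, 0.8, 0.9, 0.7, 0.1, 0.6, 0.3]
    print(detections_from_frame_probs(frame_probs, threshold=0.5, min_frames=2))
    print(rescore([(2, 4, 0.4)], frame_probs, weight=0.5))
```

In the usage example, frames 2 to 4 form one detection with score 0.9, while the single above-threshold frame at index 6 is dropped because of `min_frames=2`. Scoring a span by its peak rather than its mean keeps a short, confident occurrence from being diluted by its boundary frames; either choice is plausible for a sketch like this.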
Related papers
- Adapting Dual-encoder Vision-language Models for Paraphrased Retrieval [55.90407811819347]
We consider the task of paraphrased text-to-image retrieval where a model aims to return similar results given a pair of paraphrased queries.
We train a dual-encoder model starting from a language model pretrained on a large text corpus.
Compared to public dual-encoder models such as CLIP and OpenCLIP, the model trained with our best adaptation strategy achieves a significantly higher ranking similarity for paraphrased queries.
(2024-05-06T06:30:17Z)
- ExcluIR: Exclusionary Neural Information Retrieval [74.08276741093317]
We present ExcluIR, a set of resources for exclusionary retrieval.
The evaluation benchmark includes 3,452 high-quality exclusionary queries.
The training set contains 70,293 exclusionary queries, each paired with a positive document and a negative document.
(2024-04-26T09:43:40Z)
- LIST: Learning to Index Spatio-Textual Data for Embedding based Spatial Keyword Queries [53.843367588870585]
Top-k kNN spatial keyword queries (TkQs) return a list of objects based on a ranking function that considers both spatial and textual relevance.
There are two key challenges in building an effective and efficient index, i.e., the absence of high-quality labels and the unbalanced results.
We develop a novel pseudolabel generation technique to address the two challenges.
(2024-03-12T05:32:33Z)
- End-to-End Open Vocabulary Keyword Search With Multilingual Neural Representations [7.780766187171571]
We propose a neural ASR-free keyword search model which achieves competitive performance.
We extend this work with multilingual pretraining and detailed analysis of the model.
Our experiments show that the proposed multilingual training significantly improves the model performance.
(2023-08-15T20:33:25Z)
- Query Expansion Using Contextual Clue Sampling with Language Models [69.51976926838232]
We propose a combination of an effective filtering strategy and fusion of the retrieved documents based on the generation probability of each context.
Our lexical-matching-based approach achieves top-5/top-20 retrieval accuracy similar to that of the well-established dense retrieval model DPR, and higher top-100 accuracy.
For end-to-end QA, the reader model also benefits from our method and achieves the highest Exact-Match score against several competitive baselines.
(2022-10-13T15:18:04Z)
- Regularized Contrastive Learning of Semantic Search [0.0]
Transformer-based models are widely used as retrieval models due to their excellent ability to learn semantic representations.
We propose a new regularization method: Regularized Contrastive Learning.
It augments each sentence with several different semantic representations and feeds them into the contrastive objective as regularizers.
(2022-09-27T08:25:19Z)
- Improving Contextual Recognition of Rare Words with an Alternate Spelling Prediction Model [0.0]
We release contextual biasing lists to accompany the Earnings21 dataset.
We show results for shallow fusion contextual biasing applied to two different decoding algorithms.
We propose an alternate spelling prediction model that improves recall of rare words by 34.7% relative.
(2022-09-02T19:30:16Z)
- Quotient Space-Based Keyword Retrieval in Sponsored Search [7.639289301435027]
Synonymous keyword retrieval has become an important problem for sponsored search.
We propose a novel quotient space-based retrieval framework to address this problem.
This method has been successfully implemented in Baidu's online sponsored search system.
(2021-05-26T07:27:54Z)
- MASKER: Masked Keyword Regularization for Reliable Text Classification [73.90326322794803]
We propose a fine-tuning method, coined masked keyword regularization (MASKER), that facilitates context-based prediction.
MASKER regularizes the model to reconstruct the keywords from the rest of the words and make low-confidence predictions without enough context.
We demonstrate that MASKER improves OOD detection and cross-domain generalization without degrading classification accuracy.
(2020-12-17T04:54:16Z)
- Session-Aware Query Auto-completion using Extreme Multi-label Ranking [61.753713147852125]
We take the novel approach of modeling session-aware query auto-completion as an eXtreme Multi-label Ranking (XMR) problem.
We adapt a popular XMR algorithm for this purpose by proposing several modifications to the key steps in the algorithm.
Our approach meets the stringent latency requirements for auto-complete systems while leveraging session information in making suggestions.
(2020-12-09T17:56:22Z)
- Leveraging Cognitive Search Patterns to Enhance Automated Natural Language Retrieval Performance [0.0]
We highlight cognitive reformulation patterns that mimic user search behaviour.
We formalize the application of these patterns by considering a query conceptual representation.
A genetic algorithm-based weighting process allows placing emphasis on terms according to their conceptual role-type.
(2020-04-21T14:13:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.