Quotient Space-Based Keyword Retrieval in Sponsored Search
- URL: http://arxiv.org/abs/2105.12371v1
- Date: Wed, 26 May 2021 07:27:54 GMT
- Title: Quotient Space-Based Keyword Retrieval in Sponsored Search
- Authors: Yijiang Lian, Shuang Li, Chaobing Feng, YanFeng Zhu
- Abstract summary: Synonymous keyword retrieval has become an important problem for sponsored search.
We propose a novel quotient space-based retrieval framework to address this problem.
This method has been successfully implemented in Baidu's online sponsored search system.
- Score: 7.639289301435027
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Synonymous keyword retrieval has become an important problem for sponsored
search ever since major search engines relaxed the exact-match product's matching
requirement to a synonymous level. Since the synonymous relations between
queries and keywords are quite scarce, the traditional information retrieval
framework is inefficient in this scenario. In this paper, we propose a novel
quotient space-based retrieval framework to address this problem. Considering
the synonymy among keywords as a mathematical equivalence relation, we can
compress the synonymous keywords into one representative, and the corresponding
quotient space would greatly reduce the size of the keyword repository. Then an
embedding-based retrieval is directly conducted between queries and the keyword
representatives. To mitigate the semantic gap of the quotient space-based
retrieval, a single semantic siamese model is utilized to detect both the
keyword--keyword and query-keyword synonymous relations. The experiments show
that with our quotient space-based retrieval method, the synonymous keyword
retrieving performance can be greatly improved in terms of memory cost and
recall efficiency. This method has been successfully implemented in Baidu's
online sponsored search system and has yielded a significant improvement in
revenue.
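The pipeline described in the abstract — treat synonymy as an equivalence relation, collapse each class into one representative, retrieve against representatives only, then expand a match back to the full class — can be sketched as follows. The union-find grouping and the bag-of-words embedding are illustrative stand-ins: the paper uses a learned siamese encoder and its own synonym detector, and the keyword data below is hypothetical.

```python
import math
from collections import defaultdict

def build_classes(keywords, synonym_pairs):
    """Group keywords into equivalence classes via union-find,
    treating synonymy as a mathematical equivalence relation."""
    parent = {k: k for k in keywords}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for a, b in synonym_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    classes = defaultdict(list)
    for k in keywords:
        classes[find(k)].append(k)
    # Each root keyword acts as its class's representative; the
    # retrieval index stores one entry per class, not per keyword,
    # which is what shrinks the keyword repository.
    return dict(classes)

def embed(text):
    # Toy bag-of-words vector; a stand-in for the paper's learned
    # siamese encoder (illustrative only).
    vec = defaultdict(float)
    for tok in text.lower().split():
        vec[tok] += 1.0
    return vec

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(query, classes):
    """Score the query against representatives only, then expand
    the winning representative back to its full synonym class."""
    q = embed(query)
    best = max(classes, key=lambda rep: cosine(q, embed(rep)))
    return best, classes[best]

keywords = ["cheap flights", "low cost flights", "budget airfare", "hotel deals"]
pairs = [("low cost flights", "cheap flights"), ("budget airfare", "cheap flights")]
classes = build_classes(keywords, pairs)   # 4 keywords collapse into 2 classes
rep, members = retrieve("cheap flights to london", classes)
```

Note that memory savings grow with class size: the index holds one embedding per equivalence class rather than one per keyword, and a single nearest-representative lookup recalls every synonymous keyword in that class.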
Related papers
- Dense X Retrieval: What Retrieval Granularity Should We Use? [56.90827473115201]
Often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g. document, passage, or sentence.
We introduce a novel retrieval unit, proposition, for dense retrieval.
Experiments reveal that indexing a corpus by fine-grained units such as propositions significantly outperforms passage-level units in retrieval tasks.
arXiv Detail & Related papers (2023-12-11T18:57:35Z) - Keyword Augmented Retrieval: Novel framework for Information Retrieval
integrated with speech interface [0.0]
Retrieving answers quickly and at low cost, without hallucinations, using language models is a major hurdle.
This hurdle prevents the use of language models in automated knowledge retrieval.
For commercial search and chat-bot applications, complete reliance on commercial large language models (LLMs) like GPT-3.5 can be very costly.
arXiv Detail & Related papers (2023-10-06T12:44:04Z) - WISK: A Workload-aware Learned Index for Spatial Keyword Queries [46.96314606580924]
We propose WISK, a learned index for spatial keyword queries.
We show that WISK achieves up to 8x speedup in querying time with comparable storage overhead.
arXiv Detail & Related papers (2023-02-28T03:45:25Z) - Keyword Embeddings for Query Suggestion [3.7900158137749322]
This paper proposes two novel models for the keyword suggestion task trained on scientific literature.
Our techniques adapt the architecture of Word2Vec and FastText to generate keyword embeddings by leveraging documents' keyword co-occurrence.
We evaluate our proposals against the state-of-the-art word and sentence embedding models showing considerable improvements over the baselines for the tasks.
arXiv Detail & Related papers (2023-01-19T11:13:04Z) - Query Expansion Using Contextual Clue Sampling with Language Models [69.51976926838232]
We propose a combination of an effective filtering strategy and fusion of the retrieved documents based on the generation probability of each context.
Our lexical matching based approach achieves a similar top-5/top-20 retrieval accuracy and higher top-100 accuracy compared with the well-established dense retrieval model DPR.
For end-to-end QA, the reader model also benefits from our method and achieves the highest Exact-Match score against several competitive baselines.
arXiv Detail & Related papers (2022-10-13T15:18:04Z) - Semantic Search for Large Scale Clinical Ontologies [63.71950996116403]
We present a deep learning approach to build a search system for large clinical vocabularies.
We propose a Triplet-BERT model and a method that generates semantic training data.
The model is evaluated on five real benchmark data sets, and the results show that our approach achieves high accuracy on both free-text-to-concept and concept-to-concept search over concept vocabularies.
arXiv Detail & Related papers (2022-01-01T05:15:42Z) - Phrase Retrieval Learns Passage Retrieval, Too [77.57208968326422]
We study whether phrase retrieval can serve as the basis for coarse-level retrieval including passages and documents.
We show that a dense phrase-retrieval system, without any retraining, already achieves better passage retrieval accuracy.
We also show that phrase filtering and vector quantization can reduce the size of our index by 4-10x.
arXiv Detail & Related papers (2021-09-16T17:42:45Z) - End-to-End Open Vocabulary Keyword Search [13.90172596423425]
We propose a model directly optimized for keyword search.
The proposed model outperforms similar end-to-end models on a task where the ratio of positive and negative trials is artificially balanced.
Using our system to rescore the outputs of an LVCSR-based keyword search system leads to significant improvements.
arXiv Detail & Related papers (2021-08-23T18:34:53Z) - Accelerating Text Mining Using Domain-Specific Stop Word Lists [57.76576681191192]
We present a novel approach for the automatic extraction of domain-specific words called the hyperplane-based approach.
The hyperplane-based approach can significantly reduce text dimensionality by eliminating irrelevant features.
Results indicate that the hyperplane-based approach can reduce the dimensionality of the corpus by 90% and outperforms mutual information.
arXiv Detail & Related papers (2020-11-18T17:42:32Z) - Leveraging Cognitive Search Patterns to Enhance Automated Natural
Language Retrieval Performance [0.0]
We highlight cognitive reformulation patterns that mimic user search behaviour.
We formalize the application of these patterns by considering a query conceptual representation.
A genetic algorithm-based weighting process allows placing emphasis on terms according to their conceptual role-type.
arXiv Detail & Related papers (2020-04-21T14:13:33Z) - Keyword-Attentive Deep Semantic Matching [1.8416014644193064]
We propose a keyword-attentive approach to improve deep semantic matching.
We first leverage domain tags from a large corpus to generate a domain-enhanced keyword dictionary.
During model training, we propose a new negative sampling approach based on keyword coverage between the input pair.
arXiv Detail & Related papers (2020-03-11T10:18:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.