Query Rewriting via Cycle-Consistent Translation for E-Commerce Search
- URL: http://arxiv.org/abs/2103.00800v1
- Date: Mon, 1 Mar 2021 06:47:12 GMT
- Title: Query Rewriting via Cycle-Consistent Translation for E-Commerce Search
- Authors: Yiming Qiu, Kang Zhang, Han Zhang, Songlin Wang, Sulong Xu, Yun Xiao,
Bo Long, Wen-Yun Yang
- Abstract summary: We propose a novel deep neural network based approach to query rewriting.
We formulate query rewriting as a cyclic machine translation problem.
We introduce a novel cycle-consistent training algorithm in conjunction with state-of-the-art machine translation models.
- Score: 13.723266150864037
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Nowadays e-commerce search has become an integral part of many people's
shopping routines. One critical challenge in today's e-commerce search is the
semantic matching problem where the relevant items may not contain the exact
terms in the user query. In this paper, we propose a novel deep neural network
based approach to query rewriting, in order to tackle this problem.
Specifically, we formulate query rewriting as a cyclic machine translation
problem to leverage abundant click log data. Then we introduce a novel
cycle-consistent training algorithm in conjunction with state-of-the-art machine
translation models to achieve the optimal performance in terms of query
rewriting accuracy. In order to make it practical in industrial scenarios, we
optimize the syntax tree construction to reduce computational cost and online
serving latency. Offline experiments show that the proposed method is able to
rewrite hard user queries into more standard queries that are more appropriate
for the inverted index to retrieve. Compared with a human-curated rule-based
method, the proposed model significantly improves query rewriting diversity
while maintaining good relevancy. Online A/B experiments show that it improves
core e-commerce business metrics significantly. Since the summer of 2020, the
proposed model has been launched into our search engine production, serving
hundreds of millions of users.
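The cyclic formulation in the abstract (query to item title and back to query, learned from click logs) can be illustrated with a toy sketch. This is not the authors' neural model: it swaps the translation models for simple count-based conditional probabilities, and the click pairs, the `conditional` helper, and the `rewrite` function are all hypothetical names and data introduced only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical click log: (user query, clicked item title) pairs.
CLICKS = [
    ("phone for grandpa", "senior mobile phone big buttons"),
    ("phone for grandpa", "senior mobile phone big buttons"),
    ("senior mobile phone", "senior mobile phone big buttons"),
    ("senior mobile phone", "senior mobile phone big buttons"),
    ("senior mobile phone", "senior mobile phone big buttons"),
    ("senior mobile phone", "senior flip phone loud speaker"),
    ("phone case", "silicone phone case"),
]


def conditional(pairs):
    """Estimate P(target | source) from co-occurrence counts."""
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        counts[src][tgt] += 1
    return {
        src: {tgt: n / sum(c.values()) for tgt, n in c.items()}
        for src, c in counts.items()
    }


def rewrite(query, clicks, top_k=3):
    """Rank rewrites q' by the round-trip score sum_t P(t | q) * P(q' | t)."""
    q2t = conditional(clicks)                       # forward: query -> title
    t2q = conditional([(t, q) for q, t in clicks])  # backward: title -> query
    scores = Counter()
    for title, p_title in q2t.get(query, {}).items():
        for candidate, p_query in t2q[title].items():
            if candidate != query:  # skip the trivial identity rewrite
                scores[candidate] += p_title * p_query
    return scores.most_common(top_k)


if __name__ == "__main__":
    print(rewrite("phone for grandpa", CLICKS))
```

In this toy setting a colloquial query is mapped to a more standard query that shares clicked titles with it. A learned sequence-to-sequence model would generalize to queries and titles never seen together, which count tables cannot do.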
Related papers
- Aligning Query Representation with Rewritten Query and Relevance Judgments in Conversational Search [32.35446999027349]
We leverage both rewritten queries and relevance judgments in the conversational search data to train a better query representation model.
The proposed model -- Query Representation Alignment Conversational Retriever, QRACDR, is tested on eight datasets.
arXiv Detail & Related papers (2024-07-29T17:14:36Z)
- A Use Case: Reformulating Query Rewriting as a Statistical Machine Translation Problem [0.0]
The paper proposes a query rewriting pipeline based on a monolingual machine translation model that learns to rewrite Arabic user search queries.
This paper also describes preprocessing steps to create a mapping between user queries and web page titles.
arXiv Detail & Related papers (2023-10-19T11:37:14Z)
- Semantic Equivalence of e-Commerce Queries [6.232692545488813]
This paper introduces a framework to recognize and leverage query equivalence to enhance searcher and business outcomes.
The proposed approach addresses three key problems: mapping queries to vector representations of search intent, identifying nearest neighbor queries expressing equivalent or similar intent, and optimizing for user or business objectives.
arXiv Detail & Related papers (2023-08-07T18:40:13Z)
- Improving Text Matching in E-Commerce Search with A Rationalizable, Intervenable and Fast Entity-Based Relevance Model [78.80174696043021]
We propose a novel model called the Entity-Based Relevance Model (EBRM).
The decomposition allows us to use a Cross-encoder QE relevance module for high accuracy.
We also show that pretraining the QE module with auto-generated QE data from user logs can further improve the overall performance.
arXiv Detail & Related papers (2023-07-01T15:44:53Z)
- How Does Generative Retrieval Scale to Millions of Passages? [68.98628807288972]
We conduct the first empirical study of generative retrieval techniques across various corpus scales.
We scale generative retrieval to a corpus of 8.8M passages and evaluate model sizes up to 11B parameters.
While generative retrieval is competitive with state-of-the-art dual encoders on small corpora, scaling to millions of passages remains an important and unsolved challenge.
arXiv Detail & Related papers (2023-05-19T17:33:38Z)
- Context-Aware Query Rewriting for Improving Users' Search Experience on E-commerce Websites [47.04727122209316]
E-commerce queries are often short and ambiguous.
Users tend to enter multiple searches, which we call context, before purchasing.
We propose an end-to-end context-aware query rewriting model.
arXiv Detail & Related papers (2022-09-15T19:46:01Z)
- Online Learning of Optimally Diverse Rankings [63.62764375279861]
We propose an algorithm that efficiently learns the optimal list based on users' feedback only.
We show that after $T$ queries, the regret of LDR scales as $O((N-L)\log(T))$ where $N$ is the number of all items.
arXiv Detail & Related papers (2021-09-13T12:13:20Z)
- Session-Aware Query Auto-completion using Extreme Multi-label Ranking [61.753713147852125]
We take the novel approach of modeling session-aware query auto-completion as an eXtreme Multi-label Ranking (XMR) problem.
We adapt a popular XMR algorithm for this purpose by proposing several modifications to the key steps in the algorithm.
Our approach meets the stringent latency requirements for auto-complete systems while leveraging session information in making suggestions.
arXiv Detail & Related papers (2020-12-09T17:56:22Z)
- Query Resolution for Conversational Search with Limited Supervision [63.131221660019776]
We propose QuReTeC (Query Resolution by Term Classification), a neural query resolution model based on bidirectional transformers.
We show that QuReTeC outperforms state-of-the-art models, and furthermore, that our distant supervision method can be used to substantially reduce the amount of human-curated data required to train QuReTeC.
arXiv Detail & Related papers (2020-05-24T11:37:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.