Searching, fast and slow, through product catalogs
- URL: http://arxiv.org/abs/2401.00737v1
- Date: Mon, 1 Jan 2024 12:30:46 GMT
- Title: Searching, fast and slow, through product catalogs
- Authors: Dayananda Ubrangala, Juhi Sharma, Sharath Kumar Rangappa, Kiran R,
Ravi Prasad Kondapalli, Laurent Boué
- Abstract summary: We present a unified architecture for SKU search that provides both a real-time suggestion system and a lower latency search system.
We show how our system vastly outperforms, in all aspects, the results provided by the default search engine.
- Score: 5.077235981745305
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: String matching algorithms in the presence of abbreviations, such as in Stock
Keeping Unit (SKU) product catalogs, remain a relatively unexplored topic. In
this paper, we present a unified architecture for SKU search that provides both
a real-time suggestion system (based on a Trie data structure) as well as a
lower latency search system (making use of character level TF-IDF in
combination with language model vector embeddings) where users initiate the
search process explicitly. We carry out ablation studies that justify designing
a complex search system composed of multiple components to address the delicate
trade-off between speed and accuracy. Using SKU search in the Dynamics CRM as
an example, we show how our system vastly outperforms, in all aspects, the
results provided by the default search engine. Finally, we show how SKU
descriptions may be enhanced via generative text models (using gpt-3.5-turbo)
so that the consumers of the search results may get more context and a
generally better experience when presented with the results of their SKU
search.
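The two retrieval paths named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class names (`Trie`, `CharTfidf`), the trigram size, the smoothed IDF formula, and the sample catalog entries are all assumptions made for illustration, and the language-model embedding component that the paper combines with TF-IDF is omitted here.

```python
import math
from collections import Counter


class Trie:
    """Prefix tree over lowercased SKU names (hypothetical helper)."""

    def __init__(self):
        self.children = {}
        self.names = []  # full names that end exactly at this node

    def insert(self, name):
        node = self
        for ch in name.lower():
            node = node.children.setdefault(ch, Trie())
        node.names.append(name)

    def suggest(self, prefix, limit=5):
        # Walk down to the node matching the typed prefix.
        node = self
        for ch in prefix.lower():
            node = node.children.get(ch)
            if node is None:
                return []
        # Depth-first walk below that node, collecting completed names.
        found, stack = [], [node]
        while stack and len(found) < limit:
            current = stack.pop()
            found.extend(current.names)
            # Push children in reverse order so they pop alphabetically.
            for ch in sorted(current.children, reverse=True):
                stack.append(current.children[ch])
        return found[:limit]


def char_ngrams(text, n=3):
    """Character n-grams of the padded, lowercased text."""
    text = f" {text.lower()} "
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class CharTfidf:
    """Character-level TF-IDF index ranked by cosine similarity (assumed design)."""

    def __init__(self, docs, n=3):
        self.docs = docs
        self.n = n
        counts = [Counter(char_ngrams(d, n)) for d in docs]
        doc_freq = Counter(g for c in counts for g in c)
        # Smoothed inverse document frequency; one common variant among several.
        self.idf = {g: math.log((1 + len(docs)) / (1 + f)) + 1
                    for g, f in doc_freq.items()}
        self.vectors = [self._weigh(c) for c in counts]

    def _weigh(self, counts):
        # n-grams never seen in the catalog are dropped from the vector.
        vec = {g: tf * self.idf[g] for g, tf in counts.items() if g in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        return {g: w / norm for g, w in vec.items()}

    def search(self, query, top_k=3):
        q = self._weigh(Counter(char_ngrams(query, self.n)))
        scored = [(sum(w * v.get(g, 0.0) for g, w in q.items()), d)
                  for v, d in zip(self.vectors, self.docs)]
        scored.sort(key=lambda pair: -pair[0])
        return [d for score, d in scored[:top_k] if score > 0]


# Made-up catalog entries, used only to exercise the sketch.
catalog = ["Office 365 E3", "Office 365 E5", "Dynamics 365 Sales", "Azure SQL Database"]

trie = Trie()
for sku in catalog:
    trie.insert(sku)
index = CharTfidf(catalog)

print(trie.suggest("off"))           # suggestions while the user is typing
print(index.search("dynmcs sales"))  # explicit search with an abbreviated query
```

The Trie answers prefix lookups in time proportional to the prefix length plus the number of suggestions returned, which suits keystroke-by-keystroke suggestions. The character n-gram index tolerates abbreviations and dropped letters because a shortened query still shares many trigrams with the full SKU name.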
Related papers
- Smart Multi-Modal Search: Contextual Sparse and Dense Embedding Integration in Adobe Express [3.8973445113342433]
Building a scalable multi-modal search system requires fine-tuning several components.
We address considerations such as embedding model selection, the roles of embeddings in matching and ranking, and the balance between dense and sparse embeddings.
arXiv Detail & Related papers (2024-08-26T23:52:27Z)
- Generative Retrieval with Preference Optimization for E-commerce Search [16.78829577915103]
We develop an innovative framework for E-commerce search, called generative retrieval with preference optimization.
We employ multi-span identifiers to represent raw item titles and transform the task of generating titles from queries into the task of generating multi-span identifiers from queries.
Our experiments show that this framework achieves competitive performance on a real-world dataset, and online A/B tests demonstrate the superiority and effectiveness in improving conversion gains.
arXiv Detail & Related papers (2024-07-29T09:31:19Z)
- STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases [93.96463520716759]
We develop STaRK, a large-scale semi-structured retrieval benchmark on Textual and Relational Knowledge Bases.
Our benchmark covers three domains: product search, academic paper search, and queries in precision medicine.
We design a novel pipeline to synthesize realistic user queries that integrate diverse relational information and complex textual properties.
arXiv Detail & Related papers (2024-04-19T22:54:54Z)
- Large Search Model: Redefining Search Stack in the Era of LLMs [63.503320030117145]
We introduce a novel conceptual framework called large search model, which redefines the conventional search stack by unifying search tasks with one large language model (LLM).
All tasks are formulated as autoregressive text generation problems, allowing for the customization of tasks through the use of natural language prompts.
This proposed framework capitalizes on the strong language understanding and reasoning capabilities of LLMs, offering the potential to enhance search result quality while simultaneously simplifying the existing cumbersome search stack.
arXiv Detail & Related papers (2023-10-23T05:52:09Z)
- End-to-End Open Vocabulary Keyword Search With Multilingual Neural Representations [7.780766187171571]
We propose a neural ASR-free keyword search model which achieves competitive performance.
We extend this work with multilingual pretraining and detailed analysis of the model.
Our experiments show that the proposed multilingual training significantly improves the model performance.
arXiv Detail & Related papers (2023-08-15T20:33:25Z)
- Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de-facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work we propose an alternative that doesn't force any structure in the search space: using all ngrams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z)
- Exposing Query Identification for Search Transparency [69.06545074617685]
We explore the feasibility of approximate exposing query identification (EQI) as a retrieval task by reversing the role of queries and documents in two classes of search systems.
We derive an evaluation metric to measure the quality of a ranking of exposing queries, and conduct an empirical analysis focusing on various practical aspects of approximate EQI.
arXiv Detail & Related papers (2021-10-14T20:19:27Z)
- Text Summarization with Latent Queries [60.468323530248945]
We introduce LaQSum, the first unified text summarization system that learns Latent Queries from documents for abstractive summarization with any existing query forms.
Under a deep generative framework, our system jointly optimizes a latent query model and a conditional language model, allowing users to plug-and-play queries of any type at test time.
Our system robustly outperforms strong comparison systems across summarization benchmarks with different query types, document settings, and target domains.
arXiv Detail & Related papers (2021-05-31T21:14:58Z)
- Query Understanding for Natural Language Enterprise Search [0.7363840001905632]
Natural Language Search (NLS) extends the capabilities of search engines that perform keyword search, allowing users to issue queries in a more "natural" language.
We present an NLS system we implemented as part of the Search service of a major CRM platform.
arXiv Detail & Related papers (2020-12-11T10:57:25Z)
- AutoSTR: Efficient Backbone Search for Scene Text Recognition [80.7290173000068]
Scene text recognition (STR) is very challenging due to the diversity of text instances and the complexity of scenes.
We propose automated STR (AutoSTR) to search data-dependent backbones to boost text recognition performance.
Experiments demonstrate that, by searching data-dependent backbones, AutoSTR can outperform the state-of-the-art approaches on standard benchmarks.
arXiv Detail & Related papers (2020-03-14T06:51:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.