QueryNER: Segmentation of E-commerce Queries
- URL: http://arxiv.org/abs/2405.09507v1
- Date: Wed, 15 May 2024 16:58:35 GMT
- Title: QueryNER: Segmentation of E-commerce Queries
- Authors: Chester Palen-Michel, Lizzie Liang, Zhe Wu, Constantine Lignos,
- Abstract summary: We present a manually-annotated dataset and accompanying model for e-commerce query segmentation.
Our work instead focuses on the goal of dividing a query into meaningful chunks with broadly applicable types.
- Score: 12.563241705572409
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present QueryNER, a manually-annotated dataset and accompanying model for e-commerce query segmentation. Prior work in sequence labeling for e-commerce has largely addressed aspect-value extraction which focuses on extracting portions of a product title or query for narrowly defined aspects. Our work instead focuses on the goal of dividing a query into meaningful chunks with broadly applicable types. We report baseline tagging results and conduct experiments comparing token and entity dropping for null and low recall query recovery. Challenging test sets are created using automatic transformations and show how simple data augmentation techniques can make the models more robust to noise. We make the QueryNER dataset publicly available.
Related papers
- Generative Retrieval with Preference Optimization for E-commerce Search [16.78829577915103]
We develop an innovative framework for E-commerce search, called generative retrieval with preference optimization.
We employ multi-span identifiers to represent raw item titles and transform the task of generating titles from queries into the task of generating multi-span identifiers from queries.
Our experiments show that this framework achieves competitive performance on a real-world dataset, and online A/B tests demonstrate the superiority and effectiveness in improving conversion gains.
arXiv Detail & Related papers (2024-07-29T09:31:19Z) - UQE: A Query Engine for Unstructured Databases [71.49289088592842]
We investigate the potential of Large Language Models to enable unstructured data analytics.
We propose a new Universal Query Engine (UQE) that directly interrogates and draws insights from unstructured data collections.
arXiv Detail & Related papers (2024-06-23T06:58:55Z) - Database-Augmented Query Representation for Information Retrieval [59.57065228857247]
We present a novel retrieval framework called Database-Augmented Query representation (DAQu)
DAQu augments the original query with various (query-related) metadata across multiple tables.
We validate DAQu in diverse retrieval scenarios that can incorporate metadata from the relational database.
arXiv Detail & Related papers (2024-06-23T05:02:21Z) - Text-Based Product Matching -- Semi-Supervised Clustering Approach [9.748519919202986]
This paper aims to present a new philosophy to product matching utilizing a semi-supervised clustering approach.
We study the properties of this method by experimenting with the IDEC algorithm on the real-world dataset.
arXiv Detail & Related papers (2024-02-01T18:52:26Z) - Enhanced E-Commerce Attribute Extraction: Innovating with Decorative
Relation Correction and LLAMA 2.0-Based Annotation [4.81846973621209]
We propose a pioneering framework that integrates BERT for classification, a Conditional Random Fields (CRFs) layer for attribute value extraction, and Large Language Models (LLMs) for data annotation.
Our approach capitalizes on the robust representation learning of BERT, synergized with the sequence decoding prowess of CRFs, to adeptly identify and extract attribute values.
Our methodology is rigorously validated on various datasets, including Walmart, BestBuy's e-commerce NER dataset, and the CoNLL dataset.
arXiv Detail & Related papers (2023-12-09T08:26:30Z) - Improving Text Matching in E-Commerce Search with A Rationalizable,
Intervenable and Fast Entity-Based Relevance Model [78.80174696043021]
We propose a novel model called the Entity-Based Relevance Model (EBRM)
The decomposition allows us to use a Cross-encoder QE relevance module for high accuracy.
We also show that pretraining the QE module with auto-generated QE data from user logs can further improve the overall performance.
arXiv Detail & Related papers (2023-07-01T15:44:53Z) - Graph Enhanced BERT for Query Understanding [55.90334539898102]
query understanding plays a key role in exploring users' search intents and facilitating users to locate their most desired information.
In recent years, pre-trained language models (PLMs) have advanced various natural language processing tasks.
We propose a novel graph-enhanced pre-training framework, GE-BERT, which can leverage both query content and the query graph.
arXiv Detail & Related papers (2022-04-03T16:50:30Z) - QUEACO: Borrowing Treasures from Weakly-labeled Behavior Data for Query
Attribute Value Extraction [57.56700153507383]
This paper proposes a unified query attribute value extraction system in e-commerce search named QUEACO.
For the NER phase, QUEACO adopts a novel teacher-student network, where a teacher network that is trained on the strongly-labeled data generates pseudo-labels.
For the AVN phase, we also leverage the weakly-labeled query-to-attribute behavior data to normalize surface form attribute values from queries into canonical forms from products.
arXiv Detail & Related papers (2021-08-19T03:24:23Z) - APRF-Net: Attentive Pseudo-Relevance Feedback Network for Query
Categorization [12.634704014206294]
We propose a novel deep neural model named textbfAttentive textbfPseudo textbfRelevance textbfFeedback textbfNetwork (APRF-Net) to enhance the representation of rare queries for query categorization.
Our results show that the APRF-Net significantly improves query categorization by 5.9% on $F1@1$ score over the baselines, which increases to 8.2% improvement for the rare queries.
arXiv Detail & Related papers (2021-04-23T02:34:08Z) - Distant Supervision for E-commerce Query Segmentation via Attention
Network [22.946144345151605]
We propose a BiLSTM-CRF based model with an attention module to encode external features, such that external contexts information, which can be utilized naturally and effectively to help query segmentation.
Experiments on two datasets show the effectiveness of our approach compared with several kinds of baselines.
arXiv Detail & Related papers (2020-11-09T03:00:52Z) - Query Resolution for Conversational Search with Limited Supervision [63.131221660019776]
We propose QuReTeC (Query Resolution by Term Classification), a neural query resolution model based on bidirectional transformers.
We show that QuReTeC outperforms state-of-the-art models, and furthermore, that our distant supervision method can be used to substantially reduce the amount of human-curated data required to train QuReTeC.
arXiv Detail & Related papers (2020-05-24T11:37:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.