A Semi-supervised Scalable Unified Framework for E-commerce Query Classification
- URL: http://arxiv.org/abs/2506.21049v1
- Date: Thu, 26 Jun 2025 06:52:33 GMT
- Title: A Semi-supervised Scalable Unified Framework for E-commerce Query Classification
- Authors: Chunyuan Yuan, Chong Zhang, Zheng Fang, Ming Pang, Xue Jiang, Changping Peng, Zhangang Lin, Ching Law,
- Abstract summary: E-commerce queries are usually short and lack context, and the information between labels cannot be used. Most existing industrial query classification methods rely on users' posterior click behavior to construct training samples, resulting in a Matthew vicious cycle. We propose a novel Semi-supervised Scalable Unified Framework (SSUF), containing multiple enhanced modules to unify the query classification tasks.
- Score: 13.695419069287482
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Query classification, including multiple subtasks such as intent and category prediction, is vital to e-commerce applications. E-commerce queries are usually short and lack context, and the information between labels cannot be used, resulting in insufficient prior information for modeling. Most existing industrial query classification methods rely on users' posterior click behavior to construct training samples, resulting in a Matthew vicious cycle. Furthermore, the subtasks of query classification lack a unified framework, leading to low efficiency for algorithm optimization. In this paper, we propose a novel Semi-supervised Scalable Unified Framework (SSUF), containing multiple enhanced modules to unify the query classification tasks. The knowledge-enhanced module uses world knowledge to enhance query representations and solve the problem of insufficient query information. The label-enhanced module uses label semantics and semi-supervised signals to reduce the dependence on posterior labels. The structure-enhanced module enhances the label representation based on the complex label relations. Each module is highly pluggable, and input features can be added or removed as needed according to each subtask. We conduct extensive offline and online A/B experiments, and the results show that SSUF significantly outperforms the state-of-the-art models.
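The abstract's claim that each module is pluggable, with input features added or removed per subtask, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the `REGISTRY` and `SUBTASK_MODULES` tables, the toy knowledge and label data, and the choice of which subtask uses which module. No model is involved; the sketch only shows how feature construction could be composed per subtask.

```python
from typing import Callable, Dict, List

# Hypothetical feature container: feature name -> value.
Features = Dict[str, object]
Module = Callable[[str, Features], Features]


def knowledge_enhanced(query: str, feats: Features) -> Features:
    # Stand-in for a world-knowledge lookup (toy hard-coded table).
    knowledge = {"iphone": "smartphone made by Apple"}
    hits = [knowledge[tok] for tok in query.lower().split() if tok in knowledge]
    return {**feats, "knowledge": hits}


def label_enhanced(query: str, feats: Features) -> Features:
    # Stand-in for label semantics: expose label names as text features.
    return {**feats, "label_names": ["Phones", "Cases", "Chargers"]}


def structure_enhanced(query: str, feats: Features) -> Features:
    # Stand-in for label relations: a toy parent -> children hierarchy.
    return {**feats, "label_tree": {"Electronics": ["Phones", "Chargers"]}}


# Hypothetical registry so modules can be switched on or off by name.
REGISTRY: Dict[str, Module] = {
    "knowledge": knowledge_enhanced,
    "label": label_enhanced,
    "structure": structure_enhanced,
}

# Assumed per-subtask configuration; the paper may choose differently.
SUBTASK_MODULES: Dict[str, List[str]] = {
    "category": ["knowledge", "label", "structure"],
    "intent": ["knowledge", "label"],
}


def build_features(query: str, module_names: List[str]) -> Features:
    # Apply the selected modules in order; each one adds its own features.
    feats: Features = {"query": query}
    for name in module_names:
        feats = REGISTRY[name](query, feats)
    return feats


category_feats = build_features("iphone case", SUBTASK_MODULES["category"])
intent_feats = build_features("iphone case", SUBTASK_MODULES["intent"])
```

Under this configuration, `category_feats` carries the `label_tree` feature while `intent_feats` does not, which is the sense of "pluggable" the sketch is meant to convey.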
Related papers
- Generative Retrieval for Book search [106.67655212825025]
We propose an effective Generative retrieval framework for Book Search. It features two main components: data augmentation and outline-oriented book encoding. Experiments on a proprietary Baidu dataset demonstrate that GBS outperforms strong baselines.
arXiv Detail & Related papers (2025-01-19T12:57:13Z) - A Semi-supervised Multi-channel Graph Convolutional Network for Query Classification in E-commerce [10.870790183380517]
We propose a novel Semi-supervised Multi-channel Graph Convolutional Network (SMGCN) to address the above problems.
SMGCN extends category information and enhances the posterior label by utilizing the similarity score between the query and categories.
arXiv Detail & Related papers (2024-08-04T04:52:21Z) - Hierarchical Query Classification in E-commerce Search [38.67034103433015]
E-commerce platforms typically store and structure product information and search data in a hierarchy.
Efficiently categorizing user search queries into a similar hierarchical structure is paramount in enhancing user experience on e-commerce platforms as well as news curation and academic research.
The inherent complexity of hierarchical query classification is compounded by two primary challenges: (1) the pronounced class imbalance that skews towards dominant categories, and (2) the inherent brevity and ambiguity of search queries that hinder accurate classification.
arXiv Detail & Related papers (2024-03-09T21:55:55Z) - A General Model for Aggregating Annotations Across Simple, Complex, and
Multi-Object Annotation Tasks [51.14185612418977]
A strategy to improve label quality is to ask multiple annotators to label the same item and aggregate their labels.
While a variety of bespoke models have been proposed for specific tasks, our work is the first to introduce aggregation methods that generalize across many diverse complex tasks.
This article extends our prior work with investigation of three new research questions.
arXiv Detail & Related papers (2023-12-20T21:28:35Z) - Prompt Tuned Embedding Classification for Multi-Label Industry Sector Allocation [2.024620791810963]
This study benchmarks the performance of Prompt Tuning and baselines for multi-label text classification.
It is applied to classifying companies into an investment firm's proprietary industry taxonomy.
We confirm that the model's performance is consistent across both well-known and less-known companies.
arXiv Detail & Related papers (2023-09-21T13:45:32Z) - Description-Enhanced Label Embedding Contrastive Learning for Text Classification [65.01077813330559]
We incorporate Self-Supervised Learning (SSL) into the model learning process and design a novel self-supervised Relation of Relation (R2) classification task.
We propose the Relation of Relation Learning Network (R2-Net) for text classification, in which text classification and R2 classification are treated as optimization targets.
External knowledge from WordNet is used to obtain multi-aspect descriptions for label semantic learning.
arXiv Detail & Related papers (2023-06-15T02:19:34Z) - A Multi-Granularity Matching Attention Network for Query Intent Classification in E-commerce Retrieval [9.034096715927731]
This paper proposes a Multi-granularity Matching Attention Network (MMAN) for query intent classification.
MMAN contains three modules: a self-matching module, a char-level matching module, and a semantic-level matching module.
We conduct extensive offline and online A/B experiments, and the results show that the MMAN significantly outperforms the strong baselines.
arXiv Detail & Related papers (2023-03-28T10:25:17Z) - Learning Label Modular Prompts for Text Classification in the Wild [56.66187728534808]
We propose text classification in-the-wild, which introduces different non-stationary training/testing stages.
Decomposing a complex task into modular components can enable robust generalisation under such a non-stationary environment.
We propose MODULARPROMPT, a label-modular prompt tuning framework for text classification tasks.
arXiv Detail & Related papers (2022-11-30T16:26:38Z) - Novel Class Discovery in Semantic Segmentation [104.30729847367104]
We introduce a new setting of Novel Class Discovery in Semantic Segmentation (NCDSS).
It aims at segmenting unlabeled images containing new classes given prior knowledge from a labeled set of disjoint classes.
In NCDSS, we need to distinguish the objects and background, and to handle the existence of multiple classes within an image.
We propose the Entropy-based Uncertainty Modeling and Self-training (EUMS) framework to overcome noisy pseudo-labels.
arXiv Detail & Related papers (2021-12-03T13:31:59Z) - APRF-Net: Attentive Pseudo-Relevance Feedback Network for Query Categorization [12.634704014206294]
We propose a novel deep neural model named Attentive Pseudo-Relevance Feedback Network (APRF-Net) to enhance the representation of rare queries for query categorization.
Our results show that the APRF-Net significantly improves query categorization by 5.9% on $F1@1$ score over the baselines, which increases to 8.2% improvement for the rare queries.
arXiv Detail & Related papers (2021-04-23T02:34:08Z) - Active Learning++: Incorporating Annotator's Rationale using Local Model Explanation [84.10721065676913]
Annotators can provide their rationale for choosing a label by ranking input features based on their importance for a given query.
Instead of weighing all committee models equally to select the next instance, we assign higher weight to the committee model with higher agreement with the annotator's ranking.
This approach is applicable to any kind of ML model using model-agnostic techniques to generate local explanation such as LIME.
arXiv Detail & Related papers (2020-09-06T08:07:33Z)