Hierarchical Query Classification in E-commerce Search
- URL: http://arxiv.org/abs/2403.06021v1
- Date: Sat, 9 Mar 2024 21:55:55 GMT
- Title: Hierarchical Query Classification in E-commerce Search
- Authors: Bing He, Sreyashi Nag, Limeng Cui, Suhang Wang, Zheng Li, Rahul Goutam, Zhen Li, Haiyang Zhang
- Abstract summary: E-commerce platforms typically store and structure product information and search data in a hierarchy.
Efficiently categorizing user search queries into a similar hierarchical structure is paramount in enhancing user experience on e-commerce platforms as well as news curation and academic research.
The inherent complexity of hierarchical query classification is compounded by two primary challenges: (1) the pronounced class imbalance that skews towards dominant categories, and (2) the inherent brevity and ambiguity of search queries that hinder accurate classification.
- Score: 38.67034103433015
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: E-commerce platforms typically store and structure product information and
search data in a hierarchy. Efficiently categorizing user search queries into a
similar hierarchical structure is paramount in enhancing user experience on
e-commerce platforms as well as news curation and academic research. The
significance of this task is amplified when dealing with sensitive query
categorization or critical information dissemination, where inaccuracies can
lead to considerable negative impacts. The inherent complexity of hierarchical
query classification is compounded by two primary challenges: (1) the
pronounced class imbalance that skews towards dominant categories, and (2) the
inherent brevity and ambiguity of search queries that hinder accurate
classification.
To address these challenges, we introduce a novel framework that leverages
hierarchical information through (i) enhanced representation learning that
utilizes the contrastive loss to discern fine-grained instance relationships
within the hierarchy, called "instance hierarchy", and (ii) a nuanced
hierarchical classification loss that attends to the intrinsic label taxonomy,
named "label hierarchy". Additionally, based on our observation that certain
unlabeled queries share typographical similarities with labeled queries, we
propose a neighborhood-aware sampling technique to intelligently select these
unlabeled queries to boost the classification performance. Extensive
experiments demonstrate that our proposed method is better than
state-of-the-art (SOTA) on the proprietary Amazon dataset, and comparable to
SOTA on the public datasets of Web of Science and RCV1-V2. These results
underscore the efficacy of our proposed solution, and pave the path toward the
next generation of hierarchy-aware query classification systems.
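The abstract does not specify how the neighborhood-aware sampling is implemented. The following is only a minimal sketch of the general idea (selecting unlabeled queries that are typographically close to labeled ones), not the authors' method. The similarity measure (`difflib.SequenceMatcher.ratio`), the `threshold` value, the function names, and the example queries and category labels are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (assumed measure, not from the paper)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def sample_neighbors(labeled: dict, unlabeled: list, threshold: float = 0.8) -> list:
    """Select unlabeled queries whose closest labeled query clears the threshold.

    Returns (unlabeled_query, nearest_labeled_query, label, score) tuples.
    The threshold of 0.8 is a hypothetical default.
    """
    selected = []
    for query in unlabeled:
        # Find the labeled query that is most similar to this unlabeled one.
        nearest, label = max(labeled.items(), key=lambda kv: similarity(query, kv[0]))
        score = similarity(query, nearest)
        if score >= threshold:
            selected.append((query, nearest, label, score))
    return selected


# Hypothetical labeled queries with hierarchical category paths.
labeled = {
    "iphone 13 case": "Electronics > Phone Accessories",
    "running shoes men": "Clothing > Shoes",
}
unlabeled = ["iphone 13 cases", "runing shoes men", "garden hose"]

picked = sample_neighbors(labeled, unlabeled)
for query, nearest, label, score in picked:
    print(f"{query!r} -> {nearest!r} ({label}), score={score:.2f}")
```

In this sketch the near-duplicate and misspelled queries are selected and inherit a candidate label from their nearest labeled neighbor, while the unrelated query is left out. A production system would more likely use an indexed nearest-neighbor search than this pairwise comparison.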
Related papers
- Use All The Labels: A Hierarchical Multi-Label Contrastive Learning Framework [75.79736930414715]
We present a hierarchical multi-label representation learning framework that can leverage all available labels and preserve the hierarchical relationship between classes.
We introduce novel hierarchy preserving losses, which jointly apply a hierarchical penalty to the contrastive loss, and enforce the hierarchy constraint.
arXiv Detail & Related papers (2022-04-27T21:41:44Z)
- The Overlooked Classifier in Human-Object Interaction Recognition [82.20671129356037]
We encode the semantic correlation among classes into the classification head by initializing the weights with language embeddings of HOIs.
We propose a new loss named LSE-Sign to enhance multi-label learning on a long-tailed dataset.
Our simple yet effective method enables detection-free HOI classification, outperforming the state-of-the-arts that require object detection and human pose by a clear margin.
arXiv Detail & Related papers (2022-03-10T23:35:00Z)
- Label Hierarchy Transition: Delving into Class Hierarchies to Enhance Deep Classifiers [40.993137740456014]
We propose a unified probabilistic framework based on deep learning to address the challenges of hierarchical classification.
The proposed framework can be readily adapted to any existing deep network with only minor modifications.
We extend our proposed LHT framework to the skin lesion diagnosis task and validate its great potential in computer-aided diagnosis.
arXiv Detail & Related papers (2021-12-04T14:58:36Z)
- Out-of-Category Document Identification Using Target-Category Names as Weak Supervision [64.671654559798]
Out-of-category detection aims to distinguish documents according to their semantic relevance to the inlier (or target) categories.
We present an out-of-category detection framework, which effectively measures how confidently each document belongs to one of the target categories.
arXiv Detail & Related papers (2021-11-24T21:01:25Z)
- QUEACO: Borrowing Treasures from Weakly-labeled Behavior Data for Query Attribute Value Extraction [57.56700153507383]
This paper proposes a unified query attribute value extraction system in e-commerce search named QUEACO.
For the NER phase, QUEACO adopts a novel teacher-student network, where a teacher network that is trained on the strongly-labeled data generates pseudo-labels.
For the AVN phase, we also leverage the weakly-labeled query-to-attribute behavior data to normalize surface form attribute values from queries into canonical forms from products.
arXiv Detail & Related papers (2021-08-19T03:24:23Z)
- Inducing a hierarchy for multi-class classification problems [11.58041597483471]
In applications where categorical labels follow a natural hierarchy, classification methods that exploit the label structure often outperform those that do not.
In this paper, we investigate a class of methods that induce a hierarchy that can similarly improve classification performance over flat classifiers.
We demonstrate the effectiveness of the class of methods both for discovering a latent hierarchy and for improving accuracy in principled simulation settings and three real data applications.
arXiv Detail & Related papers (2021-02-20T05:40:42Z)
- Pitfalls of Assessing Extracted Hierarchies for Multi-Class Classification [4.89253144446913]
We identify some common pitfalls that may lead practitioners to make misleading conclusions about their methods.
We show how the hierarchy's quality can become irrelevant depending on the experimental setup.
Our results confirm that datasets with a high number of classes generally present complex structures in how these classes relate to each other.
arXiv Detail & Related papers (2021-01-26T21:50:57Z)
- Joint Learning of Hyperbolic Label Embeddings for Hierarchical Multi-label Classification [9.996804039553858]
We consider the problem of multi-label classification where the labels lie in a hierarchy.
We propose a novel formulation for the joint learning and empirically evaluate its efficacy.
arXiv Detail & Related papers (2021-01-13T10:58:54Z)
- Exploring the Hierarchy in Relation Labels for Scene Graph Generation [75.88758055269948]
Experiments show that the proposed simple yet effective method can improve several state-of-the-art baselines by a large margin (up to 33% relative gain) in terms of Recall@50.
arXiv Detail & Related papers (2020-09-12T17:36:53Z)
- Generating Categories for Sets of Entities [34.32017697099142]
Category systems are central components of knowledge bases, as they provide a hierarchical grouping of semantically related concepts and entities.
This paper presents a method of generating categories for sets of entities using neural abstractive summarization models.
We develop a test collection based on Wikipedia categories and demonstrate the effectiveness of the proposed approach.
arXiv Detail & Related papers (2020-08-19T13:31:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.