Hierarchical Query Classification in E-commerce Search
- URL: http://arxiv.org/abs/2403.06021v1
- Date: Sat, 9 Mar 2024 21:55:55 GMT
- Title: Hierarchical Query Classification in E-commerce Search
- Authors: Bing He, Sreyashi Nag, Limeng Cui, Suhang Wang, Zheng Li, Rahul Goutam, Zhen Li, Haiyang Zhang
- Abstract summary: E-commerce platforms typically store and structure product information and search data in a hierarchy.
Efficiently categorizing user search queries into a similar hierarchical structure is paramount in enhancing user experience on e-commerce platforms as well as news curation and academic research.
The inherent complexity of hierarchical query classification is compounded by two primary challenges: (1) the pronounced class imbalance that skews towards dominant categories, and (2) the inherent brevity and ambiguity of search queries that hinder accurate classification.
- Score: 38.67034103433015
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: E-commerce platforms typically store and structure product information and
search data in a hierarchy. Efficiently categorizing user search queries into a
similar hierarchical structure is paramount in enhancing user experience on
e-commerce platforms as well as news curation and academic research. The
significance of this task is amplified when dealing with sensitive query
categorization or critical information dissemination, where inaccuracies can
lead to considerable negative impacts. The inherent complexity of hierarchical
query classification is compounded by two primary challenges: (1) the
pronounced class imbalance that skews towards dominant categories, and (2) the
inherent brevity and ambiguity of search queries that hinder accurate
classification.
To address these challenges, we introduce a novel framework that leverages
hierarchical information through (i) enhanced representation learning that
utilizes the contrastive loss to discern fine-grained instance relationships
within the hierarchy, called "instance hierarchy", and (ii) a nuanced
hierarchical classification loss that attends to the intrinsic label taxonomy,
named "label hierarchy". Additionally, based on our observation that certain
unlabeled queries share typographical similarities with labeled queries, we
propose a neighborhood-aware sampling technique to intelligently select these
unlabeled queries to boost the classification performance. Extensive
experiments demonstrate that our proposed method is better than
state-of-the-art (SOTA) on the proprietary Amazon dataset, and comparable to
SOTA on the public datasets of Web of Science and RCV1-V2. These results
underscore the efficacy of our proposed solution, and pave the path toward the
next generation of hierarchy-aware query classification systems.
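
The abstract does not specify how the neighborhood-aware sampling measures typographical similarity, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes character-trigram Jaccard similarity as the measure and a fixed threshold of 0.5. The function names (`char_ngrams`, `jaccard`, `select_neighbors`), the threshold, and the example queries and label paths are all hypothetical.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased, space-padded query."""
    padded = f" {text.lower().strip()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_neighbors(labeled, unlabeled, threshold=0.5):
    """Pick unlabeled queries that are typographically close to a labeled one.

    labeled:   dict mapping query string -> label path (tuple, root to leaf)
    unlabeled: iterable of query strings
    Returns a list of (unlabeled_query, nearest_labeled_query, similarity).
    The similarity measure and threshold are assumptions for illustration.
    """
    labeled_grams = {q: char_ngrams(q) for q in labeled}
    selected = []
    for query in unlabeled:
        grams = char_ngrams(query)
        best_query, best_sim = None, 0.0
        # Linear scan over labeled queries; fine for a toy example.
        for cand, cand_grams in labeled_grams.items():
            sim = jaccard(grams, cand_grams)
            if sim > best_sim:
                best_query, best_sim = cand, sim
        if best_query is not None and best_sim >= threshold:
            selected.append((query, best_query, best_sim))
    return selected


# Hypothetical labeled queries with root-to-leaf label paths.
labeled = {
    "iphone 13 case": ("Electronics", "Phone Accessories", "Cases"),
    "running shoes men": ("Clothing", "Shoes", "Athletic"),
}
unlabeled = ["iphone 13 cases", "garden hose", "runing shoes men"]

for query, neighbor, sim in select_neighbors(labeled, unlabeled):
    print(f"{query!r} ~ {neighbor!r} (similarity {sim:.2f})")
```

In this toy run the plural variant and the misspelling are selected, while the unrelated query is not. How the selected queries are then used in training (for example, as extra instances for the contrastive objective) is not stated in the abstract and is left open here.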