Related papers: A BERT-based Hierarchical Classification Model with Applications in Chinese Commodity Classification

URL: http://arxiv.org/abs/2508.15800v1
Date: Wed, 13 Aug 2025 16:10:47 GMT
Title: A BERT-based Hierarchical Classification Model with Applications in Chinese Commodity Classification
Authors: Kun Liu, Tuozhen Liu, Feifei Wang, Rui Pan
Abstract summary: We introduce a large-scale hierarchical dataset collected from the JD e-commerce platform (www.JD.com). We also propose a novel hierarchical text classification approach based on the widely used Bidirectional Encoder Representations from Transformers (BERT). Our HFT-BERT model demonstrates exceptional performance in categorizing longer short texts, such as books.
Score: 12.186379198760733
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Existing e-commerce platforms heavily rely on manual annotation for product categorization, which is inefficient and inconsistent. These platforms often employ a hierarchical structure for categorizing products; however, few studies have leveraged this hierarchical information for classification. Furthermore, studies that consider hierarchical information fail to account for similarities and differences across various hierarchical categories. Herein, we introduce a large-scale hierarchical dataset collected from the JD e-commerce platform (www.JD.com), comprising 1,011,450 products with titles and a three-level category structure. By making this dataset openly accessible, we provide a valuable resource for researchers and practitioners to advance research and applications associated with product categorization. Moreover, we propose a novel hierarchical text classification approach based on the widely used Bidirectional Encoder Representations from Transformers (BERT), called Hierarchical Fine-tuning BERT (HFT-BERT). HFT-BERT leverages the remarkable text feature extraction capabilities of BERT, achieving prediction performance comparable to that of existing methods on short texts. Notably, our HFT-BERT model demonstrates exceptional performance in categorizing longer short texts, such as books.

Related papers

Hierarchical Multi-Label Generation with Probabilistic Level-Constraint [3.1427813443719868]
Hierarchical Extreme Multi-Label Classification poses greater difficulties than traditional multi-label classification. We employ a generative framework with Probabilistic Level Constraints (PLC) to generate hierarchical labels within a specific taxonomy. Our approach not only achieves new SOTA performance on the HMG task but also constrains the model's output much better than previous work.
arXiv Detail & Related papers (2025-04-30T07:56:53Z)
Introducing Three New Benchmark Datasets for Hierarchical Text Classification [0.0]
We introduce three new HTC benchmark datasets in the domain of research publications. We propose an approach which combines their classifications to improve the reliability and robustness of the dataset. We evaluate the three created datasets with a clustering-based analysis and show that our proposed approach results in a higher quality dataset.
arXiv Detail & Related papers (2024-11-28T13:06:48Z)
Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions [62.12545440385489]
Large language models (LLMs) have brought substantial advancements in text generation, but their potential for enhancing classification tasks remains underexplored. We propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches. We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task.
arXiv Detail & Related papers (2024-10-02T20:48:28Z)
HDT: Hierarchical Document Transformer [70.2271469410557]
HDT exploits document structure by introducing auxiliary anchor tokens and redesigning the attention mechanism into a sparse multi-level hierarchy. We develop a novel sparse attention kernel that considers the hierarchical structure of documents.
arXiv Detail & Related papers (2024-07-11T09:28:04Z)
Hierarchical Query Classification in E-commerce Search [38.67034103433015]
E-commerce platforms typically store and structure product information and search data in a hierarchy. Efficiently categorizing user search queries into a similar hierarchical structure is paramount in enhancing user experience on e-commerce platforms as well as news curation and academic research. The inherent complexity of hierarchical query classification is compounded by two primary challenges: (1) the pronounced class imbalance that skews towards dominant categories, and (2) the inherent brevity and ambiguity of search queries that hinder accurate classification.
arXiv Detail & Related papers (2024-03-09T21:55:55Z)
Class-incremental Novel Class Discovery [76.35226130521758]
We study the new task of class-incremental Novel Class Discovery (class-iNCD). We propose a novel approach for class-iNCD which prevents forgetting of past information about the base classes. Our experiments, conducted on three common benchmarks, demonstrate that our method significantly outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2022-07-18T13:49:27Z)
Many-Class Text Classification with Matching [65.74328417321738]
We formulate Text Classification as a Matching problem between the text and the labels, and propose a simple yet effective framework named TCM. Compared with previous text classification approaches, TCM takes advantage of the fine-grained semantic information of the classification labels.
arXiv Detail & Related papers (2022-05-23T15:51:19Z)
HFT-ONLSTM: Hierarchical and Fine-Tuning Multi-label Text Classification [7.176984223240199]
Hierarchical multi-label text classification (HMTC) with higher accuracy over large sets of closely related categories has become a challenging problem. We present a hierarchical and fine-tuning approach based on the Ordered Neural LSTM neural network, abbreviated as HFT-ONLSTM, for more accurate level-by-level HMTC.
arXiv Detail & Related papers (2022-04-18T00:57:46Z)
An Empirical Study on Large-Scale Multi-Label Text Classification Including Few and Zero-Shot Labels [49.036212158261215]
Large-scale Multi-label Text Classification (LMTC) has a wide range of Natural Language Processing (NLP) applications. Current state-of-the-art LMTC models employ Label-Wise Attention Networks (LWANs). We show that hierarchical methods based on Probabilistic Label Trees (PLTs) outperform LWANs. We propose a new state-of-the-art method which combines BERT with LWANs.
arXiv Detail & Related papers (2020-10-04T18:55:47Z)
Exploring the Hierarchy in Relation Labels for Scene Graph Generation [75.88758055269948]
Experiments show that the proposed simple yet effective method can improve several state-of-the-art baselines by a large margin (up to 33% relative gain) in terms of Recall@50.
arXiv Detail & Related papers (2020-09-12T17:36:53Z)
Joint Embedding of Words and Category Labels for Hierarchical Multi-label Text Classification [4.2750700546937335]
Hierarchical text classification (HTC) has received extensive attention and has broad application prospects. We propose a joint embedding of text and parent category based on hierarchical fine-tuning ordered neurons LSTM (HFT-ONLSTM) for HTC.
arXiv Detail & Related papers (2020-04-06T11:06:08Z)

This list is automatically generated from the titles and abstracts of the papers on this site.