Recent Advances in Hierarchical Multi-label Text Classification: A
Survey
- URL: http://arxiv.org/abs/2307.16265v1
- Date: Sun, 30 Jul 2023 16:13:00 GMT
- Title: Recent Advances in Hierarchical Multi-label Text Classification: A
Survey
- Authors: Rundong Liu, Wenhan Liang, Weijun Luo, Yuxiang Song, He Zhang, Ruohua
Xu, Yunfeng Li, Ming Liu
- Abstract summary: Hierarchical multi-label text classification aims to classify the input text into multiple labels, among which the labels are structured and hierarchical.
It is a vital task in many real-world applications, e.g. scientific literature archiving.
- Score: 11.709847202580505
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hierarchical multi-label text classification aims to classify the input text
into multiple labels, among which the labels are structured and hierarchical.
It is a vital task in many real-world applications, e.g. scientific literature
archiving. In this paper, we survey the recent progress of hierarchical
multi-label text classification, including the open-source datasets, the main
methods, evaluation metrics, learning strategies and the current challenges. A
few future research directions are also listed for the community to further improve
this field.
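As a minimal illustration of the task definition in the abstract (a sketch, not taken from the survey), the snippet below represents a label hierarchy as a child-to-parent map and checks whether a predicted label set respects it. The label names, the `PARENT` map, and the helper functions are all hypothetical. The convention that a label implies all of its ancestors is an assumption that holds for many, but not all, hierarchical datasets.

```python
# Hypothetical toy hierarchy: child label -> parent label (None marks a root).
PARENT = {
    "Computer Science": None,
    "Machine Learning": "Computer Science",
    "Text Classification": "Machine Learning",
    "Physics": None,
    "Optics": "Physics",
}


def ancestors(label):
    """Return all ancestors of `label`, nearest first."""
    chain = []
    parent = PARENT[label]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def close_under_ancestors(labels):
    """Add every ancestor so the label set respects the hierarchy.

    Assumption: assigning a label implies assigning all of its ancestors.
    """
    closed = set(labels)
    for label in labels:
        closed.update(ancestors(label))
    return closed


def is_consistent(labels):
    """A label set is consistent if each non-root label's parent is present."""
    return all(PARENT[l] is None or PARENT[l] in labels for l in labels)


# A multi-label prediction spanning two branches of the hierarchy.
predicted = {"Text Classification", "Optics"}
print(is_consistent(predicted))                         # parents are missing
print(sorted(close_under_ancestors(predicted)))         # repaired label set
print(is_consistent(close_under_ancestors(predicted)))  # now consistent
```

A single document here receives labels from two different branches, which is what distinguishes the hierarchical multi-label setting from single-path hierarchical classification.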
Related papers
- Open-world Multi-label Text Classification with Extremely Weak Supervision [30.85235057480158]
We study open-world multi-label text classification under extremely weak supervision (XWS).
We first utilize the user description to prompt a large language model (LLM) for dominant keyphrases of a subset of raw documents, and then construct a label space via clustering.
We then apply a zero-shot multi-label classifier to locate the documents with small top predicted scores, so we can revisit their dominant keyphrases for more long-tail labels.
X-MLClass exhibits a remarkable increase in ground-truth label space coverage on various datasets.
arXiv Detail & Related papers (2024-07-08T04:52:49Z)
- Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels [60.675714333081466]
Multi-label recognition (MLR) with incomplete labels is very challenging.
Recent works strive to explore the image-to-label correspondence in the vision-language model, i.e., CLIP, to compensate for insufficient annotations.
We advocate remedying the deficiency of label supervision for the MLR with incomplete labels by deriving a structured semantic prior.
arXiv Detail & Related papers (2023-03-23T12:39:20Z)
- Hierarchical Multi-Label Classification of Scientific Documents [47.293189105900524]
We introduce a new dataset for hierarchical multi-label text classification of scientific papers called SciHTC.
This dataset contains 186,160 papers and 1,233 categories from the ACM CCS tree.
Our best model achieves a Macro-F1 score of 34.57% which shows that this dataset provides significant research opportunities.
arXiv Detail & Related papers (2022-11-05T04:12:57Z)
- Many-Class Text Classification with Matching [65.74328417321738]
We formulate Text Classification as a Matching problem between the text and the labels, and propose a simple yet effective framework named TCM.
Compared with previous text classification approaches, TCM takes advantage of the fine-grained semantic information of the classification labels.
arXiv Detail & Related papers (2022-05-23T15:51:19Z)
- Academic Resource Text Level Multi-label Classification based on Attention [16.71166207897885]
Hierarchical multi-label academic text classification (HMTC) assigns academic texts to labels in a hierarchically structured labeling system.
We propose an attention-based hierarchical multi-label classification algorithm of academic texts (AHMCA) by integrating features such as text, keywords, and hierarchical structure.
arXiv Detail & Related papers (2022-03-21T05:32:35Z)
- MotifClass: Weakly Supervised Text Classification with Higher-order Metadata Information [47.44278057062421]
We study the problem of weakly supervised text classification, which aims to classify text documents into a set of pre-defined categories with category surface names only.
To be specific, we model the relationships between documents and metadata via a heterogeneous information network.
We propose a novel framework, named MotifClass, which selects category-indicative motif instances, retrieves and generates pseudo-labeled training samples based on category names and indicative motif instances.
arXiv Detail & Related papers (2021-11-07T07:39:10Z)
- Expert Knowledge-Guided Length-Variant Hierarchical Label Generation for Proposal Classification [21.190465278587045]
Proposal classification aims to classify a proposal into a length-variant sequence of labels.
We develop a new deep proposal classification framework to jointly model the three features.
Our model can automatically identify the best length of label sequence to stop next label prediction.
arXiv Detail & Related papers (2021-09-14T13:09:28Z)
- MATCH: Metadata-Aware Text Classification in A Large Hierarchy [60.59183151617578]
MATCH is an end-to-end framework that leverages both metadata and hierarchy information.
We propose different ways to regularize the parameters and output probability of each child label by its parents.
Experiments on two massive text datasets with large-scale label hierarchies demonstrate the effectiveness of MATCH.
arXiv Detail & Related papers (2021-02-15T05:23:08Z)
- Exploring the Hierarchy in Relation Labels for Scene Graph Generation [75.88758055269948]
Experiments show that the proposed simple yet effective method can improve several state-of-the-art baselines by a large margin (up to 33% relative gain) in terms of Recall@50.
arXiv Detail & Related papers (2020-09-12T17:36:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.