Constrained Sequence-to-Tree Generation for Hierarchical Text
Classification
- URL: http://arxiv.org/abs/2204.00811v1
- Date: Sat, 2 Apr 2022 08:35:39 GMT
- Authors: Chao Yu, Yi Shen, Yue Mao, Longjun Cai
- Abstract summary: Hierarchical Text Classification (HTC) is a challenging task where a document can be assigned to multiple hierarchically structured categories within a taxonomy.
In this paper, we formulate HTC as a sequence generation task and introduce a sequence-to-tree framework (Seq2Tree) for modeling the hierarchical label structure.
- Score: 10.143177923523407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hierarchical Text Classification (HTC) is a challenging task where a document
can be assigned to multiple hierarchically structured categories within a
taxonomy. The majority of prior studies consider HTC as a flat multi-label
classification problem, which inevitably leads to the "label inconsistency"
problem. In this paper, we formulate HTC as a sequence generation task and
introduce a sequence-to-tree framework (Seq2Tree) for modeling the hierarchical
label structure. Moreover, we design a constrained decoding strategy with
dynamic vocabulary to secure the label consistency of the results. Compared
with previous works, the proposed approach achieves significant and consistent
improvements on three benchmark datasets.
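The constrained decoding idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy taxonomy, the `<pop>` backtracking token, and the `score` callable (a stand-in for a trained decoder's token scores) are all assumptions made for illustration. At each step the candidate vocabulary is restricted to the unvisited children of the current node, so a label can only be emitted after its parent.

```python
from typing import Callable, Dict, List, Set

# Hypothetical toy taxonomy: parent label -> child labels.
HIERARCHY: Dict[str, List[str]] = {
    "ROOT": ["News", "Sports"],
    "News": ["Politics", "Economy"],
    "Sports": ["Football", "Tennis"],
}
POP = "<pop>"  # assumed special token: leave the current node, return to its parent


def allowed_tokens(stack: List[str], emitted: Set[str]) -> List[str]:
    """Dynamic vocabulary: unvisited children of the current node, plus POP."""
    children = HIERARCHY.get(stack[-1], [])
    return [c for c in children if c not in emitted] + [POP]


def constrained_decode(score: Callable[[str], float], max_steps: int = 50) -> List[str]:
    """Greedy depth-first decoding restricted to the dynamic vocabulary."""
    stack = ["ROOT"]          # path from the root to the current node
    emitted: Set[str] = set()
    output: List[str] = []
    for _ in range(max_steps):
        candidates = allowed_tokens(stack, emitted)
        token = max(candidates, key=score)  # a real model would also condition on the text
        if token == POP:
            stack.pop()
            if not stack:     # popped past the root: decoding is finished
                break
            continue
        emitted.add(token)
        output.append(token)
        stack.append(token)   # descend into the newly emitted label
    return output


# Toy scores standing in for model logits.
SCORES = {"News": 2.0, "Politics": 1.5, POP: 1.0}
labels = constrained_decode(lambda t: SCORES.get(t, 0.0))
```

Because a child label is only ever offered once its parent is on the stack, the output cannot contain a label without its ancestors, which is the kind of label consistency a flat multi-label classifier does not guarantee.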
Related papers
- HiGen: Hierarchy-Aware Sequence Generation for Hierarchical Text
Classification [19.12354692458442]
Hierarchical text classification (HTC) is a complex subtask under multi-label text classification.
We propose HiGen, a text-generation-based framework utilizing language models to encode dynamic text representations.
arXiv Detail & Related papers (2024-01-24T04:44:42Z)
- Hierarchical Verbalizer for Few-Shot Hierarchical Text Classification [10.578682558356473]
Hierarchical text classification (HTC) performs poorly in low-resource or few-shot settings.
In this work, we propose the hierarchical verbalizer ("HierVerb"), a multi-verbalizer framework treating HTC as a single- or multi-label classification problem.
In this manner, HierVerb fuses label hierarchy knowledge into verbalizers and remarkably outperforms methods that inject hierarchy through graph encoders.
arXiv Detail & Related papers (2023-05-26T12:41:49Z)
- HiTIN: Hierarchy-aware Tree Isomorphism Network for Hierarchical Text
Classification [18.03202012033514]
We propose hierarchy-aware Tree Isomorphism Network (HiTIN) to enhance the text representations with only syntactic information of the label hierarchy.
We conduct experiments on three commonly used datasets and the results demonstrate that HiTIN could achieve better test performance and less memory consumption.
arXiv Detail & Related papers (2023-05-24T14:14:08Z)
- Exploring Structured Semantic Prior for Multi Label Recognition with
Incomplete Labels [60.675714333081466]
Multi-label recognition (MLR) with incomplete labels is very challenging.
Recent works strive to explore the image-to-label correspondence in the vision-language model, i.e., CLIP, to compensate for insufficient annotations.
We advocate remedying the deficiency of label supervision for the MLR with incomplete labels by deriving a structured semantic prior.
arXiv Detail & Related papers (2023-03-23T12:39:20Z)
- Use All The Labels: A Hierarchical Multi-Label Contrastive Learning
Framework [75.79736930414715]
We present a hierarchical multi-label representation learning framework that can leverage all available labels and preserve the hierarchical relationship between classes.
We introduce novel hierarchy preserving losses, which jointly apply a hierarchical penalty to the contrastive loss, and enforce the hierarchy constraint.
arXiv Detail & Related papers (2022-04-27T21:41:44Z)
- Hierarchical Text Classification As Sub-Hierarchy Sequence Generation [8.062201442038957]
Hierarchical text classification (HTC) is essential for various real applications.
Recent HTC models have attempted to incorporate hierarchy information into a model structure.
We formulate HTC as a sub-hierarchy sequence generation to incorporate hierarchy information into a target label sequence.
HiDEC achieved state-of-the-art performance with significantly fewer model parameters than existing models on benchmark datasets.
arXiv Detail & Related papers (2021-11-22T10:50:39Z)
- HTCInfoMax: A Global Model for Hierarchical Text Classification via
Information Maximization [75.45291796263103]
The current state-of-the-art model HiAGM for hierarchical text classification has two limitations.
It correlates each text sample with all labels in the dataset, which introduces irrelevant information.
We propose HTCInfoMax to address these issues by introducing information maximization, which includes two modules.
arXiv Detail & Related papers (2021-04-12T06:04:20Z)
- MATCH: Metadata-Aware Text Classification in A Large Hierarchy [60.59183151617578]
MATCH is an end-to-end framework that leverages both metadata and hierarchy information.
We propose different ways to regularize the parameters and output probability of each child label by its parents.
Experiments on two massive text datasets with large-scale label hierarchies demonstrate the effectiveness of MATCH.
arXiv Detail & Related papers (2021-02-15T05:23:08Z)
- Coherent Hierarchical Multi-Label Classification Networks [56.41950277906307]
C-HMCNN(h) is a novel approach for HMC problems, which exploits hierarchy information in order to produce predictions coherent with the constraint and improve performance.
We conduct an extensive experimental analysis showing the superior performance of C-HMCNN(h) when compared to state-of-the-art models.
arXiv Detail & Related papers (2020-10-20T09:37:02Z)
- Exploring the Hierarchy in Relation Labels for Scene Graph Generation [75.88758055269948]
Experiments show that the proposed simple yet effective method improves several state-of-the-art baselines by a large margin (up to 33% relative gain) in terms of Recall@50.
arXiv Detail & Related papers (2020-09-12T17:36:53Z)
- Joint Embedding of Words and Category Labels for Hierarchical
Multi-label Text Classification [4.2750700546937335]
Hierarchical text classification (HTC) has received extensive attention and has broad application prospects.
We propose a joint embedding of text and parent category based on hierarchical fine-tuning ordered neurons LSTM (HFT-ONLSTM) for HTC.
arXiv Detail & Related papers (2020-04-06T11:06:08Z)