Incorporating Hierarchy into Text Encoder: a Contrastive Learning
Approach for Hierarchical Text Classification
- URL: http://arxiv.org/abs/2203.03825v1
- Date: Tue, 8 Mar 2022 03:21:45 GMT
- Authors: Zihan Wang, Peiyi Wang, Lianzhe Huang, Xin Sun, Houfeng Wang
- Abstract summary: We propose Hierarchy-guided Contrastive Learning (HGCLR) to embed the label hierarchy into a text encoder.
During training, HGCLR constructs positive samples for input text under the guidance of the label hierarchy.
After training, the HGCLR-enhanced text encoder can dispense with the redundant hierarchy.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hierarchical text classification is a challenging subtask of multi-label
classification due to its complex label hierarchy. Existing methods encode text
and label hierarchy separately and mix their representations for
classification, where the hierarchy remains unchanged for all input text.
Instead of modeling them separately, in this work, we propose Hierarchy-guided
Contrastive Learning (HGCLR) to directly embed the hierarchy into a text
encoder. During training, HGCLR constructs positive samples for input text
under the guidance of the label hierarchy. By pulling together the input text
and its positive sample, the text encoder can learn to generate the
hierarchy-aware text representation independently. Therefore, after training,
the HGCLR-enhanced text encoder can dispense with the redundant hierarchy.
Extensive experiments on three benchmark datasets verify the effectiveness of
HGCLR.
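The "pulling together" objective the abstract describes is a contrastive one: each input text is pushed toward its hierarchy-guided positive sample and away from the other samples in the batch. A minimal InfoNCE-style sketch of that pull-together loss (illustrative only; the paper's actual positive-sample construction and loss details may differ):

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """Pull each anchor toward its matched positive (same row index);
    push it away from the other positives in the batch (in-batch negatives)."""
    # L2-normalize so the dot product is cosine similarity
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # (batch, batch); diagonal = matched pairs
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    # Cross-entropy with the matched pair as the "correct class"
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))
```

When anchor and positive representations already agree, the loss is near zero; mismatched pairs drive it up, which is what trains the encoder to produce hierarchy-aware representations on its own.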
Related papers
- Text2Tree: Aligning Text Representation to the Label Tree Hierarchy for Imbalanced Medical Classification
This paper aims to rethink the data challenges in medical texts and present a novel framework-agnostic algorithm called Text2Tree.
We embed the ICD code tree structure of labels into cascade attention modules for learning hierarchy-aware label representations.
Two new learning schemes, Similarity Surrogate Learning (SSL) and Dissimilarity Mixup Learning (DML), are devised to boost text classification by reusing and distinguishing samples of other labels.
arXiv Detail & Related papers (2023-11-28T10:02:08Z)
- HiTIN: Hierarchy-aware Tree Isomorphism Network for Hierarchical Text Classification
We propose hierarchy-aware Tree Isomorphism Network (HiTIN) to enhance the text representations with only syntactic information of the label hierarchy.
We conduct experiments on three commonly used datasets and the results demonstrate that HiTIN could achieve better test performance and less memory consumption.
arXiv Detail & Related papers (2023-05-24T14:14:08Z)
- Exploiting Global and Local Hierarchies for Hierarchical Text Classification
Existing methods encode label hierarchy in a global view, where label hierarchy is treated as the static hierarchical structure containing all labels.
We propose Hierarchy-guided BERT with Global and Local hierarchies (HBGL) to model both global and local hierarchies.
Compared with the state-of-the-art method HGCLR, our method achieves significant improvement on three benchmark datasets.
arXiv Detail & Related papers (2022-05-05T12:48:41Z)
- Use All The Labels: A Hierarchical Multi-Label Contrastive Learning Framework
We present a hierarchical multi-label representation learning framework that can leverage all available labels and preserve the hierarchical relationship between classes.
We introduce novel hierarchy preserving losses, which jointly apply a hierarchical penalty to the contrastive loss, and enforce the hierarchy constraint.
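The "hierarchical penalty" idea can be illustrated with a toy scoring function that weights label pairs by where their root-to-label paths diverge in the tree, so the contrastive loss can penalize confusing distant labels more than sibling labels. This is a sketch under an assumed path-encoded labeling, not the paper's actual loss:

```python
def lca_depth(path_a, path_b):
    """Length of the shared prefix of two root-to-label paths
    (= depth of the lowest common ancestor)."""
    depth = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        depth += 1
    return depth

def hierarchy_penalty(paths):
    """Pairwise penalty matrix: larger when two labels diverge closer to the root."""
    n = len(paths)
    penalty = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            deepest = max(len(paths[i]), len(paths[j]))
            penalty[i][j] = deepest - lca_depth(paths[i], paths[j])
    return penalty
```

For labels A1 and A2 under the same parent A, the penalty is smaller than for A1 versus a label in an unrelated subtree, which is the hierarchy constraint such losses aim to enforce.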
arXiv Detail & Related papers (2022-04-27T21:41:44Z)
- Constrained Sequence-to-Tree Generation for Hierarchical Text Classification
Hierarchical Text Classification (HTC) is a challenging task where a document can be assigned to multiple hierarchically structured categories within a taxonomy.
In this paper, we formulate HTC as a sequence generation task and introduce a sequence-to-tree framework (Seq2Tree) for modeling the hierarchical label structure.
arXiv Detail & Related papers (2022-04-02T08:35:39Z)
- Deep Hierarchical Semantic Segmentation
Hierarchical semantic segmentation (HSS) aims at structured, pixel-wise description of visual observation in terms of a class hierarchy.
HSSN casts HSS as a pixel-wise multi-label classification task, only bringing minimal architecture change to current segmentation models.
With hierarchy-induced margin constraints, HSSN reshapes the pixel embedding space, so as to generate well-structured pixel representations.
arXiv Detail & Related papers (2022-03-27T15:47:44Z)
- Academic Resource Text Level Multi-label Classification based on Attention
Hierarchical multi-label academic text classification (HMTC) is the task of assigning academic texts to a hierarchically structured labeling system.
We propose an attention-based hierarchical multi-label classification algorithm of academic texts (AHMCA) by integrating features such as text, keywords, and hierarchical structure.
arXiv Detail & Related papers (2022-03-21T05:32:35Z)
- Minimally-Supervised Structure-Rich Text Categorization via Learning on Text-Rich Networks
We propose a novel framework for minimally supervised categorization by learning from the text-rich network.
Specifically, we jointly train two modules with different inductive biases -- a text analysis module for text understanding and a network learning module for class-discriminative, scalable network learning.
Our experiments show that given only three seed documents per category, our framework can achieve an accuracy of about 92%.
arXiv Detail & Related papers (2021-02-23T04:14:34Z)
- MATCH: Metadata-Aware Text Classification in A Large Hierarchy
MATCH is an end-to-end framework that leverages both metadata and hierarchy information.
We propose different ways to regularize the parameters and output probability of each child label by its parents.
Experiments on two massive text datasets with large-scale label hierarchies demonstrate the effectiveness of MATCH.
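One way such parent-child regularization can be pictured is as a consistency constraint: a child label's probability should not exceed that of its parent. The sketch below enforces this by clamping along each root-to-label path; it is an illustrative simplification with an assumed `parent` mapping, not MATCH's actual training-time regularization:

```python
def regularize_by_parents(probs, parent):
    """Clamp each label's probability so no child exceeds any of its ancestors.

    probs:  dict mapping label -> predicted probability
    parent: dict mapping child label -> parent label (roots are absent)
    """
    def clamped(label):
        # Minimum probability along the path from this label up to its root
        p = probs[label]
        while label in parent:
            label = parent[label]
            p = min(p, probs[label])
        return p

    return {label: clamped(label) for label in probs}
```

After clamping, any predicted label set automatically respects the hierarchy: whenever a child is confidently predicted, its ancestors are at least as confident.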
arXiv Detail & Related papers (2021-02-15T05:23:08Z)
- Hierarchical Bi-Directional Self-Attention Networks for Paper Review Rating Recommendation
We propose a Hierarchical bi-directional self-attention Network framework (HabNet) for paper review rating prediction and recommendation.
Specifically, we leverage the hierarchical structure of the paper reviews with three levels of encoders: a sentence encoder (level one), an intra-review encoder (level two), and an inter-review encoder (level three).
We are able to identify useful predictors to make the final acceptance decision, as well as to help discover the inconsistency between numerical review ratings and text sentiment conveyed by reviewers.
arXiv Detail & Related papers (2020-11-02T08:07:50Z)
- Exploring the Hierarchy in Relation Labels for Scene Graph Generation
Experiments show that the proposed simple yet effective method can improve several state-of-the-art baselines by a large margin (up to 33% relative gain) in terms of Recall@50.
arXiv Detail & Related papers (2020-09-12T17:36:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.