Exploiting Dynamic and Fine-grained Semantic Scope for Extreme
Multi-label Text Classification
- URL: http://arxiv.org/abs/2205.11973v1
- Date: Tue, 24 May 2022 11:15:35 GMT
- Title: Exploiting Dynamic and Fine-grained Semantic Scope for Extreme
Multi-label Text Classification
- Authors: Yuan Wang and Huiling Song and Peng Huo and Tao Xu and Jucheng Yang
and Yarui Chen and Tingting Zhao
- Abstract summary: Extreme multi-label text classification (XMTC) refers to the problem of tagging a given text with the most relevant subset of labels from a large label set.
Most existing XMTC methods take advantage of fixed label clusters obtained at an early stage to balance performance on head and tail labels.
We propose a novel framework TReaderXML for XMTC, which adopts dynamic and fine-grained semantic scope from teacher knowledge.
- Score: 12.508006325140949
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Extreme multi-label text classification (XMTC) refers to the problem of
tagging a given text with the most relevant subset of labels from a large label
set. Owing to the large label dimensionality in XMTC, a majority of labels have
only a few training instances. To address this data sparsity issue, most
existing XMTC methods rely on fixed label clusters obtained at an early stage to
balance performance on head and tail labels. However, such label clusters
provide a static and coarse-grained semantic scope for every text, which
ignores the distinct characteristics of different texts and struggles to model
an accurate semantic scope for texts with tail labels. In this paper, we
propose TReaderXML, a novel framework for XMTC that adopts a dynamic and
fine-grained semantic scope, derived from teacher knowledge, for each individual
text in order to optimize the text's conditional prior category semantic range.
TReaderXML dynamically obtains teacher knowledge for each text from similar
texts and hierarchical label information in the training set, yielding a
distinctly fine-grained, label-oriented semantic scope. TReaderXML then
benefits from a novel dual cooperative network that first learns features of
a text and its corresponding label-oriented semantic scope through parallel
Encoding and Reading Modules, second fuses the two parts in an Interaction
Module to regularize the text's representation with the dynamic and
fine-grained label-oriented semantic scope, and finally finds the target labels
with a Prediction Module. Experimental results on three XMTC benchmark datasets
show that our method achieves new state-of-the-art results and performs
especially well on severely imbalanced and sparse datasets.
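The dual cooperative network described above can be sketched at a high level. This is a minimal, hypothetical illustration: the four module names follow the abstract, but the dimensions, the tanh encoders, and the gating fusion in the Interaction Module are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
D, L = 64, 100  # hidden size and label-set size (illustrative)

# Hypothetical parameters; a real model would learn these.
W_enc = rng.normal(scale=0.1, size=(D, D))
W_read = rng.normal(scale=0.1, size=(D, D))
W_out = rng.normal(scale=0.1, size=(L, D))

def encoding_module(text_vec):
    # Encoding Module: learns features of the text itself.
    return np.tanh(W_enc @ text_vec)

def reading_module(scope_vec):
    # Reading Module: encodes the label-oriented semantic scope
    # (teacher knowledge from similar texts and the label hierarchy).
    return np.tanh(W_read @ scope_vec)

def interaction_module(h_text, h_scope):
    # Interaction Module: regularizes the text representation with the
    # scope via a simple gating fusion (one plausible choice, assumed here).
    gate = 1.0 / (1.0 + np.exp(-(h_text * h_scope)))
    return gate * h_text + (1.0 - gate) * h_scope

def prediction_module(h):
    # Prediction Module: per-label sigmoid scores; the top-k become the tags.
    return 1.0 / (1.0 + np.exp(-(W_out @ h)))

text_vec = rng.normal(size=D)   # stand-in for an encoded input text
scope_vec = rng.normal(size=D)  # stand-in for its teacher-derived scope
scores = prediction_module(
    interaction_module(encoding_module(text_vec), reading_module(scope_vec))
)
top5 = np.argsort(-scores)[:5]  # indices of the 5 highest-scoring labels
```

The key structural point is that the two encoders run in parallel and only interact afterwards, so the scope acts as a per-text regularizer on the representation rather than as a fixed cluster assignment.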
Related papers
- Leveraging Label Semantics and Meta-Label Refinement for Multi-Label Question Classification [11.19022605804112]
This paper introduces RR2QC, a novel Retrieval Reranking method for multi-label Question Classification.
It uses label semantics and meta-label refinement to enhance personalized learning and resource recommendation.
Experimental results demonstrate that RR2QC outperforms existing classification methods in Precision@k and F1 scores.
arXiv Detail & Related papers (2024-11-04T06:27:14Z)
- HiGen: Hierarchy-Aware Sequence Generation for Hierarchical Text Classification [19.12354692458442]
Hierarchical text classification (HTC) is a complex subtask of multi-label text classification.
We propose HiGen, a text-generation-based framework utilizing language models to encode dynamic text representations.
arXiv Detail & Related papers (2024-01-24T04:44:42Z)
- Substituting Data Annotation with Balanced Updates and Collective Loss in Multi-label Text Classification [19.592985329023733]
Multi-label text classification (MLTC) is the task of assigning multiple labels to a given text.
We study the MLTC problem in annotation-free and scarce-annotation settings, in which the magnitude of available supervision signals is linear in the number of labels.
Our method follows three steps: (1) mapping the input text to a set of preliminary label likelihoods via natural language inference with a pre-trained language model, (2) calculating a signed label dependency graph from label descriptions, and (3) updating the preliminary label likelihoods via message passing along the label dependency graph.
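The three-step procedure can be sketched as follows. This is a hypothetical illustration of the general idea, not the paper's actual update rule: the likelihood values, the signed adjacency matrix, and the mixing scheme in `message_pass` are all assumptions made for the example.

```python
import numpy as np

# Step (1): preliminary label likelihoods, as if produced by natural
# language inference with a pre-trained language model (values invented).
p = np.array([0.9, 0.2, 0.6, 0.1])

# Step (2): signed label dependency graph from label descriptions;
# A[i, j] > 0 means label j supports label i, A[i, j] < 0 means it
# contradicts it (illustrative weights).
A = np.array([
    [ 0.0,  0.5,  0.0, -0.4],
    [ 0.5,  0.0, -0.3,  0.0],
    [ 0.0, -0.3,  0.0,  0.6],
    [-0.4,  0.0,  0.6,  0.0],
])

def message_pass(p, A, steps=3, alpha=0.5):
    """Step (3): refine the likelihoods by passing signed messages along
    the dependency graph, mixing each label's belief with its neighbours'."""
    q = p.copy()
    for _ in range(steps):
        msg = A @ (q - 0.5)  # signed influence, centred at the 0.5 boundary
        q = np.clip((1 - alpha) * q + alpha * (0.5 + msg), 0.0, 1.0)
    return q

refined = message_pass(p, A)  # likelihoods after graph propagation
```

The point of the sketch is the division of labour: the language model supplies per-label evidence without any task-specific annotation, and the label graph redistributes that evidence so correlated labels reinforce each other.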
arXiv Detail & Related papers (2023-09-24T04:12:52Z)
- MatchXML: An Efficient Text-label Matching Framework for Extreme Multi-label Text Classification [13.799733640048672]
eXtreme Multi-label text Classification (XMC) refers to training a classifier that tags a text sample with the relevant labels from a large-scale label set.
We propose MatchXML, an efficient text-label matching framework for XMC.
Experimental results demonstrate that MatchXML achieves state-of-the-art accuracy on five out of six datasets.
arXiv Detail & Related papers (2023-08-25T02:32:36Z)
- Description-Enhanced Label Embedding Contrastive Learning for Text Classification [65.01077813330559]
The paper introduces Self-Supervised Learning (SSL) into the model learning process and designs a novel self-supervised Relation of Relation (R2) classification task.
It proposes a Relation of Relation Learning Network (R2-Net) for text classification, in which text classification and R2 classification are treated as joint optimization targets.
It also exploits external knowledge from WordNet to obtain multi-aspect descriptions for label semantic learning.
arXiv Detail & Related papers (2023-06-15T02:19:34Z)
- Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels [60.675714333081466]
Multi-label recognition (MLR) with incomplete labels is very challenging.
Recent works strive to explore the image-to-label correspondence in vision-language models, i.e., CLIP, to compensate for insufficient annotations.
We advocate remedying the deficiency of label supervision for the MLR with incomplete labels by deriving a structured semantic prior.
arXiv Detail & Related papers (2023-03-23T12:39:20Z)
- Label Semantic Aware Pre-training for Few-shot Text Classification [53.80908620663974]
We propose Label Semantic Aware Pre-training (LSAP) to improve the generalization and data efficiency of text classification systems.
LSAP incorporates label semantics into pre-trained generative models (T5 in our case) by performing secondary pre-training on labeled sentences from a variety of domains.
arXiv Detail & Related papers (2022-04-14T17:33:34Z)
- HTCInfoMax: A Global Model for Hierarchical Text Classification via Information Maximization [75.45291796263103]
The current state-of-the-art model HiAGM for hierarchical text classification has two limitations.
It correlates each text sample with all labels in the dataset, which introduces irrelevant information.
We propose HTCInfoMax to address these issues via information maximization, which comprises two modules.
arXiv Detail & Related papers (2021-04-12T06:04:20Z)
- Minimally-Supervised Structure-Rich Text Categorization via Learning on Text-Rich Networks [61.23408995934415]
We propose a novel framework for minimally supervised categorization by learning from the text-rich network.
Specifically, we jointly train two modules with different inductive biases -- a text analysis module for text understanding and a network learning module for class-discriminative, scalable network learning.
Our experiments show that given only three seed documents per category, our framework can achieve an accuracy of about 92%.
arXiv Detail & Related papers (2021-02-23T04:14:34Z)
- MATCH: Metadata-Aware Text Classification in A Large Hierarchy [60.59183151617578]
MATCH is an end-to-end framework that leverages both metadata and hierarchy information.
We propose different ways to regularize the parameters and output probability of each child label by its parents.
Experiments on two massive text datasets with large-scale label hierarchies demonstrate the effectiveness of MATCH.
arXiv Detail & Related papers (2021-02-15T05:23:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.