An Empirical Study on Large-Scale Multi-Label Text Classification
Including Few and Zero-Shot Labels
- URL: http://arxiv.org/abs/2010.01653v1
- Date: Sun, 4 Oct 2020 18:55:47 GMT
- Authors: Ilias Chalkidis, Manos Fergadiotis, Sotiris Kotitsas, Prodromos
Malakasiotis, Nikolaos Aletras and Ion Androutsopoulos
- Abstract summary: Large-scale Multi-label Text Classification (LMTC) has a wide range of Natural Language Processing (NLP) applications.
Current state-of-the-art LMTC models employ Label-Wise Attention Networks (LWANs).
We show that hierarchical methods based on Probabilistic Label Trees (PLTs) outperform LWANs.
We propose a new state-of-the-art method which combines BERT with LWANs.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale Multi-label Text Classification (LMTC) has a wide range of
Natural Language Processing (NLP) applications and presents interesting
challenges. First, not all labels are well represented in the training set, due
to the very large label set and the skewed label distributions of LMTC
datasets. Also, label hierarchies and differences in human labelling guidelines
may affect graph-aware annotation proximity. Finally, the label hierarchies are
periodically updated, requiring LMTC models capable of zero-shot
generalization. Current state-of-the-art LMTC models employ Label-Wise
Attention Networks (LWANs), which (1) typically treat LMTC as flat multi-label
classification; (2) may use the label hierarchy to improve zero-shot learning,
although this practice is vastly understudied; and (3) have not been combined
with pre-trained Transformers (e.g. BERT), which have led to state-of-the-art
results in several NLP benchmarks. Here, for the first time, we empirically
evaluate a battery of LMTC methods from vanilla LWANs to hierarchical
classification approaches and transfer learning, on frequent, few, and
zero-shot learning on three datasets from different domains. We show that
hierarchical methods based on Probabilistic Label Trees (PLTs) outperform
LWANs. Furthermore, we show that Transformer-based approaches outperform the
state-of-the-art in two of the datasets, and we propose a new state-of-the-art
method which combines BERT with LWANs. Finally, we propose new models that
leverage the label hierarchy to improve few and zero-shot learning, considering
on each dataset a graph-aware annotation proximity measure that we introduce.
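The abstract references Label-Wise Attention Networks without defining them; the core idea is one attention head per label over the token representations, giving each label its own view of the document. The PyTorch sketch below illustrates that general mechanism only; it is not the authors' implementation, and all dimensions and names are invented for illustration.
```python
import torch
import torch.nn as nn

class LabelWiseAttention(nn.Module):
    """One attention head per label over token representations (illustrative)."""

    def __init__(self, hidden_dim: int, num_labels: int):
        super().__init__()
        # One learned query vector per label.
        self.label_queries = nn.Parameter(torch.randn(num_labels, hidden_dim))
        # A shared scorer applied to each label-specific context vector.
        self.output = nn.Linear(hidden_dim, 1)

    def forward(self, token_states: torch.Tensor) -> torch.Tensor:
        # token_states: (batch, seq_len, hidden), e.g. BiGRU or BERT outputs.
        scores = torch.einsum("bsh,lh->bls", token_states, self.label_queries)
        attn = torch.softmax(scores, dim=-1)              # (batch, labels, seq)
        # One document representation per label.
        contexts = torch.einsum("bls,bsh->blh", attn, token_states)
        return torch.sigmoid(self.output(contexts)).squeeze(-1)  # (batch, labels)

# Toy usage: per-label probabilities for a batch of two encoded documents.
lwan = LabelWiseAttention(hidden_dim=768, num_labels=5000)
probs = lwan(torch.randn(2, 128, 768))   # shape: (2, 5000)
```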
Related papers
- LC-Protonets: Multi-label Few-shot learning for world music audio tagging [65.72891334156706]
We introduce Label-Combination Prototypical Networks (LC-Protonets) to address the problem of multi-label few-shot classification.
LC-Protonets generate one prototype per label combination, derived from the power set of labels present in the limited training items.
Our method is applied to automatic audio tagging across diverse music datasets, covering various cultures and including both modern and traditional music.
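The digest gives no code; below is a minimal NumPy sketch of one plausible reading of "one prototype per label combination, derived from the power set of labels": every support item contributes to each non-empty subset of its own label set. Function and variable names are hypothetical.
```python
import numpy as np
from itertools import combinations

def lc_prototypes(embeddings, label_sets):
    """One prototype per label combination seen in the support set.

    Each support item contributes to every non-empty subset of its own label
    set, mirroring the 'power set of labels present in the training items'.
    """
    buckets = {}
    for emb, labels in zip(embeddings, label_sets):
        for r in range(1, len(labels) + 1):
            for combo in combinations(sorted(labels), r):
                buckets.setdefault(combo, []).append(emb)
    # Prototype = mean embedding of all items supporting that combination.
    return {combo: np.mean(vecs, axis=0) for combo, vecs in buckets.items()}

# Toy usage: two support items with 4-dimensional embeddings.
embs = [np.array([1.0, 0.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0, 0.0])]
tags = [{"rock", "modern"}, {"rock"}]
protos = lc_prototypes(embs, tags)
# Keys include ('rock',), ('modern',) and ('modern', 'rock').
```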
arXiv Detail & Related papers (2024-09-17T15:13:07Z) - UniDEC : Unified Dual Encoder and Classifier Training for Extreme Multi-Label Classification [42.36546066941635]
Extreme Multi-label Classification (XMC) involves predicting a subset of relevant labels from an extremely large label space.
This work proposes UniDEC, a novel end-to-end trainable framework that jointly trains the dual encoder and the classifier.
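The summary is high level; the sketch below only shows what "training a dual encoder and a classifier together" can look like in general (a shared encoder, two heads, one joint loss). It is not UniDEC's actual architecture or losses.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualEncoderWithClassifier(nn.Module):
    """A shared document encoder feeding two heads: label-embedding similarity
    (dual encoder) and a linear classifier, trained with one joint loss."""

    def __init__(self, input_dim: int, hidden_dim: int, num_labels: int):
        super().__init__()
        self.doc_encoder = nn.Linear(input_dim, hidden_dim)
        self.label_embeddings = nn.Embedding(num_labels, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_labels)

    def forward(self, x: torch.Tensor):
        z = F.relu(self.doc_encoder(x))
        de_scores = z @ self.label_embeddings.weight.T   # dual-encoder head
        clf_scores = self.classifier(z)                  # classifier head
        return de_scores, clf_scores

# Toy joint training step on random multi-label data.
model = DualEncoderWithClassifier(input_dim=300, hidden_dim=128, num_labels=1000)
x, y = torch.randn(4, 300), torch.randint(0, 2, (4, 1000)).float()
de, clf = model(x)
loss = (F.binary_cross_entropy_with_logits(de, y)
        + F.binary_cross_entropy_with_logits(clf, y))
loss.backward()
```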
arXiv Detail & Related papers (2024-05-04T17:27:51Z) - Substituting Data Annotation with Balanced Updates and Collective Loss
in Multi-label Text Classification [19.592985329023733]
Multi-label text classification (MLTC) is the task of assigning multiple labels to a given text.
We study the MLTC problem in annotation-free and scarce-annotation settings, in which the magnitude of the available supervision signal is linear in the number of labels.
Our method follows three steps: (1) mapping input text into a set of preliminary label likelihoods by natural language inference using a pre-trained language model, (2) calculating a signed label dependency graph from label descriptions, and (3) updating the preliminary label likelihoods with message passing along the label dependency graph.
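Step (3) lends itself to a compact illustration. The update rule below is a generic signed message-passing scheme over label likelihoods, assumed here for illustration; the paper's balanced updates and collective loss are not reproduced.
```python
import numpy as np

def refine_likelihoods(p, G, steps=3, alpha=0.5):
    """Refine preliminary label likelihoods by signed message passing.

    p: (L,) preliminary likelihoods, e.g. from an NLI model (step 1);
    G: (L, L) signed dependency matrix from label descriptions (step 2),
    positive where labels reinforce each other, negative where they conflict.
    """
    for _ in range(steps):
        # Centre at 0.5 so an uncertain label sends no message to neighbours.
        messages = G @ (p - 0.5)
        p = np.clip(p + alpha * messages, 0.0, 1.0)
    return p

# Toy usage: label 0 supports label 1 and conflicts with label 2.
prelim = np.array([0.9, 0.2, 0.6])
G = np.array([[0.0, 0.5, -0.3],
              [0.5, 0.0, 0.0],
              [-0.3, 0.0, 0.0]])
print(refine_likelihoods(prelim, G))
```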
arXiv Detail & Related papers (2023-09-24T04:12:52Z) - Deep Partial Multi-Label Learning with Graph Disambiguation [27.908565535292723]
We propose a novel deep Partial multi-Label model with grAph-disambIguatioN (PLAIN).
Specifically, we introduce the instance-level and label-level similarities to recover label confidences.
At each training epoch, labels are propagated on the instance and label graphs to produce relatively accurate pseudo-labels.
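A minimal NumPy sketch of label propagation over instance and label graphs producing pseudo-label confidences, in the spirit of the description above; the update rule is illustrative, not PLAIN's exact formulation.
```python
import numpy as np

def propagate_pseudo_labels(Y, S_inst, S_label, alpha=0.5, beta=0.5):
    """One round of candidate-label disambiguation by graph propagation.

    Y: (n, L) binary candidate-label matrix (the partial labels);
    S_inst: (n, n) row-normalized instance similarity graph;
    S_label: (L, L) row-normalized label similarity graph.
    """
    C = Y.astype(float)
    C = alpha * (S_inst @ C) + (1 - alpha) * C    # instance-level propagation
    C = beta * (C @ S_label) + (1 - beta) * C     # label-level propagation
    C = C * Y                                     # non-candidates stay at zero
    # Normalize per instance into pseudo-label confidences.
    return C / np.clip(C.sum(axis=1, keepdims=True), 1e-8, None)

# Toy usage: two instances, three labels, uniform similarity graphs.
Y = np.array([[1, 1, 0], [0, 1, 1]])
pseudo = propagate_pseudo_labels(Y, np.full((2, 2), 0.5), np.full((3, 3), 1 / 3))
```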
arXiv Detail & Related papers (2023-05-10T04:02:08Z) - Ground Truth Inference for Weakly Supervised Entity Matching [76.6732856489872]
We propose a simple but powerful labeling model for weak supervision tasks.
We then tailor the labeling model specifically to the task of entity matching.
We show that our labeling model results in a 9% higher F1 score on average than the best existing method.
arXiv Detail & Related papers (2022-11-13T17:57:07Z) - Binary Classification with Positive Labeling Sources [71.37692084951355]
We propose WEAPO, a simple yet competitive weak supervision (WS) method for producing training labels without negative labeling sources.
We show WEAPO achieves the highest averaged performance on 10 benchmark datasets.
arXiv Detail & Related papers (2022-08-02T19:32:08Z) - Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z) - Multi-label Few/Zero-shot Learning with Knowledge Aggregated from
Multiple Label Graphs [8.44680447457879]
We present a simple multi-graph aggregation model that fuses knowledge from multiple label graphs encoding different semantic label relationships.
We show that methods equipped with the multi-graph knowledge aggregation achieve significant performance improvement across almost all the measures on few/zero-shot labels.
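As a rough sketch of multi-graph knowledge aggregation: propagate label features over several label graphs and fuse the results, here with a hypothetical weighted residual sum rather than the paper's actual fusion model.
```python
import numpy as np

def aggregate_label_graphs(label_feats, graphs, weights=None):
    """Fuse label features propagated over several label graphs.

    label_feats: (L, d) initial label features; graphs: list of (L, L)
    row-normalized adjacency matrices, one per semantic relationship
    (e.g. hierarchy, co-occurrence). Fusion here is a weighted residual sum.
    """
    if weights is None:
        weights = [1.0 / len(graphs)] * len(graphs)
    propagated = [w * (A @ label_feats) for w, A in zip(weights, graphs)]
    return label_feats + sum(propagated)   # residual fusion across graphs

# Toy usage: six labels, two graphs (identity hierarchy, uniform co-occurrence).
L, d = 6, 4
feats = np.random.rand(L, d)
fused = aggregate_label_graphs(feats, [np.eye(L), np.full((L, L), 1.0 / L)])
```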
arXiv Detail & Related papers (2020-10-15T01:15:43Z) - Knowledge-Guided Multi-Label Few-Shot Learning for General Image
Recognition [75.44233392355711]
The KGGR framework exploits prior knowledge of statistical label correlations with deep neural networks.
It first builds a structured knowledge graph to correlate different labels based on statistical label co-occurrence.
Then, it introduces label semantics to guide the learning of semantic-specific features.
Finally, it exploits a graph propagation network to explore graph node interactions.
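A small NumPy sketch of the co-occurrence-graph idea: build a label graph from statistical co-occurrence, then propagate label semantic features over it. Both the normalization and the propagation rule are assumptions for illustration, not KGGR's concrete layers.
```python
import numpy as np

def cooccurrence_graph(label_matrix, eps=1e-8):
    """Row-normalized label graph from statistical co-occurrence.

    label_matrix: (n_samples, L) binary annotations; edge (i, j) roughly
    approximates P(label j | label i).
    """
    counts = label_matrix.T @ label_matrix          # (L, L) co-occurrence
    freq = np.clip(np.diag(counts), eps, None)
    return counts / freq[:, None]

def propagate(label_semantics, A, steps=2, alpha=0.5):
    """Blend each label's semantic features with those of its neighbours."""
    H = label_semantics
    for _ in range(steps):
        H = (1 - alpha) * H + alpha * (A @ H)
    return H

# Toy usage: three labels, graph from binary annotations, 8-d label semantics.
Y = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 0]])
features = propagate(np.random.rand(3, 8), cooccurrence_graph(Y))
```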
arXiv Detail & Related papers (2020-09-20T15:05:29Z) - Generalized Label Enhancement with Sample Correlations [24.582764493585362]
We propose two novel label enhancement methods, i.e., Label Enhancement with Sample Correlations (LESC) and generalized Label Enhancement with Sample Correlations (gLESC).
Benefitting from the sample correlations, the proposed methods can boost the performance of label enhancement.
arXiv Detail & Related papers (2020-04-07T03:32:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.