Pretrained Generalized Autoregressive Model with Adaptive Probabilistic
Label Clusters for Extreme Multi-label Text Classification
- URL: http://arxiv.org/abs/2007.02439v2
- Date: Sat, 15 Aug 2020 01:41:34 GMT
- Title: Pretrained Generalized Autoregressive Model with Adaptive Probabilistic
Label Clusters for Extreme Multi-label Text Classification
- Authors: Hui Ye, Zhiyu Chen, Da-Han Wang, Brian D. Davison
- Abstract summary: We propose a novel deep learning method called APLC-XLNet.
Our approach fine-tunes the recently released generalized autoregressive pretrained model (XLNet) to learn a dense representation for the input text.
Our experiments, carried out on five benchmark datasets, show that our approach has achieved new state-of-the-art results.
- Score: 24.665469885904145
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Extreme multi-label text classification (XMTC) is a task for tagging a given
text with the most relevant labels from an extremely large label set. We
propose a novel deep learning method called APLC-XLNet. Our approach fine-tunes
the recently released generalized autoregressive pretrained model (XLNet) to
learn a dense representation for the input text. We propose Adaptive
Probabilistic Label Clusters (APLC) to approximate the cross entropy loss by
exploiting the unbalanced label distribution to form clusters that explicitly
reduce the computational time. Our experiments, carried out on five benchmark
datasets, show that our approach has achieved new state-of-the-art results on
four benchmark datasets. Our source code is available publicly at
https://github.com/huiyegit/APLC_XLNet.
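The computational core described in the abstract is an adaptive-softmax-style decomposition: labels are sorted by frequency, the most frequent labels sit in a small head cluster that is scored for every example, and rare labels are pushed into tail clusters with reduced projections, so the full cross entropy over the whole label set is never materialized. Below is a minimal sketch of that idea in Python/PyTorch, assuming a single-label simplification and placeholder cutoffs and dimensions; the authors' multi-label APLC implementation in the linked repository differs in details.

```python
# Minimal sketch (not the authors' implementation) of adaptive-softmax-style
# label clusters: frequent labels live in a small "head" cluster scored for
# every example, while rare labels go into tail clusters with reduced hidden
# dimensions, so the full |L|-way softmax is never built.
# Cutoffs, dimensions, and the single-label simplification are assumptions.
import torch
import torch.nn as nn

class AdaptiveLabelClusters(nn.Module):
    def __init__(self, hidden_dim, num_labels, cutoffs=(2000, 20000), div=4):
        super().__init__()
        self.cutoffs = list(cutoffs) + [num_labels]
        # Head scores the most frequent labels plus one "token" per tail cluster.
        self.head = nn.Linear(hidden_dim, self.cutoffs[0] + len(self.cutoffs) - 1)
        # Each tail cluster uses a progressively smaller projection to save compute.
        self.tails = nn.ModuleList()
        for i in range(len(self.cutoffs) - 1):
            dim = hidden_dim // (div ** (i + 1))
            size = self.cutoffs[i + 1] - self.cutoffs[i]
            self.tails.append(nn.Sequential(nn.Linear(hidden_dim, dim, bias=False),
                                            nn.Linear(dim, size)))

    def log_prob(self, h, label):
        """Log-probability of a single label index for hidden states h (batch, dim)."""
        head_logp = torch.log_softmax(self.head(h), dim=-1)
        if label < self.cutoffs[0]:                      # frequent label: head only
            return head_logp[:, label]
        # rare label: head probability of its cluster token + within-cluster probability
        for i in range(len(self.tails)):
            if label < self.cutoffs[i + 1]:
                cluster_token = self.cutoffs[0] + i
                tail_logp = torch.log_softmax(self.tails[i](h), dim=-1)
                return head_logp[:, cluster_token] + tail_logp[:, label - self.cutoffs[i]]

h = torch.randn(2, 768)                                  # e.g. XLNet [CLS] features
aplc = AdaptiveLabelClusters(hidden_dim=768, num_labels=50000)
print(aplc.log_prob(h, label=31337).shape)               # torch.Size([2])
```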
Related papers
- Prototypical Extreme Multi-label Classification with a Dynamic Margin Loss [6.244642999033755]
Extreme Multi-label Classification (XMC) methods predict relevant labels for a given query in an extremely large label space.
Recent works in XMC address this problem using deep encoders that project text descriptions to an embedding space suitable for recovering the closest labels.
We propose PRIME, an XMC method that employs a novel prototypical contrastive learning technique to reconcile efficiency and performance, surpassing brute-force approaches.
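As a rough illustration of the encoder-plus-nearest-label setup this summary describes (not PRIME's actual training or prototype mechanism), a query embedding can be matched against precomputed label embeddings by cosine similarity; the embeddings below are random placeholders.

```python
# Minimal sketch of embedding-based XMC prediction: a shared encoder maps
# queries and labels into one space, and prediction is a nearest-neighbour
# search over label embeddings. Random vectors stand in for learned embeddings.
import numpy as np

def top_k_labels(query_emb, label_embs, k=5):
    """Return (indices, cosine scores) of the k labels closest to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    lab = label_embs / np.linalg.norm(label_embs, axis=1, keepdims=True)
    scores = lab @ q                      # cosine similarity to every label
    idx = np.argsort(-scores)[:k]
    return idx, scores[idx]

label_embs = np.random.randn(20_000, 256)   # stand-in for learned label embeddings
query_emb = np.random.randn(256)            # stand-in for the encoded query text
idx, sim = top_k_labels(query_emb, label_embs)
print(idx, sim)
```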
arXiv Detail & Related papers (2024-10-27T10:24:23Z)
- Continuous Contrastive Learning for Long-Tailed Semi-Supervised Recognition [50.61991746981703]
Current state-of-the-art LTSSL approaches rely on high-quality pseudo-labels for large-scale unlabeled data.
This paper introduces a novel probabilistic framework that unifies various recent proposals in long-tail learning.
We introduce a continuous contrastive learning method, CCL, extending our framework to unlabeled data using reliable and smoothed pseudo-labels.
arXiv Detail & Related papers (2024-10-08T15:06:10Z)
- RankMatch: A Novel Approach to Semi-Supervised Label Distribution Learning Leveraging Inter-label Correlations [52.549807652527306]
This paper introduces RankMatch, an innovative approach for Semi-Supervised Label Distribution Learning (SSLDL).
RankMatch effectively utilizes a small number of labeled examples in conjunction with a larger quantity of unlabeled data.
We establish a theoretical generalization bound for RankMatch, and through extensive experiments, demonstrate its superiority in performance against existing SSLDL methods.
arXiv Detail & Related papers (2023-12-11T12:47:29Z)
- Label-Retrieval-Augmented Diffusion Models for Learning from Noisy Labels [61.97359362447732]
Learning from noisy labels is an important and long-standing problem in machine learning for real applications.
In this paper, we reformulate the label-noise problem from a generative-model perspective.
Our model achieves new state-of-the-art (SOTA) results on all the standard real-world benchmark datasets.
arXiv Detail & Related papers (2023-05-31T03:01:36Z)
- Learned Label Aggregation for Weak Supervision [8.819582879892762]
We propose a data programming approach that aggregates weak supervision signals to generate labeled data easily.
The quality of the generated labels depends on a label aggregation model that aggregates the noisy labels from all labeling functions (LFs) to infer the ground-truth labels.
We show the model can be trained using synthetically generated data and design an effective architecture for the model.
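A minimal sketch of the aggregation step this entry refers to, assuming a simple weighted majority vote over labeling-function outputs; the paper instead learns the aggregation model, so the fixed weights here are only placeholders.

```python
# Minimal sketch (an assumption, not the paper's learned aggregator):
# several labeling functions (LFs) emit noisy votes per instance (-1 = abstain)
# and a weighted majority vote infers a single pseudo ground-truth label.
import numpy as np

def aggregate(lf_votes, lf_weights, num_classes):
    """lf_votes: (num_instances, num_lfs) int array, -1 means the LF abstained."""
    n = lf_votes.shape[0]
    tally = np.zeros((n, num_classes))
    for j, w in enumerate(lf_weights):
        votes = lf_votes[:, j]
        mask = votes >= 0                          # ignore abstentions
        tally[np.arange(n)[mask], votes[mask]] += w
    return tally.argmax(axis=1)                    # inferred (pseudo) ground truth

votes = np.array([[0, 0, -1], [1, 2, 1], [-1, -1, 2]])   # 3 instances, 3 LFs
print(aggregate(votes, lf_weights=[1.0, 0.5, 1.0], num_classes=3))  # [0 1 2]
```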
arXiv Detail & Related papers (2022-07-27T14:36:35Z)
- Self-Adaptive Label Augmentation for Semi-supervised Few-shot Classification [121.63992191386502]
Few-shot classification aims to learn a model that can generalize well to new tasks when only a few labeled samples are available.
We propose SALA (Self-Adaptive Label Augmentation), a semi-supervised few-shot classification method that assigns an appropriate label to each unlabeled sample rather than relying on a manually defined metric.
A major novelty of SALA is the task-adaptive metric, which can learn the metric adaptively for different tasks in an end-to-end fashion.
arXiv Detail & Related papers (2022-06-16T13:14:03Z)
- Long-tailed Extreme Multi-label Text Classification with Generated Pseudo Label Descriptions [28.416742933744942]
This paper addresses the challenge of tail label prediction by proposing a novel approach.
It exploits the effectiveness of a trained bag-of-words (BoW) classifier in generating informative label descriptions under severely data-scarce conditions.
The proposed approach achieves state-of-the-art performance on XMTC benchmark datasets and significantly outperforms the best methods so far in tail label prediction.
arXiv Detail & Related papers (2022-04-02T23:42:32Z)
- Label Confusion Learning to Enhance Text Classification Models [3.0251266104313643]
Label Confusion Model (LCM) learns label confusion to capture semantic overlap among labels.
LCM can generate a better label distribution to replace the original one-hot label vector.
Experiments on five text classification benchmark datasets reveal the effectiveness of LCM for several widely used deep learning classification models.
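A hedged sketch of the label-distribution idea summarized above: the one-hot target is mixed with a similarity-based confusion distribution over labels so that semantically close labels receive some probability mass. The mixing rule and the alpha value are illustrative assumptions, not LCM's exact formulation.

```python
# Hedged sketch of training on a softened label distribution instead of a
# one-hot target: mix the one-hot vector with a similarity-based "confusion"
# distribution over labels. The mixing rule and alpha are assumptions.
import torch
import torch.nn.functional as F

def soft_target(one_hot, instance_emb, label_embs, alpha=4.0):
    """one_hot: (C,), instance_emb: (d,), label_embs: (C, d)."""
    confusion = F.softmax(label_embs @ instance_emb, dim=-1)    # label-instance similarity
    return F.softmax(alpha * one_hot + confusion, dim=-1)       # softened training target

C, d = 5, 16
one_hot = F.one_hot(torch.tensor(2), num_classes=C).float()
target = soft_target(one_hot, torch.randn(d), torch.randn(C, d))
print(target)   # a distribution peaked at class 2, spreading mass to similar labels
```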
arXiv Detail & Related papers (2020-12-09T11:34:35Z)
- Coping with Label Shift via Distributionally Robust Optimisation [72.80971421083937]
We propose a model that minimises an objective based on distributionally robust optimisation (DRO).
We then design and analyse a gradient descent-proximal mirror ascent algorithm tailored for large-scale problems to optimise the proposed objective.
arXiv Detail & Related papers (2020-10-23T08:33:04Z)
- Instance Credibility Inference for Few-Shot Learning [45.577880041135785]
Few-shot learning aims to recognize new objects with extremely limited training data for each category.
This paper presents a simple statistical approach, dubbed Instance Credibility Inference (ICI), to exploit the distribution support of unlabeled instances for few-shot learning.
Our simple approach establishes new state-of-the-art results on four widely used few-shot learning benchmark datasets.
arXiv Detail & Related papers (2020-03-26T12:01:15Z)
- Progressive Identification of True Labels for Partial-Label Learning [112.94467491335611]
Partial-label learning (PLL) is a typical weakly supervised learning problem, where each training instance is equipped with a set of candidate labels among which only one is the true label.
Most existing methods are elaborately designed as constrained optimizations that must be solved in specific manners, making their computational complexity a bottleneck for scaling up to big data.
This paper proposes a novel classifier framework that is flexible in both the model and the optimization algorithm.
arXiv Detail & Related papers (2020-02-19T08:35:15Z)