IDoFew: Intermediate Training Using Dual-Clustering in Language Models
for Few Labels Text Classification
- URL: http://arxiv.org/abs/2401.04025v1
- Date: Mon, 8 Jan 2024 17:07:37 GMT
- Title: IDoFew: Intermediate Training Using Dual-Clustering in Language Models
for Few Labels Text Classification
- Authors: Abdullah Alsuhaibani, Hamad Zogan, Imran Razzak, Shoaib Jameel,
Guandong Xu
- Abstract summary: Bidirectional Encoder Representations from Transformers (BERT) has been very effective in various Natural Language Processing (NLP) and text mining tasks, including text classification.
However, some tasks still pose challenges for these models, including text classification with limited labels.
We have developed a novel two-stage intermediate clustering approach with subsequent fine-tuning that models the pseudo-labels reliably.
- Score: 24.11420537250414
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language models such as Bidirectional Encoder Representations from
Transformers (BERT) have been very effective in various Natural Language
Processing (NLP) and text mining tasks including text classification. However,
some tasks still pose challenges for these models, including text
classification with limited labels. This can result in a cold-start problem.
Although some approaches have attempted to address this problem through
single-stage clustering as an intermediate training step coupled with a
pre-trained language model, which generates pseudo-labels to improve
classification, these methods are often error-prone due to the limitations of
the clustering algorithms. To overcome this, we have developed a novel
two-stage intermediate clustering approach with subsequent fine-tuning that
models the pseudo-labels reliably, resulting in reduced prediction errors. The
key novelty of our model, IDoFew, is that the two-stage clustering, built on
two different clustering algorithms, exploits the strengths of the
complementary algorithms and reduces the errors in generating reliable
pseudo-labels for fine-tuning. Our approach shows significant improvements
over strong baselines.
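To make the pipeline concrete, the following minimal sketch illustrates the general idea of two-stage clustering for pseudo-label generation followed by intermediate training. It is an illustration under assumptions, not the authors' exact IDoFew recipe: TF-IDF embeddings stand in for a BERT encoder, KMeans and agglomerative clustering stand in for the two complementary clustering algorithms, and logistic regression stands in for language-model fine-tuning.

# Illustrative sketch of two-stage clustering for pseudo-label generation
# (stand-in components, not the exact IDoFew configuration).
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.linear_model import LogisticRegression


def two_stage_pseudo_labels(texts, n_clusters=4, seed=0):
    # Embed the unlabeled texts (a sentence encoder would be used in practice).
    X = TfidfVectorizer(max_features=5000).fit_transform(texts).toarray()

    # Stage 1: centroid-based clustering gives a first set of pseudo-labels.
    a = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    # Stage 2: a complementary, hierarchical clustering of the same embeddings.
    b = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(X)

    # Align stage-2 cluster ids to stage-1 ids via maximum-overlap matching.
    overlap = np.zeros((n_clusters, n_clusters), dtype=int)
    for i, j in zip(a, b):
        overlap[i, j] += 1
    rows, cols = linear_sum_assignment(-overlap)
    remap = {c: r for r, c in zip(rows, cols)}
    b_aligned = np.array([remap[j] for j in b])

    # Keep only examples where the two views agree: these become the
    # "reliable" pseudo-labels for the intermediate training step.
    mask = a == b_aligned
    return X[mask], a[mask]


if __name__ == "__main__":
    docs = ["cheap flights to rome", "book a hotel in paris",
            "python raises a type error", "fix a segmentation fault in c",
            "best pasta recipe", "how to bake sourdough bread"] * 5
    X_pl, y_pl = two_stage_pseudo_labels(docs, n_clusters=3)
    print(len(y_pl), "pseudo-labeled examples retained for intermediate training")
    if len(set(y_pl)) > 1:
        # Intermediate training on pseudo-labels; logistic regression stands
        # in for fine-tuning the language model before the final few-label stage.
        LogisticRegression(max_iter=1000).fit(X_pl, y_pl)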
Related papers
- Dual-Decoupling Learning and Metric-Adaptive Thresholding for Semi-Supervised Multi-Label Learning [81.83013974171364]
Semi-supervised multi-label learning (SSMLL) is a powerful framework for leveraging unlabeled data to reduce the expensive cost of collecting precise multi-label annotations.
Unlike in single-label semi-supervised learning, one cannot simply select the most probable label as the pseudo-label in SSMLL, because an instance can carry multiple semantics.
We propose a dual-perspective method to generate high-quality pseudo-labels.
arXiv Detail & Related papers (2024-07-26T09:33:53Z)
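The point made in the entry above, that picking the single most probable label breaks down in the multi-label setting, can be shown with a small illustration; the 0.5 threshold is an illustrative choice, not the dual-perspective method proposed in that paper.

# Why argmax pseudo-labels are a poor fit for multi-label data: an instance
# can carry several labels at once, so thresholding per class (illustrative
# value 0.5) is a more natural starting point than picking a single winner.
import numpy as np

probs = np.array([[0.81, 0.64, 0.07],    # instance with two likely labels
                  [0.12, 0.33, 0.91]])   # instance with one likely label

single_label = probs.argmax(axis=1)        # [0, 2] -- drops the second label
multi_label = (probs >= 0.5).astype(int)   # [[1, 1, 0], [0, 0, 1]]

print(single_label)
print(multi_label)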
- Simple-Sampling and Hard-Mixup with Prototypes to Rebalance Contrastive Learning for Text Classification [11.072083437769093]
We propose a novel model named SharpReCL for imbalanced text classification tasks.
Our model even outperforms popular large language models across several datasets.
arXiv Detail & Related papers (2024-05-19T11:33:49Z)
- Progressive Sub-Graph Clustering Algorithm for Semi-Supervised Domain Adaptation Speaker Verification [17.284276598514502]
We propose a novel progressive subgraph clustering algorithm based on multi-model voting and double-Gaussian based assessment.
To prevent disastrous clustering results, we adopt an iterative approach that progressively increases k and employs a double-Gaussian based assessment algorithm.
arXiv Detail & Related papers (2023-05-22T04:26:18Z)
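A loose sketch of the "progressively increase k and assess" loop described in the entry above follows. The silhouette score is used purely as a stand-in for the paper's double-Gaussian based assessment, and the multi-model voting step is omitted.

# Loose illustration of progressively increasing the number of clusters and
# keeping the configuration that scores best under a quality assessment.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def progressive_clustering(X, k_min=2, k_max=10, seed=0):
    best_k, best_score, best_labels = None, -1.0, None
    for k in range(k_min, k_max + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        score = silhouette_score(X, labels)   # stand-in quality assessment
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_k, best_labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three synthetic "speaker embedding" blobs.
    X = np.vstack([rng.normal(c, 0.3, size=(40, 8)) for c in (0.0, 2.0, 4.0)])
    k, labels = progressive_clustering(X)
    print("selected k:", k)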
- SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised Learning [101.86916775218403]
This paper revisits the popular pseudo-labeling methods via a unified sample weighting formulation.
We propose SoftMatch to overcome the trade-off by maintaining both high quantity and high quality of pseudo-labels during training.
In experiments, SoftMatch shows substantial improvements across a wide variety of benchmarks, including image, text, and imbalanced classification.
arXiv Detail & Related papers (2023-01-26T03:53:25Z)
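The quantity-quality trade-off mentioned above can be pictured as soft, confidence-based weighting of pseudo-labels instead of hard thresholding; the Gaussian weighting below is a simplified sketch of that idea, not the exact SoftMatch formulation.

# Weight pseudo-labels by confidence instead of discarding low-confidence
# ones: every unlabeled example contributes (quantity), but less confident
# predictions are down-weighted (quality).
import numpy as np

def soft_weights(confidences):
    mu = confidences.mean()               # batch statistics of confidence
    var = confidences.var() + 1e-8
    w = np.exp(-((np.clip(confidences, None, mu) - mu) ** 2) / (2 * var))
    return w                              # 1.0 at or above the mean, decaying below

conf = np.array([0.99, 0.95, 0.70, 0.40, 0.20])
print(soft_weights(conf).round(3))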
- Rethinking Clustering-Based Pseudo-Labeling for Unsupervised Meta-Learning [146.11600461034746]
CACTUs, a method for unsupervised meta-learning, is a clustering-based approach with pseudo-labeling.
This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data.
We prove that the core reason for its limitations is the lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z)
- Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR).
Specifically, we propose to inject the standard Gaussian noise and regularize hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
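The noise-injection idea in the entry above can be sketched as follows: perturb a hidden representation with standard Gaussian noise and penalize how much the classifier output changes. The model, layer split, noise scale, and loss weight are illustrative assumptions, not the exact LNSR objective.

# Sketch of noise stability regularization: add Gaussian noise to a hidden
# representation and penalize the change in the model's output.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(768, 256), nn.ReLU())  # stand-in lower layers
head = nn.Linear(256, 4)                                 # stand-in classifier head

def loss_with_noise_stability(x, y, sigma=0.1, lam=1.0):
    h = encoder(x)                        # hidden representation
    logits = head(h)
    task_loss = F.cross_entropy(logits, y)

    noisy_logits = head(h + sigma * torch.randn_like(h))  # inject Gaussian noise
    reg = F.mse_loss(noisy_logits, logits)                # stability penalty
    return task_loss + lam * reg

x = torch.randn(8, 768)
y = torch.randint(0, 4, (8,))
print(loss_with_noise_stability(x, y).item())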
- Prototypical Calibration for Few-shot Learning of Language Models [84.5759596754605]
GPT-like models have been recognized as fragile across different hand-crafted templates and demonstration permutations.
We propose prototypical calibration to adaptively learn a more robust decision boundary for zero- and few-shot classification.
Our method calibrates the decision boundary as expected, greatly improving the robustness of GPT to templates, permutations, and class imbalance.
arXiv Detail & Related papers (2022-05-20T13:50:07Z)
- Adaptive label thresholding methods for online multi-label classification [4.028101568570768]
Existing online multi-label classification works cannot handle the online label thresholding problem.
This paper proposes a novel framework of adaptive label thresholding algorithms for online multi-label classification.
arXiv Detail & Related papers (2021-12-04T10:34:09Z)
- Multitask Learning for Class-Imbalanced Discourse Classification [74.41900374452472]
We show that a multitask approach can improve the Micro F1-score by 7% over current state-of-the-art benchmarks.
We also offer a comparative review of additional techniques proposed to address resource-poor problems in NLP.
arXiv Detail & Related papers (2021-01-02T07:13:41Z)
- Enhancement of Short Text Clustering by Iterative Classification [0.0]
Iterative classification applies outlier removal to obtain outlier-free clusters.
It trains a classification algorithm using the non-outliers based on their cluster distributions.
By repeating this several times, we obtain a much improved clustering of texts.
arXiv Detail & Related papers (2020-01-31T02:12:05Z)
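The iterative-classification loop described in the entry above can be sketched directly: treat cluster ids as labels, drop per-cluster outliers, train a classifier on the remainder, re-label everything, and repeat. The centroid-distance outlier criterion and the logistic-regression classifier below are illustrative assumptions, not necessarily the choices made in the paper.

# Sketch of refining a text clustering by iterative classification.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression


def iterative_classification(texts, n_clusters=3, n_iter=5, keep=0.8, seed=0):
    X = TfidfVectorizer().fit_transform(texts).toarray()
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)

    for _ in range(n_iter):
        keep_idx = []
        for c in np.unique(labels):
            idx = np.where(labels == c)[0]
            centroid = X[idx].mean(axis=0)
            dist = np.linalg.norm(X[idx] - centroid, axis=1)
            # Keep the fraction of each cluster closest to its centroid.
            keep_idx.extend(idx[np.argsort(dist)[: max(1, int(keep * len(idx)))]])
        clf = LogisticRegression(max_iter=1000).fit(X[keep_idx], labels[keep_idx])
        labels = clf.predict(X)             # re-label all texts
        if len(np.unique(labels)) < 2:      # degenerate collapse; stop early
            break
    return labels


if __name__ == "__main__":
    docs = ["win a free prize now", "claim your prize money",
            "meeting at noon tomorrow", "schedule the project meeting",
            "new pizza place downtown", "best pizza in town"] * 4
    print(iterative_classification(docs, n_clusters=3))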
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.