Related papers: Comparing effectiveness of regularization methods on text classification: Simple and complex model in data shortage situation

Comparing effectiveness of regularization methods on text classification: Simple and complex model in data shortage situation

URL: http://arxiv.org/abs/2403.00825v1
Date: Tue, 27 Feb 2024 07:26:16 GMT
Title: Comparing effectiveness of regularization methods on text classification: Simple and complex model in data shortage situation
Authors: Jongga Lee, Jaeseung Yim, Seohee Park, Changwon Lim
Abstract summary: We study the regularization methods' effects on various classification models when only a few labeled data are available. We compare a simple word embedding-based model, which is simple but effective, with complex models. We evaluate the regularization effects on four text classification datasets.
Score: 0.8848340429852071
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Text classification is the task of assigning a document to a predefined class. However, it is expensive to acquire enough labeled documents or to label them. In this paper, we study the regularization methods' effects on various classification models when only a few labeled data are available. We compare a simple word embedding-based model, which is simple but effective, with complex models (CNN and BiLSTM). In supervised learning, adversarial training can further regularize the model. When an unlabeled dataset is available, we can regularize the model using semi-supervised learning methods such as the Pi model and virtual adversarial training. We evaluate the regularization effects on four text classification datasets (AG news, DBpedia, Yahoo! Answers, Yelp Polarity), using only 0.1% to 0.5% of the original labeled training documents. The simple model performs relatively well in fully supervised learning, but with the help of adversarial training and semi-supervised learning, both simple and complex models can be regularized, showing better results for complex models. Although the simple model is robust to overfitting, a complex model with well-designed prior beliefs can be also robust to overfitting.

Related papers

Provable unlearning in topic modeling and downstream tasks [36.571324268874264]
Provable guarantees for unlearning are often limited to supervised learning settings. We provide the first theoretical guarantees for unlearning in the pre-training and fine-tuning paradigm. We show that it is easier to unlearn pre-training data from models that have been fine-tuned to a particular task, and one can unlearn this data without modifying the base model.
arXiv Detail & Related papers (2024-11-19T16:04:31Z)
Pre-Trained Vision-Language Models as Partial Annotators [40.89255396643592]
Pre-trained vision-language models learn massive data to model unified representations of images and natural languages. In this paper, we investigate a novel "pre-trained annotating - weakly-supervised learning" paradigm for pre-trained model application and experiment on image classification tasks.
arXiv Detail & Related papers (2024-05-23T17:17:27Z)
Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple Logits Retargeting Approach [102.0769560460338]
We develop a simple logits approach (LORT) without the requirement of prior knowledge of the number of samples per class. Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z)
A Simple and Efficient Baseline for Data Attribution on Images [107.12337511216228]
Current state-of-the-art approaches require a large ensemble of as many as 300,000 models to accurately attribute model predictions. In this work, we focus on a minimalist baseline, utilizing the feature space of a backbone pretrained via self-supervised learning to perform data attribution. Our method is model-agnostic and scales easily to large datasets.
arXiv Detail & Related papers (2023-11-03T17:29:46Z)
Self-Evolution Learning for Mixup: Enhance Data Augmentation on Few-Shot Text Classification Tasks [75.42002070547267]
We propose a self evolution learning (SE) based mixup approach for data augmentation in text classification. We introduce a novel instance specific label smoothing approach, which linearly interpolates the model's output and one hot labels of the original samples to generate new soft for label mixing up.
arXiv Detail & Related papers (2023-05-22T23:43:23Z)
Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models. This creates a barrier to fusing knowledge across individual models to yield a better single model. We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
Uncertainty Estimation for Language Reward Models [5.33024001730262]
Language models can learn a range of capabilities from unsupervised training on text corpora. It is often easier for humans to choose between options than to provide labeled data, and prior work has achieved state-of-the-art performance by training a reward model from such preference comparisons. We seek to address these problems via uncertainty estimation, which can improve sample efficiency and robustness using active learning and risk-averse reinforcement learning.
arXiv Detail & Related papers (2022-03-14T20:13:21Z)
Practical Active Learning with Model Selection for Small Data [13.128648437690224]
We develop a simple and fast method for practical active learning with model selection. Our method is based on an underlying pool-based active learner for binary classification using support vector classification with a radial basis function kernel.
arXiv Detail & Related papers (2021-12-21T23:11:27Z)
Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition [55.362258027878966]
We present momentum pseudo-labeling (MPL) as a simple yet effective strategy for semi-supervised speech recognition. MPL consists of a pair of online and offline models that interact and learn from each other, inspired by the mean teacher method. The experimental results demonstrate that MPL effectively improves over the base model and is scalable to different semi-supervised scenarios.
arXiv Detail & Related papers (2021-06-16T16:24:55Z)
Self-Damaging Contrastive Learning [92.34124578823977]
Unlabeled data in reality is commonly imbalanced and shows a long-tail distribution. This paper proposes a principled framework called Self-Damaging Contrastive Learning to automatically balance the representation learning without knowing the classes. Our experiments show that SDCLR significantly improves not only overall accuracies but also balancedness.
arXiv Detail & Related papers (2021-06-06T00:04:49Z)
Distilling Interpretable Models into Human-Readable Code [71.11328360614479]
Human-readability is an important and desirable standard for machine-learned model interpretability. We propose to train interpretable models using conventional methods, and then distill them into concise, human-readable code. We describe a piecewise-linear curve-fitting algorithm that produces high-quality results efficiently and reliably across a broad range of use cases.
arXiv Detail & Related papers (2021-01-21T01:46:36Z)
A Comparison of LSTM and BERT for Small Corpus [0.0]
Recent advancements in the NLP field showed that transfer learning helps with achieving state-of-the-art results for new tasks by tuning pre-trained models instead of starting from scratch. In this paper we focus on a real-life scenario that scientists in academia and industry face frequently: given a small dataset, can we use a large pre-trained model like BERT and get better results than simple models? Our experimental results show that bidirectional LSTM models can achieve significantly higher results than a BERT model for a small dataset and these simple models get trained in much less time than tuning the pre-trained counterparts.
arXiv Detail & Related papers (2020-09-11T14:01:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.