APAM: Adaptive Pre-training and Adaptive Meta Learning in Language Model
for Noisy Labels and Long-tailed Learning
- URL: http://arxiv.org/abs/2302.03488v2
- Date: Tue, 2 May 2023 22:29:38 GMT
- Title: APAM: Adaptive Pre-training and Adaptive Meta Learning in Language Model
for Noisy Labels and Long-tailed Learning
- Authors: Sunyi Chi, Bo Dong, Yiming Xu, Zhenyu Shi, Zheng Du
- Abstract summary: Practical natural language processing (NLP) tasks are commonly long-tailed with noisy labels.
Some commonly used resampling techniques, such as oversampling or undersampling, could easily lead to overfitting.
We propose a general framework to handle the problem of both long-tail and noisy labels.
- Score: 9.433150673299163
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Practical natural language processing (NLP) tasks are commonly long-tailed
with noisy labels. Those problems challenge the generalization and robustness
of complex models such as Deep Neural Networks (DNNs). Some commonly used
resampling techniques, such as oversampling or undersampling, could easily lead
to overfitting. It has become increasingly popular to learn data weights by
leveraging a small amount of metadata. In addition, recent studies have shown
the advantages of self-supervised pre-training, particularly for
under-represented data. In this work, we propose a general framework that
handles both long-tailed distributions and noisy labels. The model is adapted
to the problem domain via contrastive learning. The re-weighting module is a
feed-forward network that learns explicit weighting functions and adapts the
weights according to metadata. The framework further adapts the weights of
terms in the loss function through a combination of the polynomial expansion
of cross-entropy loss and focal loss. Our extensive experiments show that the
proposed framework consistently outperforms baseline methods. Lastly, our
sensitivity analysis highlights the framework's capability to handle the
long-tailed problem and to mitigate the negative impact of noisy labels.
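The re-weighting module described above, a feed-forward network that produces explicit per-sample weights, can be sketched minimally. The choice of a single scalar input (the sample's loss value), the hidden width, the ReLU activation, and the sigmoid output range are all assumptions in the spirit of Meta-Weight-Net-style re-weighting, not the authors' exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

class WeightNet:
    """Tiny feed-forward network mapping a per-sample loss value to a
    weight in (0, 1) -- a sketch of an explicit weighting function."""

    def __init__(self, hidden=16):
        self.w1 = rng.normal(0.0, 0.1, (1, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, 1))
        self.b2 = np.zeros(1)

    def __call__(self, losses):
        # losses: shape (n,) -> hidden activations: shape (n, hidden)
        h = np.maximum(losses[:, None] @ self.w1 + self.b1, 0.0)  # ReLU
        z = h @ self.w2 + self.b2                                 # (n, 1)
        return (1.0 / (1.0 + np.exp(-z)))[:, 0]                   # sigmoid

net = WeightNet()
weights = net(np.array([0.1, 1.0, 5.0]))  # one weight per sample, in (0, 1)
```

In the framework's bilevel setup, such a network's parameters would be updated so that the re-weighted training loss improves a meta loss computed on the clean metadata; the sketch shows only the forward weighting map.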
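The loss-term adaptation, which combines the polynomial expansion of cross-entropy with focal loss, can be illustrated with a Poly-1-style focal loss: standard focal loss plus an adjustable coefficient on the leading term of its polynomial expansion. The specific `gamma` and `epsilon` values and the truncation at the first polynomial term are assumptions; the paper's actual combination of terms may differ:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def poly1_focal_loss(logits, labels, gamma=2.0, epsilon=1.0):
    """Focal loss plus a Poly-1 correction term (sketch).

    Focal loss:  -(1 - p_t)^gamma * log(p_t)
    Poly-1 term: epsilon * (1 - p_t)^(gamma + 1)
    """
    p = softmax(logits)
    pt = p[np.arange(len(labels)), labels]           # true-class probability
    focal = -((1.0 - pt) ** gamma) * np.log(pt)      # standard focal loss
    poly1 = epsilon * (1.0 - pt) ** (gamma + 1.0)    # leading expansion term
    return focal + poly1

# Hard (uniform) examples incur a much larger loss than easy ones.
logits = np.array([[5.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
labels = np.array([0, 0])
loss = poly1_focal_loss(logits, labels)
```

Setting `epsilon=0.0` recovers plain focal loss, so the coefficient acts as the knob by which the framework can adapt the weight of the loss term.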
Related papers
- Granularity Matters in Long-Tail Learning [62.30734737735273]
We offer a novel perspective on long-tail learning, inspired by an observation: datasets with finer granularity tend to be less affected by data imbalance.
We introduce open-set auxiliary classes that are visually similar to existing ones, aiming to enhance representation learning for both head and tail classes.
To prevent the overwhelming presence of auxiliary classes from disrupting training, we introduce a neighbor-silencing loss.
arXiv Detail & Related papers (2024-10-21T13:06:21Z)
- Quantized Prompt for Efficient Generalization of Vision-Language Models [27.98205540768322]
Large-scale pre-trained vision-language models like CLIP have achieved tremendous success in various fields.
During downstream adaptation, the most challenging problems are overfitting and catastrophic forgetting.
In this paper, we explore quantization as a regularizer for vision-language models, which is quite efficient and effective.
arXiv Detail & Related papers (2024-07-15T13:19:56Z)
- Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
- Meta-Learning Online Adaptation of Language Models [88.8947656843812]
Large language models encode impressively broad world knowledge in their parameters.
However, the knowledge in static language models falls out of date, limiting the model's effective "shelf life".
arXiv Detail & Related papers (2023-05-24T11:56:20Z)
- Theoretical Characterization of the Generalization Performance of Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z)
- Difficulty-Net: Learning to Predict Difficulty for Long-Tailed Recognition [5.977483447975081]
We propose Difficulty-Net, which learns to predict the difficulty of classes using the model's performance in a meta-learning framework.
We introduce two key concepts, namely the relative difficulty and the driver loss.
Experiments on popular long-tailed datasets demonstrated the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-09-07T07:04:08Z)
- Adaptable Text Matching via Meta-Weight Regulator [14.619068650513917]
Meta-Weight Regulator (MWR) is a meta-learning approach that learns to assign weights to the source examples based on their relevance to the target loss.
MWR first trains the model on uniformly weighted source examples, and measures the efficacy of the model on the target examples via a loss function.
As MWR is model-agnostic, it can be applied to any backbone neural model.
arXiv Detail & Related papers (2022-04-27T02:28:40Z)
- CMW-Net: Learning a Class-Aware Sample Weighting Mapping for Robust Deep Learning [55.733193075728096]
Modern deep neural networks can easily overfit to biased training data containing corrupted labels or class imbalance.
Sample re-weighting methods are popularly used to alleviate this data bias issue.
We propose a meta-model capable of adaptively learning an explicit weighting scheme directly from data.
arXiv Detail & Related papers (2022-02-11T13:49:51Z)
- Improving Classifier Training Efficiency for Automatic Cyberbullying Detection with Feature Density [58.64907136562178]
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods.
We hypothesise that estimating dataset complexity allows for the reduction of the number of required experiments.
The difference in linguistic complexity of datasets allows us to additionally discuss the efficacy of linguistically-backed word preprocessing.
arXiv Detail & Related papers (2021-11-02T15:48:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.