Making Pre-trained Language Models Good Long-tailed Learners
- URL: http://arxiv.org/abs/2205.05461v1
- Date: Wed, 11 May 2022 13:03:55 GMT
- Title: Making Pre-trained Language Models Good Long-tailed Learners
- Authors: Chen Zhang, Lei Ren, Jingang Wang, Wei Wu, Dawei Song
- Abstract summary: We check the hypothesis that prompt-tuning is also a promising choice for long-tailed classification.
The results demonstrate that prompt-tuning indeed makes pre-trained language models at least good long-tailed learners.
- Score: 14.63635884051461
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prompt-tuning has shown appealing performance in few-shot classification by virtue of its capability to effectively exploit pre-trained knowledge. This motivates us to check the hypothesis that prompt-tuning is also a promising choice for long-tailed classification, since the tail classes are intuitively few-shot ones. To test this hypothesis, we conduct empirical studies. The results demonstrate that prompt-tuning indeed makes pre-trained language models at least good long-tailed learners. To understand why prompt-tuning achieves good performance in long-tailed classification, we carry out an in-depth analysis by progressively bridging the gap between prompt-tuning and commonly used fine-tuning. In summary, the classifier structure and parameterization form the key to making good long-tailed learners, whereas the input structure is far less important. Finally, we verify the applicability of our findings to few-shot classification.
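To make the abstract's central comparison concrete, the following minimal sketch (not the authors' code; the backbone name, cloze template, and verbalizer words are illustrative assumptions) contrasts the two classifier parameterizations: standard fine-tuning attaches a randomly initialised linear head to the [CLS] representation, whereas prompt-tuning reuses the pre-trained masked-language-model head and scores a small set of label words at a [MASK] position.

```python
import torch
from torch import nn
from transformers import AutoModel, AutoModelForMaskedLM, AutoTokenizer

MODEL_NAME = "bert-base-uncased"       # assumed backbone, not specified here
VERBALIZER = ["great", "terrible"]     # assumed label words for a binary task

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# Fine-tuning style head: a new, randomly initialised linear classifier on [CLS].
encoder = AutoModel.from_pretrained(MODEL_NAME)
cls_head = nn.Linear(encoder.config.hidden_size, len(VERBALIZER))

def finetune_logits(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt")
    cls_vec = encoder(**inputs).last_hidden_state[:, 0]   # [CLS] representation
    return cls_head(cls_vec)                              # untrained parameters

# Prompt-tuning style head: reuse the pre-trained MLM head through a cloze template
# and read off the scores of the verbalizer tokens at the [MASK] position.
mlm = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)
label_ids = tokenizer.convert_tokens_to_ids(VERBALIZER)

def prompt_logits(text: str) -> torch.Tensor:
    prompt = f"{text} It was {tokenizer.mask_token}."      # assumed template
    inputs = tokenizer(prompt, return_tensors="pt")
    logits = mlm(**inputs).logits                          # [1, seq_len, vocab_size]
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    return logits[:, mask_pos, label_ids]                  # scores over label words

print(prompt_logits("The movie was surprisingly touching."))
```

The contrast highlights the paper's takeaway: the prompt-tuning classifier is parameterized by weights already learned during pre-training, while the fine-tuning classifier starts from scratch; the input template itself plays a smaller role.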
Related papers
- Incremental Sequence Classification with Temporal Consistency [9.65650774513798]
We address the problem of incremental sequence classification, where predictions are updated as new elements in the sequence are revealed. We leverage a temporal-consistency condition that successive predictions should satisfy to develop a novel loss function for training incremental sequence classifiers. Our results show that models trained with our method are better able to distinguish promising generations from unpromising ones after observing only a few tokens.
arXiv Detail & Related papers (2025-05-22T11:37:53Z)
- Focus on the Likely: Test-time Instance-based Uncertainty Removal [1.8592384822257952]
We propose two novel test-time fine-tuning methods to improve uncertain model predictions. Instead of greedily selecting the most likely class, we introduce an additional step that focuses on the likely classes to refine predictions.
arXiv Detail & Related papers (2025-05-02T21:06:53Z)
- Revisiting the Superficial Alignment Hypothesis [0.9831489366502302]
The Superficial Alignment Hypothesis posits that almost all of a language model's abilities and knowledge are learned during pre-training.
We re-examine these claims by studying the scaling behavior of post-training with increasing finetuning examples.
arXiv Detail & Related papers (2024-09-27T22:14:10Z)
- Amuro & Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models [17.288865972774587]
We investigate the relationship between pre-training and fine-tuning by fine-tuning multiple intermediate pre-trained model checkpoints.
Our results on 18 datasets suggest that pre-training improves the model in a latent way that becomes apparent after fine-tuning.
arXiv Detail & Related papers (2024-08-13T06:28:43Z)
- Exploring Lottery Prompts for Pre-trained Language Models [46.66885465183664]
We explore instance-level prompts and their generalizability.
We find that for every instance, there is almost always a lottery prompt that induces the correct prediction from the PLM.
Some strong lottery prompts have high performance over the whole training set.
arXiv Detail & Related papers (2023-05-31T02:17:04Z)
- Fairness-guided Few-shot Prompting for Large Language Models [93.05624064699965]
In-context learning can suffer from high instability due to variations in training examples, example order, and prompt formats.
We introduce a metric to evaluate the predictive bias of a fixed prompt against labels or given attributes.
We propose a novel search strategy based on greedy search to identify a near-optimal prompt for improving the performance of in-context learning.
arXiv Detail & Related papers (2023-03-23T12:28:25Z)
- Time Series Contrastive Learning with Information-Aware Augmentations [57.45139904366001]
A key component of contrastive learning is selecting appropriate augmentations that impose suitable priors to construct feasible positive samples.
How to find the desired augmentations of time series data that are meaningful for given contrastive learning tasks and datasets remains an open question.
We propose a new contrastive learning approach with information-aware augmentations, InfoTS, that adaptively selects optimal augmentations for time series representation learning.
arXiv Detail & Related papers (2023-03-21T15:02:50Z)
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
- Improved Fine-tuning by Leveraging Pre-training Data: Theory and Practice [52.11183787786718]
Fine-tuning a pre-trained model on the target data is widely used in many deep learning applications.
Recent studies have empirically shown that training from scratch achieves final performance that is no worse than this pre-training strategy.
We propose a novel selection strategy to select a subset from pre-training data to help improve the generalization on the target task.
arXiv Detail & Related papers (2021-11-24T06:18:32Z)
- Avoiding Inference Heuristics in Few-shot Prompt-based Finetuning [57.4036085386653]
We show that prompt-based models for sentence pair classification tasks still suffer from a common pitfall of adopting inferences based on lexical overlap.
We then show that adding a regularization that preserves pretraining weights is effective in mitigating this destructive tendency of few-shot finetuning (a minimal sketch of one such regularizer appears after this list).
arXiv Detail & Related papers (2021-09-09T10:10:29Z)
- Explain and Predict, and then Predict Again [6.865156063241553]
We propose ExPred, which uses multi-task learning in the explanation generation phase, effectively trading off explanation and prediction losses.
We conduct an extensive evaluation of our approach on three diverse language datasets.
arXiv Detail & Related papers (2021-01-11T19:36:52Z)
- Predicting MOOCs Dropout Using Only Two Easily Obtainable Features from the First Week's Activities [56.1344233010643]
Several features are considered to contribute towards learner attrition or lack of interest, which may lead to disengagement or total dropout.
This study aims to predict dropout early on, from the first week, by comparing several machine-learning approaches.
arXiv Detail & Related papers (2020-08-12T10:44:49Z)
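The entry on avoiding inference heuristics above mentions a regularization that preserves pretraining weights. A common way to realise such a constraint is an L2 penalty that pulls the fine-tuned parameters back toward their pre-trained values; the sketch below illustrates that general idea under this assumption and is not the referenced paper's exact method.

```python
# Minimal sketch: an L2 penalty toward the pre-trained weights, one common way
# to preserve pretraining knowledge during few-shot finetuning. This is an
# illustrative assumption, not the referenced paper's exact regularizer.
import torch


def pretrained_l2_penalty(model: torch.nn.Module,
                          pretrained_state: dict,
                          strength: float = 0.01):
    """Sum of squared deviations of the current parameters from their pre-trained values."""
    penalty = 0.0
    for name, param in model.named_parameters():
        if name in pretrained_state:
            ref = pretrained_state[name].to(param.device)
            penalty = penalty + ((param - ref) ** 2).sum()
    return strength * penalty


# Usage: snapshot the weights before finetuning, then add the penalty to the task loss.
# pretrained_state = {k: v.detach().clone() for k, v in model.state_dict().items()}
# loss = task_loss + pretrained_l2_penalty(model, pretrained_state)
```

In practice the penalty is simply added to the task loss at every training step, so the few-shot updates cannot drift far from the pre-trained solution.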
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.