Making Pre-trained Language Models Better Continual Few-Shot Relation
Extractors
- URL: http://arxiv.org/abs/2402.15713v1
- Date: Sat, 24 Feb 2024 04:32:44 GMT
- Title: Making Pre-trained Language Models Better Continual Few-Shot Relation
Extractors
- Authors: Shengkun Ma, Jiale Han, Yi Liang, Bo Cheng
- Abstract summary: Continual Few-shot Relation Extraction (CFRE) is a practical problem that requires the model to continuously learn novel relations.
The primary challenges are catastrophic forgetting and overfitting.
This paper harnesses prompt learning to explore the implicit capabilities of pre-trained language models.
- Score: 15.417833307088637
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual Few-shot Relation Extraction (CFRE) is a practical problem that
requires the model to continuously learn novel relations while avoiding
forgetting old ones with few labeled training data. The primary challenges are
catastrophic forgetting and overfitting. This paper harnesses prompt learning
to explore the implicit capabilities of pre-trained language models to address
the above two challenges, thereby making language models better continual
few-shot relation extractors. Specifically, we propose a Contrastive Prompt
Learning framework, which designs prompt representation to acquire more
generalized knowledge that can be easily adapted to old and new categories, and
margin-based contrastive learning to focus more on hard samples, therefore
alleviating catastrophic forgetting and overfitting issues. To further remedy
overfitting in low-resource scenarios, we introduce an effective memory
augmentation strategy that employs well-crafted prompts to guide ChatGPT in
generating diverse samples. Extensive experiments demonstrate that our method
outperforms state-of-the-art methods by a large margin and significantly
mitigates catastrophic forgetting and overfitting in low-resource scenarios.
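The abstract does not spell out the exact objective, so the sketch below is only a plausible rendering of margin-based supervised contrastive learning over prompt-derived relation representations; the function name, margin, and temperature are illustrative assumptions rather than the paper's reported settings.

```python
import torch
import torch.nn.functional as F

def margin_contrastive_loss(reps, labels, margin=0.1, temperature=0.1):
    """Supervised contrastive loss with an additive margin on positive pairs.

    reps:   (N, d) prompt-derived relation representations
    labels: (N,)   relation ids for the current episode plus memory samples
    Subtracting a margin from positive similarities forces hard positives and
    hard negatives to carry more of the gradient.
    """
    reps = F.normalize(reps, dim=-1)
    sim = reps @ reps.t()                                   # (N, N) cosine similarities
    pos_mask = labels.unsqueeze(0) == labels.unsqueeze(1)   # same-relation pairs
    self_mask = torch.eye(len(reps), dtype=torch.bool, device=reps.device)
    pos_mask = pos_mask & ~self_mask

    logits = (sim - margin * pos_mask.float()) / temperature
    logits = logits.masked_fill(self_mask, float("-inf"))   # never contrast with self
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)

    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    per_anchor = -log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_counts
    return per_anchor[pos_mask.any(dim=1)].mean()           # anchors with at least one positive
```

With margin=0 this reduces to a plain supervised contrastive loss; a positive margin tightens the region each relation must occupy, which is how hard samples come to dominate training.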
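The prompts used to guide ChatGPT for memory augmentation are likewise not given in the abstract; the template below is only an illustrative sketch of how a relation definition and the few stored seed examples could be packed into such a prompt (the field names and entity markup are assumptions).

```python
# Illustrative augmentation prompt; the paper's actual prompts are not shown in the abstract.
AUGMENT_TEMPLATE = """You are a data annotator for relation extraction.
Relation: {relation}
Definition: {definition}
Seed examples:
{seed_examples}
Write {k} new sentences that express the relation "{relation}" between a head
entity and a tail entity. Vary the entities, domains, and sentence structure.
Mark the entities as [HEAD]...[/HEAD] and [TAIL]...[/TAIL]."""

def build_augmentation_prompt(relation, definition, seeds, k=5):
    """Fill the template with the few labeled examples kept in memory."""
    seed_examples = "\n".join(f"- {s}" for s in seeds)
    return AUGMENT_TEMPLATE.format(relation=relation, definition=definition,
                                   seed_examples=seed_examples, k=k)
```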
Related papers
- Merge to Learn: Efficiently Adding Skills to Language Models with Model Merging [102.16497861225358]
Adapting general-purpose language models to new skills is currently an expensive process.
We investigate the effectiveness of adding new skills to preexisting models by training on the new skills in isolation and later merging with the general model.
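The simplest instance of the merging studied there is linear interpolation of parameters between the general model and the skill-tuned model; the sketch below shows only that baseline form (the paper evaluates merging more broadly) and assumes both checkpoints share an architecture with floating-point parameters.

```python
import torch

def interpolate_state_dicts(general_sd, skill_sd, alpha=0.5):
    """Merge two fine-tuned checkpoints by linear interpolation:
    merged = (1 - alpha) * general + alpha * skill.
    Assumes identical architectures and floating-point parameters throughout."""
    merged = {}
    for name, w_general in general_sd.items():
        merged[name] = (1.0 - alpha) * w_general + alpha * skill_sd[name]
    return merged

# usage: model.load_state_dict(interpolate_state_dicts(general.state_dict(), skill.state_dict()))
```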
arXiv Detail & Related papers (2024-10-16T18:23:50Z)
- Preserving Generalization of Language models in Few-shot Continual Relation Extraction [34.68364639170838]
Few-shot Continual Relation Extraction (FCRE) is an emerging and dynamic area of study.
We introduce a novel method that leverages often-discarded language model heads.
Our experimental results underscore the efficacy of the proposed method and offer valuable insights for future work.
arXiv Detail & Related papers (2024-10-01T02:22:34Z)
- Towards Robust Pruning: An Adaptive Knowledge-Retention Pruning Strategy for Language Models [35.58379464827462]
We introduce a post-training pruning strategy designed to faithfully replicate the embedding space and feature space of dense language models.
Compared to other state-of-the-art baselines, our approach demonstrates a superior balance between accuracy, sparsity, robustness, and pruning cost with BERT on the SST2, IMDB, and AGNews datasets.
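The summary describes pruning that preserves the dense model's embedding and feature space; the sketch below conveys that idea with plain magnitude pruning plus a hidden-state matching loss. It assumes Hugging Face-style models that accept output_hidden_states and is not the paper's actual algorithm.

```python
import torch
import torch.nn.functional as F

def feature_preserving_prune_step(dense_model, sparse_model, batch, sparsity=0.5):
    """Zero out the smallest weights, then measure how far the pruned model's
    hidden states drift from the dense model's on one batch.
    A real implementation would keep re-applying the pruning mask during tuning."""
    for module in sparse_model.modules():
        if isinstance(module, torch.nn.Linear):
            w = module.weight.data
            k = int(w.numel() * sparsity)
            if k > 0:
                threshold = w.abs().flatten().kthvalue(k).values
                w.mul_((w.abs() > threshold).float())

    with torch.no_grad():
        dense_h = dense_model(**batch, output_hidden_states=True).hidden_states[-1]
    sparse_h = sparse_model(**batch, output_hidden_states=True).hidden_states[-1]
    return F.mse_loss(sparse_h, dense_h)   # minimize to replicate the dense feature space
```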
arXiv Detail & Related papers (2023-10-19T23:02:29Z)
- Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
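BAdam itself builds the prior into the optimizer's update rule; as a simpler illustration of the prior-based family it belongs to, the sketch below adds an EWC-style quadratic penalty that pulls parameters toward their values from the previous task, weighted by a per-parameter importance estimate. The names and the choice of importance estimate are assumptions.

```python
import torch

def prior_regularized_loss(task_loss, model, prev_params, importance, strength=100.0):
    """Prior-based continual-learning penalty: discourage drift from the weights
    learned on earlier tasks, more strongly for parameters deemed important.

    prev_params / importance: dicts keyed by parameter name, captured after the
    previous task (importance could be a diagonal Fisher estimate)."""
    penalty = torch.zeros((), device=task_loss.device)
    for name, param in model.named_parameters():
        if name in prev_params:
            penalty = penalty + (importance[name] * (param - prev_params[name]) ** 2).sum()
    return task_loss + strength * penalty
```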
arXiv Detail & Related papers (2023-09-15T17:10:51Z)
- Fairness-guided Few-shot Prompting for Large Language Models [93.05624064699965]
In-context learning can suffer from high instability due to variations in training examples, example order, and prompt formats.
We introduce a metric to evaluate the predictive bias of a fixed prompt against labels or given attributes.
We propose a novel search strategy based on greedy search to identify a near-optimal prompt for improving the performance of in-context learning.
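The paper defines its own bias metric; the sketch below keeps that metric abstract as a callable and only illustrates the greedy construction of a demonstration set that minimizes it. All names are illustrative.

```python
def greedy_prompt_search(candidates, bias_metric, max_shots=4):
    """Greedily assemble in-context demonstrations by minimizing a predictive-bias score.

    candidates:  list of candidate demonstration strings
    bias_metric: callable taking a list of chosen demonstrations and returning a
                 scalar, where lower means a fairer (less biased) prompt."""
    chosen, remaining = [], list(candidates)
    for _ in range(min(max_shots, len(candidates))):
        _, best = min(((bias_metric(chosen + [c]), c) for c in remaining), key=lambda x: x[0])
        chosen.append(best)
        remaining.remove(best)
    return chosen
```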
arXiv Detail & Related papers (2023-03-23T12:28:25Z)
- Prototypical quadruplet for few-shot class incremental learning [24.814045065163135]
We propose a novel method that improves classification robustness by identifying a better embedding space using an improved contrastive loss.
Our approach retains previously acquired knowledge in the embedding space, even when trained with new classes.
We demonstrate the effectiveness of our method by showing that the embedding space remains intact after training on new classes and that the model outperforms existing state-of-the-art algorithms in accuracy across different sessions.
arXiv Detail & Related papers (2022-11-05T17:19:14Z)
- Few-shot Text Classification with Dual Contrastive Consistency [31.141350717029358]
In this paper, we explore how to utilize a pre-trained language model for few-shot text classification.
We adopt supervised contrastive learning on the few labeled examples and consistency regularization on abundant unlabeled data.
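A minimal sketch of how the two terms might be combined in one training step follows; the encoder/classifier interface (model.encode, model(...)), the KL-based consistency term, and the loss weight are assumptions rather than the paper's exact design.

```python
import torch
import torch.nn.functional as F

def dual_objective(model, contrastive_fn, labeled_batch, unlabeled_views, lam=1.0):
    """Supervised contrastive loss on the few labeled examples plus a consistency
    term that keeps predictions stable across two augmented unlabeled views."""
    x, y = labeled_batch
    z = F.normalize(model.encode(x), dim=-1)          # labeled text representations
    sup_con = contrastive_fn(z, y)                    # e.g., a SupCon-style loss

    view_a, view_b = unlabeled_views                  # two augmentations of the same unlabeled texts
    log_p_a = F.log_softmax(model(view_a), dim=-1)
    with torch.no_grad():
        p_b = F.softmax(model(view_b), dim=-1)        # one view serves as the stopped-gradient target
    consistency = F.kl_div(log_p_a, p_b, reduction="batchmean")

    return sup_con + lam * consistency
```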
arXiv Detail & Related papers (2022-09-29T19:26:23Z)
- Few-shot Prompting Towards Controllable Response Generation [49.479958672988566]
We first explored the combination of prompting and reinforcement learning (RL) to steer models' generation without accessing any of the models' parameters.
We apply multi-task learning to make the model learn to generalize to new tasks better.
Experiment results show that our proposed method can successfully control several state-of-the-art (SOTA) dialogue models without accessing their parameters.
arXiv Detail & Related papers (2022-06-08T14:48:06Z)
- Continual Few-shot Relation Learning via Embedding Space Regularization and Data Augmentation [4.111899441919165]
It is necessary for the model to learn novel relational patterns from very few labeled examples while avoiding catastrophic forgetting of previous task knowledge.
We propose a novel method based on embedding space regularization and data augmentation.
Our method generalizes to new few-shot tasks and avoids catastrophic forgetting of previous tasks by enforcing extra constraints on the relational embeddings and by adding extra relevant data in a self-supervised manner.
arXiv Detail & Related papers (2022-03-04T05:19:09Z)
- How Should Pre-Trained Language Models Be Fine-Tuned Towards Adversarial Robustness? [121.57551065856164]
We propose Robust Informative Fine-Tuning (RIFT) as a novel adversarial fine-tuning method from an information-theoretical perspective.
RIFT encourages an objective model to retain the features learned from the pre-trained model throughout the entire fine-tuning process.
Experimental results show that RIFT consistently outperforms state-of-the-art methods on two popular NLP tasks.
arXiv Detail & Related papers (2021-12-22T05:04:41Z)
- Active Learning for Sequence Tagging with Deep Pre-trained Models and Bayesian Uncertainty Estimates [52.164757178369804]
Recent advances in transfer learning for natural language processing in conjunction with active learning open the possibility to significantly reduce the necessary annotation budget.
We conduct an empirical study of various Bayesian uncertainty estimation methods and Monte Carlo dropout options for deep pre-trained models in the active learning framework.
We also demonstrate that to acquire instances during active learning, a full-size Transformer can be substituted with a distilled version, which yields better computational performance.
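As an illustration of the Monte Carlo dropout option mentioned above, the sketch below scores unlabeled sequences by predictive entropy averaged over stochastic forward passes and returns the most uncertain ones for annotation; the model interface and the sequence-level aggregation are assumptions.

```python
import torch

@torch.no_grad()
def mc_dropout_acquire(model, unlabeled_batches, n_passes=10, budget=100):
    """Rank unlabeled instances by MC-dropout predictive entropy (higher = more uncertain)."""
    model.train()                                      # keep dropout active at inference time
    scores = []
    for idx, batch in enumerate(unlabeled_batches):
        probs = torch.stack([torch.softmax(model(batch), dim=-1) for _ in range(n_passes)])
        mean_probs = probs.mean(dim=0)                 # average the stochastic passes
        entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)
        scores.append((entropy.mean().item(), idx))    # e.g., mean token entropy per sequence
    scores.sort(reverse=True)
    return [idx for _, idx in scores[:budget]]         # indices of the most uncertain batches
```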
arXiv Detail & Related papers (2021-01-20T13:59:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.