Preserving Generalization of Language Models in Few-shot Continual Relation Extraction
- URL: http://arxiv.org/abs/2410.00334v1
- Date: Tue, 1 Oct 2024 02:22:34 GMT
- Title: Preserving Generalization of Language models in Few-shot Continual Relation Extraction
- Authors: Quyen Tran, Nguyen Xuan Thanh, Nguyen Hoang Anh, Nam Le Hai, Trung Le, Linh Van Ngo, Thien Huu Nguyen
- Abstract summary: Few-shot Continual Relation Extraction (FCRE) is an emerging and dynamic area of study.
We introduce a novel method that leverages often-discarded language model heads.
Our experimental results underscore the efficacy of the proposed method and offer valuable insights for future work.
- Score: 34.68364639170838
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Few-shot Continual Relation Extraction (FCRE) is an emerging and dynamic area of study where models can sequentially integrate knowledge from new relations with limited labeled data while circumventing catastrophic forgetting and preserving prior knowledge from pre-trained backbones. In this work, we introduce a novel method that leverages often-discarded language model heads. By employing these components via a mutual information maximization strategy, our approach helps maintain prior knowledge from the pre-trained backbone and strategically aligns the primary classification head, thereby enhancing model performance. Furthermore, we explore the potential of Large Language Models (LLMs), renowned for their wealth of knowledge, in addressing FCRE challenges. Our comprehensive experimental results underscore the efficacy of the proposed method and offer valuable insights for future work.
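The abstract describes aligning the primary classification head with the retained language-model head via mutual information maximization. As a loose illustration only, the sketch below uses an InfoNCE-style lower bound between the two heads' representations; the estimator, the projection functions, and the loss weighting are assumptions, not the authors' implementation.

```python
# Minimal sketch (not the paper's code): align the representation kept from the
# pre-trained LM head with the relation-classification representation by
# maximizing an InfoNCE lower bound on their mutual information.
import torch
import torch.nn.functional as F


def infonce_mi_loss(z_lm: torch.Tensor, z_cls: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Treat (z_lm[i], z_cls[i]) as a positive pair and all other pairings in the
    batch as negatives; minimizing this cross-entropy maximizes a lower bound on
    I(z_lm; z_cls)."""
    z_lm = F.normalize(z_lm, dim=-1)
    z_cls = F.normalize(z_cls, dim=-1)
    logits = z_lm @ z_cls.t() / temperature               # (B, B) similarity matrix
    targets = torch.arange(z_lm.size(0), device=z_lm.device)
    return F.cross_entropy(logits, targets)


# Hypothetical usage: combine with the usual relation-classification loss.
# z_lm  = lm_head_projection(hidden_states)      # features routed through the LM head
# z_cls = classifier_projection(hidden_states)   # features feeding the relation classifier
# loss  = relation_ce_loss + mi_weight * infonce_mi_loss(z_lm, z_cls)
```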
Related papers
- Temporal-Difference Variational Continual Learning [89.32940051152782]
A crucial capability of Machine Learning models in real-world applications is the ability to continuously learn new tasks.
In Continual Learning settings, models often struggle to balance learning new tasks with retaining previous knowledge.
We propose new learning objectives that integrate the regularization effects of multiple previous posterior estimations.
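As a rough illustration of the general idea named above, the sketch below sums weighted KL terms between the current variational posterior and several stored earlier ones, assuming diagonal Gaussians; the paper's temporal-difference weighting is not reproduced, and the weights here are hypothetical.

```python
# Illustrative sketch only: penalize divergence from several stored posterior
# estimates (diagonal Gaussians) from earlier tasks.
import torch


def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians, summed over dims."""
    var_q, var_p = logvar_q.exp(), logvar_p.exp()
    return 0.5 * torch.sum(logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)


def multi_posterior_regularizer(current, previous_posteriors, weights):
    """Weighted sum of KL terms to multiple previous posterior estimates."""
    mu_q, logvar_q = current
    reg = torch.zeros(())
    for w, (mu_p, logvar_p) in zip(weights, previous_posteriors):
        reg = reg + w * kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p)
    return reg
```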
arXiv Detail & Related papers (2024-10-10T10:58:41Z) - Leveraging Hierarchical Taxonomies in Prompt-based Continual Learning [41.13568563835089]
We find that applying human habits of organizing and connecting information can serve as an efficient strategy when training deep learning models.
We propose a novel regularization loss function that encourages models to focus more on challenging knowledge areas.
arXiv Detail & Related papers (2024-10-06T01:30:40Z) - Making Pre-trained Language Models Better Continual Few-Shot Relation Extractors [15.417833307088637]
Continual Few-shot Relation Extraction (CFRE) is a practical problem that requires the model to continuously learn novel relations.
The primary challenges are catastrophic forgetting and overfitting.
This paper harnesses prompt learning to explore the implicit capabilities of pre-trained language models.
arXiv Detail & Related papers (2024-02-24T04:32:44Z) - Commonsense Knowledge Transfer for Pre-trained Language Models [83.01121484432801]
We introduce commonsense knowledge transfer, a framework to transfer the commonsense knowledge stored in a neural commonsense knowledge model to a general-purpose pre-trained language model.
It first exploits general texts to form queries for extracting commonsense knowledge from the neural commonsense knowledge model.
It then refines the language model with two self-supervised objectives: commonsense mask infilling and commonsense relation prediction.
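A minimal sketch of jointly training the two self-supervised objectives named above (mask infilling and relation prediction); the head design, pooling, and equal loss weighting are assumptions for illustration, not the paper's implementation.

```python
# Sketch only: two heads on top of a pre-trained encoder, one for commonsense
# mask infilling (token-level) and one for commonsense relation prediction
# (sequence-level). Names and the 1:1 loss weighting are assumptions.
import torch.nn as nn
import torch.nn.functional as F


class CommonsenseTransferHeads(nn.Module):
    def __init__(self, hidden_size: int, vocab_size: int, num_relations: int):
        super().__init__()
        self.mlm_head = nn.Linear(hidden_size, vocab_size)          # mask infilling
        self.relation_head = nn.Linear(hidden_size, num_relations)  # relation prediction

    def forward(self, hidden_states, mlm_labels, relation_labels):
        # hidden_states: (B, T, H) from the pre-trained encoder
        mlm_logits = self.mlm_head(hidden_states)                   # (B, T, V)
        rel_logits = self.relation_head(hidden_states[:, 0])        # pooled [CLS]-style token
        mlm_loss = F.cross_entropy(mlm_logits.view(-1, mlm_logits.size(-1)),
                                   mlm_labels.view(-1), ignore_index=-100)
        rel_loss = F.cross_entropy(rel_logits, relation_labels)
        return mlm_loss + rel_loss
```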
arXiv Detail & Related papers (2023-06-04T15:44:51Z) - Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z) - Schema-aware Reference as Prompt Improves Data-Efficient Knowledge Graph Construction [57.854498238624366]
We propose a retrieval-augmented approach, which retrieves schema-aware Reference As Prompt (RAP) for data-efficient knowledge graph construction.
RAP can dynamically leverage schema and knowledge inherited from human-annotated and weakly supervised data as a prompt for each sample.
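To make the retrieval-as-prompt idea concrete, here is a loose sketch that retrieves the most similar annotated reference and prepends it to the input; the similarity measure and prompt template are illustrative assumptions, not RAP's actual design.

```python
# Sketch only: pick a schema-aware reference by embedding similarity and use it
# as a prompt prefix for the current sample. All names here are hypothetical.
from dataclasses import dataclass
from typing import List
import numpy as np


@dataclass
class Reference:
    text: str            # annotated example carrying schema information
    embedding: np.ndarray


def retrieve_reference(query_emb: np.ndarray, pool: List[Reference]) -> Reference:
    """Return the reference whose embedding has the highest cosine similarity to the query."""
    sims = [float(query_emb @ r.embedding /
                  (np.linalg.norm(query_emb) * np.linalg.norm(r.embedding)))
            for r in pool]
    return pool[int(np.argmax(sims))]


def build_prompt(sample_text: str, reference: Reference) -> str:
    # Prepend the retrieved schema-aware reference to the input sample.
    return f"Reference: {reference.text}\nInput: {sample_text}\nExtract the triples:"
```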
arXiv Detail & Related papers (2022-10-19T16:40:28Z) - A Survey of Knowledge-Intensive NLP with Pre-Trained Language Models [185.08295787309544]
We aim to summarize the current progress of pre-trained language model-based knowledge-enhanced models (PLMKEs).
We present the challenges of PLMKEs based on the discussion regarding the three elements and attempt to provide NLP practitioners with potential directions for further research.
arXiv Detail & Related papers (2022-02-17T17:17:43Z) - Class-Incremental Continual Learning into the eXtended DER-verse [17.90483695137098]
This work aims at assessing and overcoming the pitfalls of our previous proposal Dark Experience Replay (DER).
Inspired by the way our minds constantly rewrite past recollections and set expectations for the future, we endow our model with the ability to revise its replay memory to welcome novel information regarding past data.
We show that the application of these strategies leads to remarkable improvements.
arXiv Detail & Related papers (2022-01-03T17:14:30Z) - DKPLM: Decomposable Knowledge-enhanced Pre-trained Language Model for Natural Language Understanding [19.478288026844893]
Knowledge-Enhanced Pre-trained Language Models (KEPLMs) are pre-trained models with relation triples injected from knowledge graphs to improve language understanding abilities.
Previous studies integrate models with knowledge encoders for representing knowledge retrieved from knowledge graphs.
We propose a novel KEPLM named DKPLM that decomposes the knowledge injection process of pre-trained language models across the pre-training, fine-tuning, and inference stages.
arXiv Detail & Related papers (2021-12-02T08:19:42Z) - Continual Learning for Natural Language Generation in Task-oriented Dialog Systems [72.92029584113676]
Natural language generation (NLG) is an essential component of task-oriented dialog systems.
We study NLG in a "continual learning" setting to expand its knowledge to new domains or functionalities incrementally.
The major challenge towards this goal is catastrophic forgetting, meaning that a continually trained model tends to forget the knowledge it has learned before.
arXiv Detail & Related papers (2020-10-02T10:32:29Z)