Serial Contrastive Knowledge Distillation for Continual Few-shot Relation Extraction
- URL: http://arxiv.org/abs/2305.06616v1
- Date: Thu, 11 May 2023 07:25:47 GMT
- Title: Serial Contrastive Knowledge Distillation for Continual Few-shot Relation Extraction
- Authors: Xinyi Wang and Zitao Wang and Wei Hu
- Abstract summary: We propose a new model, namely SCKD, to accomplish the continual few-shot RE task.
Specifically, we design serial knowledge distillation to preserve the prior knowledge from previous models.
Our experiments on two benchmark datasets validate the effectiveness of SCKD for continual few-shot RE.
- Score: 35.79570854392989
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Continual few-shot relation extraction (RE) aims to continuously train a
model for new relations with few labeled training data, of which the major
challenges are the catastrophic forgetting of old relations and the overfitting
caused by data sparsity. In this paper, we propose a new model, namely SCKD, to
accomplish the continual few-shot RE task. Specifically, we design serial
knowledge distillation to preserve the prior knowledge from previous models and
conduct contrastive learning with pseudo samples to keep the representations of
samples in different relations sufficiently distinguishable. Our experiments on
two benchmark datasets validate the effectiveness of SCKD for continual
few-shot RE and its superiority in knowledge transfer and memory utilization
over state-of-the-art models.
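The abstract names two training signals: a serial knowledge distillation term that passes knowledge from the previous session's model (acting as a frozen teacher) to the current one, and a contrastive term computed over real and pseudo samples. The PyTorch sketch below is a minimal illustration of both ideas; the MSE feature term, the Gaussian pseudo-sample generation around stored prototypes, the temperatures, and the loss weights are assumptions made for the example, not SCKD's exact design.
```python
import torch
import torch.nn.functional as F

def serial_distillation_loss(new_feats, old_feats, new_logits, old_logits, t=2.0):
    # Serial KD: the model from the previous session is a frozen teacher whose
    # features and soft relation predictions constrain the current model.
    feat_kd = F.mse_loss(new_feats, old_feats.detach())
    soft_targets = F.softmax(old_logits.detach() / t, dim=-1)
    logit_kd = F.kl_div(F.log_softmax(new_logits / t, dim=-1), soft_targets,
                        reduction="batchmean") * t * t
    return feat_kd + logit_kd

def pseudo_samples(prototypes, proto_labels, n=2, noise=0.05):
    # Pseudo samples for old relations, drawn around stored prototypes
    # (a simple Gaussian perturbation; the paper's generation scheme may differ).
    feats = prototypes.repeat_interleave(n, dim=0)
    return feats + noise * torch.randn_like(feats), proto_labels.repeat_interleave(n)

def contrastive_loss(feats, labels, tau=0.1):
    # Pull same-relation samples (real + pseudo) together, push others apart.
    z = F.normalize(feats, dim=-1)
    sim = z @ z.t() / tau
    pos = labels.unsqueeze(0).eq(labels.unsqueeze(1)).fill_diagonal_(False)
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    logp = sim - torch.logsumexp(sim.masked_fill(self_mask, float("-inf")),
                                 dim=-1, keepdim=True)
    return -logp[pos].mean() if pos.any() else sim.new_zeros(())
```
In each continual session, the total objective would presumably combine cross-entropy on the few labeled samples with these two terms, e.g. `ce + alpha * serial_distillation_loss(...) + beta * contrastive_loss(...)`, where `alpha` and `beta` are assumed weights.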
Related papers
- PairCFR: Enhancing Model Training on Paired Counterfactually Augmented Data through Contrastive Learning [49.60634126342945]
Counterfactually Augmented Data (CAD) involves creating new data samples by applying minimal yet sufficient modifications to flip the label of existing data samples to other classes.
Recent research reveals that training with CAD may lead models to overly focus on modified features while ignoring other important contextual information.
We employ contrastive learning to promote global feature alignment in addition to learning counterfactual clues.
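One plausible reading of the sentence above, sketched in PyTorch: cross-entropy on each original/counterfactual pair plus a contrastive term in cosine space that pulls same-label samples together across the batch and pushes each label-flipped pair apart, so the encoder must use global sentence features rather than only the edited spans. The margin, weight, and pairing scheme are illustrative assumptions, not PairCFR's published objective.
```python
import torch
import torch.nn.functional as F

def paircfr_style_loss(feat_o, feat_c, logits_o, logits_c, y_o, y_c,
                       margin=0.5, lam=1.0):
    # Cross-entropy on originals and on their label-flipped counterfactual edits.
    ce = F.cross_entropy(logits_o, y_o) + F.cross_entropy(logits_c, y_c)
    zo, zc = F.normalize(feat_o, dim=-1), F.normalize(feat_c, dim=-1)
    # Push each original/counterfactual pair at least `margin` apart.
    pair_dist = 1 - (zo * zc).sum(-1)
    push = F.relu(margin - pair_dist).mean()
    # Pull same-label samples together over the whole mixed batch.
    z, y = torch.cat([zo, zc]), torch.cat([y_o, y_c])
    same = y.unsqueeze(0).eq(y.unsqueeze(1)).fill_diagonal_(False)
    pull = (1 - z @ z.t())[same].mean() if same.any() else z.new_zeros(())
    return ce + lam * (push + pull)
```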
arXiv Detail & Related papers (2024-06-09T07:29:55Z)
- Contrastive Continual Learning with Importance Sampling and Prototype-Instance Relation Distillation [14.25441464051506]
We propose Contrastive Continual Learning via Importance Sampling (CCLIS) to preserve knowledge by recovering previous data distributions.
We also present the Prototype-instance Relation Distillation (PRD) loss, a technique designed to maintain the relationship between prototypes and sample representations.
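A hedged sketch of the idea behind a prototype-instance relation distillation loss: the previous (frozen) model's instance-to-prototype similarity distribution is used as a soft target for the current model. The cosine similarity and temperature are assumptions; CCLIS's exact formulation may differ.
```python
import torch
import torch.nn.functional as F

def prd_loss(feats_new, protos_new, feats_old, protos_old, tau=0.5):
    # Distill instance-prototype similarity distributions from the previous
    # model into the current one, preserving prototype-instance relations.
    def relation(feats, protos):
        f = F.normalize(feats, dim=-1)
        p = F.normalize(protos, dim=-1)
        return (f @ p.t()) / tau  # [batch_size, num_prototypes]
    target = F.softmax(relation(feats_old, protos_old).detach(), dim=-1)
    log_pred = F.log_softmax(relation(feats_new, protos_new), dim=-1)
    return F.kl_div(log_pred, target, reduction="batchmean")
```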
arXiv Detail & Related papers (2024-03-07T15:47:52Z)
- Learning to Maximize Mutual Information for Chain-of-Thought Distillation [13.660167848386806]
Distilling Step-by-Step (DSS) has demonstrated promise by imbuing smaller models with the superior reasoning capabilities of their larger counterparts.
However, DSS overlooks the intrinsic relationship between the two training tasks, leading to ineffective integration of CoT knowledge with the task of label prediction.
We propose a variational approach to solve this problem using a learning-based method.
arXiv Detail & Related papers (2024-03-05T22:21:45Z)
- Leveraging Diffusion Disentangled Representations to Mitigate Shortcuts in Underspecified Visual Tasks [92.32670915472099]
We propose an ensemble diversification framework exploiting the generation of synthetic counterfactuals using Diffusion Probabilistic Models (DPMs).
We show that diffusion-guided diversification can lead models to avert attention from shortcut cues, achieving ensemble diversity performance comparable to previous methods requiring additional data collection.
arXiv Detail & Related papers (2023-10-03T17:37:52Z)
- Improving Continual Relation Extraction by Distinguishing Analogous Semantics [11.420578494453343]
Continual relation extraction aims to learn constantly emerging relations while avoiding forgetting the learned relations.
Existing works store a small number of typical samples and re-train the model on them to alleviate forgetting.
We conduct an empirical study on existing works and observe that their performance is severely affected by analogous relations.
arXiv Detail & Related papers (2023-05-11T07:32:20Z)
- How to Train Your DRAGON: Diverse Augmentation Towards Generalizable Dense Retrieval [80.54532535622988]
We show that a generalizable dense retriever can be trained to achieve high accuracy in both supervised and zero-shot retrieval.
DRAGON, our dense retriever trained with diverse augmentation, is the first BERT-base-sized DR to achieve state-of-the-art effectiveness in both supervised and zero-shot evaluations.
arXiv Detail & Related papers (2023-02-15T03:53:26Z)
- Consistent Representation Learning for Continual Relation Extraction [18.694012937149495]
A consistent representation learning method is proposed, which maintains the stability of the relation embedding.
Our method significantly outperforms state-of-the-art baselines and yields strong robustness on the imbalanced dataset.
arXiv Detail & Related papers (2022-03-05T12:16:34Z)
- Continual Few-shot Relation Learning via Embedding Space Regularization and Data Augmentation [4.111899441919165]
It is necessary for the model to learn novel relational patterns with very few labeled data while avoiding catastrophic forgetting of previous task knowledge.
We propose a novel method based on embedding space regularization and data augmentation.
Our method generalizes to new few-shot tasks and avoids catastrophic forgetting of previous tasks by enforcing extra constraints on the relational embeddings and by adding extra relevant data in a self-supervised manner.
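The extra constraints on relational embeddings mentioned above could take many forms; the sketch below shows one simple possibility, anchoring old relation prototypes near their previous positions while keeping new relation embeddings separated from them. It is only an illustration under those assumptions, not the cited method.
```python
import torch
import torch.nn.functional as F

def embedding_space_regularizer(cur_protos, prev_protos, new_rel_embs, margin=0.2):
    # Keep old relation prototypes close to where the previous model placed them.
    anchor = F.mse_loss(cur_protos, prev_protos.detach())
    # Keep new relation embeddings away from old prototypes: penalize cosine
    # similarity above (1 - margin).
    zn = F.normalize(new_rel_embs, dim=-1)
    zp = F.normalize(prev_protos.detach(), dim=-1)
    separation = F.relu(zn @ zp.t() - (1 - margin)).mean()
    return anchor + separation
```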
arXiv Detail & Related papers (2022-03-04T05:19:09Z)
- Exploring the Limits of Few-Shot Link Prediction in Knowledge Graphs [49.6661602019124]
We study a spectrum of models derived by generalizing the current state of the art for few-shot link prediction.
We find that a simple zero-shot baseline - which ignores any relation-specific information - achieves surprisingly strong performance.
Experiments on carefully crafted synthetic datasets show that having only a few examples of a relation fundamentally limits models from using fine-grained structural information.
arXiv Detail & Related papers (2021-02-05T21:04:31Z)
- Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.