Preserving Commonsense Knowledge from Pre-trained Language Models via
Causal Inference
- URL: http://arxiv.org/abs/2306.10790v1
- Date: Mon, 19 Jun 2023 09:06:44 GMT
- Title: Preserving Commonsense Knowledge from Pre-trained Language Models via
Causal Inference
- Authors: Junhao Zheng, Qianli Ma, Shengjie Qiu, Yue Wu, Peitian Ma, Junlong
Liu, Huawen Feng, Xichen Shang and Haibin Chen
- Abstract summary: Most existing studies attribute the degraded generalization of fine-tuned models to catastrophic forgetting, and they retain the pre-trained knowledge indiscriminately.
We frame fine-tuning as a causal graph and discover that the crux of catastrophic forgetting lies in the missing causal effects from the pre-trained data.
In the experiments, our method outperforms state-of-the-art fine-tuning methods on all six commonsense QA datasets.
- Score: 20.5696436171006
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-tuning has been proven to be a simple and effective technique to
transfer the learned knowledge of Pre-trained Language Models (PLMs) to
downstream tasks. However, vanilla fine-tuning easily overfits the target data
and degrades the generalization ability. Most existing studies attribute this
to catastrophic forgetting, and they retain the pre-trained knowledge
indiscriminately without identifying what knowledge is transferable. Motivated
by this, we frame fine-tuning as a causal graph and discover that the crux of
catastrophic forgetting lies in the missing causal effects from the
pre-trained data. Based on this causal view, we propose a unified objective
for fine-tuning that recovers the missing causality. Intriguingly, the
unified objective can be seen
as the sum of the vanilla fine-tuning objective, which learns new knowledge
from target data, and the causal objective, which preserves old knowledge from
PLMs. Therefore, our method is flexible and can mitigate negative transfer
while preserving knowledge. Since endowing models with commonsense is a
long-standing challenge, we implement our method on commonsense QA with a
proposed heuristic estimation to verify its effectiveness. In the experiments,
our method outperforms state-of-the-art fine-tuning methods on all six
commonsense QA datasets and can be implemented as a plug-in module to boost
the performance of existing QA models.
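To make the "sum of two objectives" concrete, here is a minimal PyTorch sketch. The function name, the `causal_weight` knob, and the KL penalty toward the frozen PLM's predictions are illustrative assumptions standing in for the paper's causal objective (which uses its own heuristic estimation); this is not the authors' implementation.
```python
import torch
import torch.nn.functional as F

def unified_objective(ft_logits, labels, plm_logits, causal_weight=1.0):
    """Sketch: vanilla fine-tuning loss plus a knowledge-preservation term.

    The KL term is a generic stand-in for the paper's causal objective.
    """
    # Vanilla fine-tuning objective: learn new knowledge from target data.
    ft_loss = F.cross_entropy(ft_logits, labels)
    # Preservation objective: stay close to the frozen PLM's predictions.
    preserve = F.kl_div(
        F.log_softmax(ft_logits, dim=-1),
        F.softmax(plm_logits, dim=-1),
        reduction="batchmean",
    )
    return ft_loss + causal_weight * preserve

# Toy usage with random tensors standing in for model outputs.
ft_logits = torch.randn(8, 5, requires_grad=True)   # fine-tuned model
plm_logits = torch.randn(8, 5)                      # frozen PLM (no grad)
labels = torch.randint(0, 5, (8,))
unified_objective(ft_logits, labels, plm_logits, causal_weight=0.5).backward()
```
Because the preservation term is additive, it can sit on top of any existing QA training loop, which is what makes the plug-in usage plausible.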
Related papers
- Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models [79.28821338925947]
Domain-Class Incremental Learning is a realistic but challenging continual learning scenario.
To handle these diverse tasks, pre-trained Vision-Language Models (VLMs) are introduced for their strong generalizability.
This incurs a new problem: the knowledge encoded in the pre-trained VLMs may be disturbed when adapting to new tasks, compromising their inherent zero-shot ability.
Existing methods tackle this by tuning VLMs with knowledge distillation on extra datasets, which incurs heavy overhead.
We propose the Distribution-aware Interference-free Knowledge Integration (DIKI) framework, retaining the pre-trained knowledge of VLMs.
arXiv Detail & Related papers (2024-07-07T12:19:37Z) - Decoupling the Class Label and the Target Concept in Machine Unlearning [81.69857244976123]
Machine unlearning aims to adjust a trained model to approximate a retrained one that excludes a portion of training data.
Previous studies showed that class-wise unlearning is successful in forgetting the knowledge of a target class.
We propose a general framework, TARget-aware Forgetting (TARF).
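As a rough illustration of the generic unlearning recipe described above (not TARF itself), a common baseline alternates gradient ascent on the data to forget with ordinary training on the data to retain; all names below are hypothetical.
```python
import torch.nn.functional as F

def unlearning_step(model, optimizer, forget_batch, retain_batch, alpha=0.5):
    # Generic baseline: push the model away from the forget data while
    # anchoring it on the retain data. TARF's actual mechanism differs;
    # this only sketches the overall shape of approximating a retrained model.
    fx, fy = forget_batch
    rx, ry = retain_batch
    loss = (F.cross_entropy(model(rx), ry)             # keep retained knowledge
            - alpha * F.cross_entropy(model(fx), fy))  # ascend on forget set
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```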
arXiv Detail & Related papers (2024-06-12T14:53:30Z) - R-Tuning: Instructing Large Language Models to Say `I Don't Know' [66.11375475253007]
Large language models (LLMs) have revolutionized numerous domains with their impressive performance but still face challenges.
Previous instruction tuning methods force the model to complete a sentence regardless of whether it actually possesses the relevant knowledge.
We present a new approach called Refusal-Aware Instruction Tuning (R-Tuning).
Experimental results demonstrate R-Tuning effectively improves a model's ability to answer known questions and refrain from answering unknown questions.
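A minimal sketch of the refusal-aware data construction this snippet describes, assuming a `model_answer` callable and a paraphrased certainty suffix (the paper's exact prompt templates and uncertainty criterion may differ):
```python
def build_refusal_aware_data(model_answer, qa_pairs):
    """Split training data by whether the model already answers correctly,
    then append an expression of certainty or uncertainty to the target.
    """
    data = []
    for question, gold in qa_pairs:
        knows = model_answer(question).strip() == gold.strip()
        suffix = "I am sure." if knows else "I am unsure."
        data.append({"prompt": question, "completion": f"{gold} {suffix}"})
    return data

# Toy usage with a stub model that only knows one answer.
stub = lambda q: "Paris" if "France" in q else "unknown"
print(build_refusal_aware_data(
    stub,
    [("Capital of France?", "Paris"), ("Capital of Wakanda?", "Birnin Zana")],
))
```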
arXiv Detail & Related papers (2023-11-16T08:45:44Z) - Towards Causal Foundation Model: on Duality between Causal Inference and Attention [18.046388712804042]
We take a first step towards building causally-aware foundation models for treatment effect estimation.
We propose a novel, theoretically justified method called Causal Inference with Attention (CInA).
arXiv Detail & Related papers (2023-10-01T22:28:34Z) - Disposable Transfer Learning for Selective Source Task Unlearning [31.020636963762836]
Transfer learning is widely used to train deep neural networks (DNNs) that build powerful representations.
Disposable transfer learning (DTL) disposes of only the source task without degrading performance on the target task.
We show that GC loss is an effective approach to the DTL problem: a model trained with GC loss retains target-task performance while significantly reducing PL accuracy.
arXiv Detail & Related papers (2023-08-19T10:13:17Z) - SRIL: Selective Regularization for Class-Incremental Learning [5.810252620242912]
Class-Incremental Learning aims to create an integrated model that balances plasticity and stability across a sequence of tasks.
We propose a selective regularization method that accepts new knowledge while maintaining previous knowledge.
We validate the effectiveness of the proposed method through extensive experimental protocols using CIFAR-100, ImageNet-Subset, and ImageNet-Full.
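For intuition, a generic selective-regularization term might look like the sketch below: penalize drift only on parameters marked important for previous tasks. The importance scores, threshold, and weighting are placeholder assumptions; SRIL's actual selection criterion may differ.
```python
import torch

def selective_reg_penalty(model, old_params, importance, lam=0.01, tau=0.5):
    # Penalize deviation from the previous model only where a parameter's
    # importance for old tasks exceeds the threshold tau; the remaining
    # parameters stay free to adapt to new classes (stability vs. plasticity).
    penalty = torch.zeros(())
    for name, p in model.named_parameters():
        mask = (importance[name] > tau).float()
        penalty = penalty + (mask * (p - old_params[name]) ** 2).sum()
    return lam * penalty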
arXiv Detail & Related papers (2023-05-09T05:04:35Z) - Data Poisoning Attack Aiming the Vulnerability of Continual Learning [25.480762565632332]
We present a simple task-specific data poisoning attack that can be used in the learning process of a new task.
We evaluate the attack on two representative regularization-based continual learning methods.
arXiv Detail & Related papers (2022-11-29T02:28:05Z) - Improving the Adversarial Robustness of NLP Models by Information
Bottleneck [112.44039792098579]
Non-robust features can be easily manipulated by adversaries to fool NLP models.
In this study, we explore the feasibility of capturing task-specific robust features, while eliminating the non-robust ones by using the information bottleneck theory.
We show that the models trained with our information bottleneck-based method are able to achieve a significant improvement in robust accuracy.
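One standard way to instantiate the information bottleneck for this purpose is the variational IB objective sketched below (a textbook formulation with assumed variable names, not necessarily the paper's exact estimator):
```python
import torch
import torch.nn.functional as F

def vib_loss(mu, logvar, logits, labels, beta=1e-3):
    # Variational information bottleneck: keep the representation Z
    # predictive of the label Y (cross-entropy term) while compressing
    # away input-specific, potentially non-robust detail (KL term, an
    # upper bound on I(Z; X) against a standard normal prior).
    task_loss = F.cross_entropy(logits, labels)
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1).mean()
    return task_loss + beta * kl
```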
arXiv Detail & Related papers (2022-06-11T12:12:20Z) - Principled Knowledge Extrapolation with GANs [92.62635018136476]
We study counterfactual synthesis from a new perspective of knowledge extrapolation.
We show that an adversarial game with a closed-form discriminator can be used to address the knowledge extrapolation problem.
Our method enjoys both elegant theoretical guarantees and superior performance in many scenarios.
arXiv Detail & Related papers (2022-05-21T08:39:42Z) - Accurate and Robust Feature Importance Estimation under Distribution
Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method.
We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z) - Self-Supervised Learning Aided Class-Incremental Lifelong Learning [17.151579393716958]
We study the issue of catastrophic forgetting in class-incremental learning (Class-IL).
During Class-IL training, the model has no knowledge of subsequent tasks, so it extracts only the features needed for the tasks learned so far, which are insufficient for joint classification.
We propose combining self-supervised learning, which provides effective representations without requiring labels, with Class-IL to partly circumvent this problem.
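As one concrete (assumed) instantiation, a rotation-prediction auxiliary task can supply label-free training signal alongside the class-incremental classifier; the paper's specific self-supervised objective may differ.
```python
import torch

def rotation_ssl_batch(images):
    # Build a 4-way rotation-prediction task from an NCHW image batch:
    # each image is rotated by 0/90/180/270 degrees and the model must
    # predict the rotation, providing supervision without class labels.
    rotated, targets = [], []
    for k in range(4):
        rotated.append(torch.rot90(images, k, dims=(2, 3)))
        targets.append(torch.full((images.size(0),), k, dtype=torch.long))
    return torch.cat(rotated), torch.cat(targets)

# Toy usage: 8 RGB images of size 32x32 -> 32 rotated images, 32 labels.
x, y = rotation_ssl_batch(torch.randn(8, 3, 32, 32))
assert x.shape == (32, 3, 32, 32) and y.shape == (32,)
```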
arXiv Detail & Related papers (2020-06-10T15:15:27Z)