Kronecker Factorization for Preventing Catastrophic Forgetting in
Large-scale Medical Entity Linking
- URL: http://arxiv.org/abs/2111.06012v1
- Date: Thu, 11 Nov 2021 01:51:01 GMT
- Title: Kronecker Factorization for Preventing Catastrophic Forgetting in
Large-scale Medical Entity Linking
- Authors: Denis Jered McInerney, Luyang Kong, Kristjan Arumae, Byron Wallace,
Parminder Bhatia
- Abstract summary: In the medical domain, sequential training on tasks may sometimes be the only way to train models.
A major issue with sequential learning is catastrophic forgetting, i.e., a substantial drop in accuracy on prior tasks when a model is updated for a new task.
We show the effectiveness of this technique on the important and illustrative task of medical entity linking across three datasets.
- Score: 7.723047334864811
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-task learning is useful in NLP because it is often practically
desirable to have a single model that works across a range of tasks. In the
medical domain, sequential training on tasks may sometimes be the only way to
train models, either because access to the original (potentially sensitive)
data is no longer available, or simply owing to the computational costs
inherent to joint retraining. A major issue inherent to sequential learning,
however, is catastrophic forgetting, i.e., a substantial drop in accuracy on
prior tasks when a model is updated for a new task. Elastic Weight
Consolidation is a recently proposed method to address this issue, but scaling
this approach to the modern large models used in practice requires making
strong independence assumptions about model parameters, limiting its
effectiveness. In this work, we apply Kronecker Factorization--a recent
approach that relaxes independence assumptions--to prevent catastrophic
forgetting in convolutional and Transformer-based neural networks at scale. We
show the effectiveness of this technique on the important and illustrative task
of medical entity linking across three datasets, demonstrating the capability
of the technique to be used to make efficient updates to existing methods as
new medical data becomes available. On average, the proposed method reduces
catastrophic forgetting by 51% when using a BERT-based model, compared to a 27%
reduction using standard Elastic Weight Consolidation, while maintaining
spatial complexity proportional to the number of model parameters.
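To make the factored penalty concrete, the following is a minimal sketch of an EWC-style quadratic penalty with a Kronecker-factored Fisher approximation for a single linear layer, written in PyTorch. The function names, the plain batch-covariance factor estimates, and the omission of damping and running averages are simplifying assumptions for illustration, not the paper's implementation.

```python
import torch

def kronecker_factors(acts, grads):
    """Estimate the Kronecker factors of a layer's Fisher block from one batch:
    A = E[a a^T] over layer inputs `acts` (N x d_in) and
    G = E[g g^T] over pre-activation gradients `grads` (N x d_out)."""
    n = acts.shape[0]
    A = acts.t() @ acts / n    # (d_in, d_in) input second-moment matrix
    G = grads.t() @ grads / n  # (d_out, d_out) gradient second-moment matrix
    return A, G

def kfac_ewc_penalty(W, W_star, A, G, lam):
    """Penalty (lam / 2) * vec(W - W*)^T (A kron G) vec(W - W*), evaluated as
    (lam / 2) * trace(dW^T G dW A) so the full Kronecker product is never
    materialized. W and W_star are the new and old (d_out, d_in) weights."""
    dW = W - W_star
    return 0.5 * lam * torch.trace(dW.t() @ G @ dW @ A)
```

For comparison, standard Elastic Weight Consolidation keeps only the diagonal of the Fisher and penalizes (lam / 2) * sum_i F_i (theta_i - theta_i*)^2, which treats parameters as independent; the factored form retains within-layer correlations while storing only A and G per layer, so memory stays roughly proportional to the number of parameters.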
Related papers
- SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation [52.6922833948127]
In this work, we investigate the importance of parameters in pre-trained diffusion models.
We propose a novel model fine-tuning method to make full use of these ineffective parameters.
Our method enhances the generative capabilities of pre-trained models in downstream applications.
arXiv Detail & Related papers (2024-09-10T16:44:47Z) - Network reconstruction via the minimum description length principle [0.0]
We propose an alternative nonparametric regularization scheme based on hierarchical Bayesian inference and weight quantization.
Our approach follows the minimum description length (MDL) principle, and uncovers the weight distribution that allows for the most compression of the data.
We demonstrate that our scheme yields systematically increased accuracy in the reconstruction of both artificial and empirical networks.
arXiv Detail & Related papers (2024-05-02T05:35:09Z) - PELA: Learning Parameter-Efficient Models with Low-Rank Approximation [16.9278983497498]
We propose a novel method for increasing the parameter efficiency of pre-trained models by introducing an intermediate pre-training stage.
This allows for direct and efficient utilization of the low-rank model for downstream fine-tuning tasks.
arXiv Detail & Related papers (2023-10-16T07:17:33Z) - Stabilizing Subject Transfer in EEG Classification with Divergence
Estimation [17.924276728038304]
We propose several graphical models to describe an EEG classification task.
We identify statistical relationships that should hold true in an idealized training scenario.
We design regularization penalties to enforce these relationships in two stages.
arXiv Detail & Related papers (2023-10-12T23:06:52Z) - AdaMerging: Adaptive Model Merging for Multi-Task Learning [68.75885518081357]
This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging).
It aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.
Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11% improvement in performance.
arXiv Detail & Related papers (2023-10-04T04:26:33Z) - Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST).
IST is a recently proposed and highly effective technique for solving the aforementioned problems.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z) - Towards Foundation Models and Few-Shot Parameter-Efficient Fine-Tuning for Volumetric Organ Segmentation [20.94974284175104]
Few-Shot Efficient Fine-Tuning (FSEFT) is a novel and realistic scenario for adapting medical image segmentation foundation models.
Our comprehensive transfer learning experiments confirm the suitability of foundation models in medical image segmentation and unveil the limitations of popular fine-tuning strategies in few-shot scenarios.
arXiv Detail & Related papers (2023-03-29T22:50:05Z) - Scalable Weight Reparametrization for Efficient Transfer Learning [10.265713480189486]
Efficient transfer learning involves utilizing a pre-trained model trained on a larger dataset and repurposing it for downstream tasks.
Previous works have led to an increase in updated parameters and task-specific modules, resulting in more computations, especially for tiny models.
We suggest learning a policy network that can decide where to reparametrize the pre-trained model, while adhering to a given constraint for the number of updated parameters.
arXiv Detail & Related papers (2023-02-26T23:19:11Z) - Parameter-Efficient Sparsity for Large Language Models Fine-Tuning [63.321205487234074]
We propose a Parameter-efficient Sparse Training (PST) method to reduce the number of trainable parameters during sparse-aware training.
Experiments with diverse networks (i.e., BERT, RoBERTa and GPT-2) demonstrate that PST performs on par with or better than previous sparsity methods.
arXiv Detail & Related papers (2022-05-23T02:43:45Z) - Hyperparameter-free Continuous Learning for Domain Classification in
Natural Language Understanding [60.226644697970116]
Domain classification is the fundamental task in natural language understanding (NLU).
Most existing continual learning approaches suffer from low accuracy and performance fluctuation.
We propose a hyperparameter-free continual learning model for text data that can stably produce high performance under various environments.
arXiv Detail & Related papers (2022-01-05T02:46:16Z) - Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight-parameterisation for neural networks that leads to inherently sparse models.
Models trained in this manner exhibit similar performance, but have a distribution with markedly higher density at zero, allowing more parameters to be pruned safely.
Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
arXiv Detail & Related papers (2021-10-01T10:03:57Z)
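As a concrete illustration of the Powerpropagation entry above, here is a minimal PyTorch sketch of the reparameterisation for a single linear layer; the class name, initialization scale, and default alpha are illustrative assumptions rather than the authors' reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PowerpropLinear(nn.Module):
    """Linear layer whose effective weight is w = theta * |theta|**(alpha - 1).

    For alpha > 1 the gradient with respect to theta carries an extra factor
    that shrinks with |theta|, so small weights receive small updates and
    accumulate near zero, producing the weight distribution with high density
    at zero that makes the trained model easier to prune."""

    def __init__(self, in_features, out_features, alpha=2.0):
        super().__init__()
        self.alpha = alpha
        self.theta = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        w = self.theta * self.theta.abs().pow(self.alpha - 1.0)
        return F.linear(x, w, self.bias)
```

A network built from such layers trains with a standard optimizer and can afterwards be magnitude-pruned, which is the setting in which the entry above reports combining Powerpropagation with pruning and sparse-to-sparse methods.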