Adapt & Align: Continual Learning with Generative Models Latent Space Alignment
- URL: http://arxiv.org/abs/2312.13699v1
- Date: Thu, 21 Dec 2023 10:02:17 GMT
- Title: Adapt & Align: Continual Learning with Generative Models Latent Space Alignment
- Authors: Kamil Deja, Bartosz Cywiński, Jan Rybarczyk, Tomasz Trzciński
- Abstract summary: We introduce Adapt & Align, a method for continual learning of neural networks by aligning latent representations in generative models.
Neural networks suffer from an abrupt loss in performance when retrained with additional data.
We propose a new method that mitigates those problems by employing generative models and splitting the process of their update into two parts.
- Score: 15.729732755625474
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we introduce Adapt & Align, a method for continual learning of
neural networks by aligning latent representations in generative models. Neural
networks suffer from an abrupt loss in performance when retrained with additional
training data from different distributions. At the same time, training with
additional data without access to the previous examples rarely improves the
model's performance. In this work, we propose a new method that mitigates those
problems by employing generative models and splitting the process of their
update into two parts. In the first one, we train a local generative model
using only data from a new task. In the second phase, we consolidate latent
representations from the local model with a global one that encodes knowledge
of all past experiences. We introduce our approach with Variational
Autoencoders and Generative Adversarial Networks. Moreover, we show how we can
use those generative models as a general method for continual knowledge
consolidation that can be used in downstream tasks such as classification.
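The two-phase update described in the abstract can be sketched in miniature. The following is a hypothetical simplification, not the authors' implementation: each "generative model" is reduced to a 1-D Gaussian, and latent-space consolidation is reduced to a sample-count-weighted merge of parameters.

```python
# Hedged sketch of the two-phase continual update: (1) fit a local model
# on new-task data only, (2) consolidate it into a global model that
# encodes all past experiences. All names here are illustrative.
import statistics

def fit_local_model(task_data):
    """Phase 1: train a local model using only data from the new task."""
    return {"mu": statistics.mean(task_data),
            "sigma": statistics.pstdev(task_data)}

def consolidate(global_model, local_model, n_global, n_local):
    """Phase 2: fold the local model into the global one
    (here: a sample-count-weighted merge of the parameters)."""
    if global_model is None:
        return dict(local_model)
    w = n_local / (n_global + n_local)
    return {k: (1 - w) * global_model[k] + w * local_model[k]
            for k in ("mu", "sigma")}

# Tasks arrive sequentially; no past examples are revisited.
global_model, n_seen = None, 0
for task in ([1.0, 1.0, 1.0], [3.0, 3.0, 3.0]):
    local = fit_local_model(task)
    global_model = consolidate(global_model, local, n_seen, len(task))
    n_seen += len(task)
```

In the actual method the local and global models are VAEs or GANs, and the second phase aligns their latent representations rather than averaging scalar parameters.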
Related papers
- Joint Diffusion models in Continual Learning [4.013156524547073]
We introduce JDCL - a new method for continual learning with generative rehearsal based on joint diffusion models.
Generative-replay-based continual learning methods try to mitigate catastrophic forgetting by retraining a model on a combination of new data and rehearsal data sampled from a generative model.
We show that such shared parametrization, combined with the knowledge distillation technique allows for stable adaptation to new tasks without catastrophic forgetting.
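The generative-rehearsal idea summarized above can be sketched generically. The names below are illustrative, not JDCL's API: new-task data is mixed with samples drawn from the current generative model before retraining.

```python
# Hedged sketch of generative rehearsal: build a training batch that mixes
# real new-task examples with replayed samples from a generative model.
import random

def rehearsal_batch(new_data, sample_generator, rehearsal_ratio=0.5, seed=0):
    """Mix new-task examples with replayed samples.

    rehearsal_ratio (a hypothetical knob, for illustration) controls how
    many generated samples are added relative to the new examples.
    """
    rng = random.Random(seed)
    n_replay = int(len(new_data) * rehearsal_ratio)
    replays = [sample_generator() for _ in range(n_replay)]
    batch = list(new_data) + replays
    rng.shuffle(batch)
    return batch

# Example with a stand-in "generator" that always replays 0.0.
batch = rehearsal_batch([1.0, 2.0, 3.0, 4.0], lambda: 0.0)
```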
arXiv Detail & Related papers (2024-11-12T22:35:44Z)
- Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network)
After training this network on a small base model using demonstrations, this network can be seamlessly integrated with other pre-trained models during inference.
We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
arXiv Detail & Related papers (2024-10-28T13:48:43Z)
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
- Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z)
- Learning to Jump: Thinning and Thickening Latent Counts for Generative Modeling [69.60713300418467]
Learning to jump is a general recipe for generative modeling of various types of data.
We demonstrate when learning to jump is expected to perform comparably to learning to denoise, and when it is expected to perform better.
arXiv Detail & Related papers (2023-05-28T05:38:28Z)
- Cooperative data-driven modeling [44.99833362998488]
Data-driven modeling in mechanics is evolving rapidly based on recent machine learning advances.
New data and models created by different groups become available, opening possibilities for cooperative modeling.
Artificial neural networks suffer from catastrophic forgetting, i.e. they forget how to perform an old task when trained on a new one.
This hinders cooperation because adapting an existing model for a new task affects the performance on a previous task trained by someone else.
arXiv Detail & Related papers (2022-11-23T14:27:25Z)
- Generalization Properties of Retrieval-based Models [50.35325326050263]
Retrieval-based machine learning methods have enjoyed success on a wide range of problems.
Despite growing literature showcasing the promise of these models, the theoretical underpinning for such models remains underexplored.
We present a formal treatment of retrieval-based models to characterize their generalization ability.
arXiv Detail & Related papers (2022-10-06T00:33:01Z)
- Transfer Learning via Test-Time Neural Networks Aggregation [11.42582922543676]
Deep neural networks have been shown to outperform traditional machine learning methods.
However, deep networks lack generalisability: they will not perform as well on a new (testing) set drawn from a different distribution.
arXiv Detail & Related papers (2022-06-27T15:46:05Z)
- GAN Cocktail: mixing GANs without dataset access [18.664733153082146]
We tackle the problem of model merging, given two constraints that often come up in the real world.
In the first stage, we transform the weights of all the models to the same parameter space by a technique we term model rooting.
In the second stage, we merge the rooted models by averaging their weights and fine-tuning them for each specific domain, using only data generated by the original trained models.
arXiv Detail & Related papers (2021-06-07T17:59:04Z)
- Streaming Graph Neural Networks via Continual Learning [31.810308087441445]
Graph neural networks (GNNs) have achieved strong performance in various applications.
In this paper, we propose a streaming GNN model based on continual learning.
We show that our model can efficiently update model parameters and achieve comparable performance to model retraining.
arXiv Detail & Related papers (2020-09-23T06:52:30Z)
- Pre-Trained Models for Heterogeneous Information Networks [57.78194356302626]
We propose a self-supervised pre-training and fine-tuning framework, PF-HIN, to capture the features of a heterogeneous information network.
PF-HIN consistently and significantly outperforms state-of-the-art alternatives on each of the evaluated tasks, across four datasets.
arXiv Detail & Related papers (2020-07-07T03:36:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.