Transfer Learning via Test-Time Neural Networks Aggregation
- URL: http://arxiv.org/abs/2206.13399v1
- Date: Mon, 27 Jun 2022 15:46:05 GMT
- Title: Transfer Learning via Test-Time Neural Networks Aggregation
- Authors: Bruno Casella, Alessio Barbaro Chisari, Sebastiano Battiato, Mario
Valerio Giuffrida
- Abstract summary: It has been demonstrated that deep neural networks outperform traditional machine learning.
Deep networks lack generalisability, that is, they will not perform as well on a new (testing) set drawn from a different distribution.
- Score: 11.42582922543676
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It has been demonstrated that deep neural networks outperform traditional
machine learning. However, deep networks lack generalisability: they will not
perform as well on a new (testing) set drawn from a different distribution,
owing to domain shift. In order to tackle this known issue,
several transfer learning approaches have been proposed, where the knowledge of
a trained model is transferred into another to improve performance with
different data. However, most of these approaches require additional training
steps, or they suffer from catastrophic forgetting, which occurs when a trained
model overwrites previously learnt knowledge. We address both problems
with a novel transfer learning approach that uses network aggregation. We train
dataset-specific networks together with an aggregation network in a unified
framework. The loss function includes two main components: a task-specific loss
(such as cross-entropy) and an aggregation loss. The proposed aggregation loss
allows our model to learn how trained deep network parameters can be aggregated
with an aggregation operator. We demonstrate that the proposed approach learns
model aggregation at test time without any further training step, reducing the
burden of transfer learning to a simple arithmetical operation. The proposed
approach achieves performance comparable to the baseline. Moreover, if the
aggregation operator has an inverse, we show that our model also inherently
allows for selective forgetting, i.e., the aggregated model can
forget one of the datasets it was trained on, retaining information on the
others.
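Below is a minimal sketch of the idea, assuming summation as the invertible aggregation operator and assuming that the aggregation loss simply penalises the distance between the aggregation network's parameters and the summed dataset-specific parameters. The function names and the exact loss form are illustrative assumptions, not the authors' implementation.
```python
# Minimal sketch of test-time aggregation, assuming summation as the invertible
# aggregation operator. Function names and the aggregation-loss form are
# illustrative assumptions, not the authors' exact implementation.
import copy
import torch
import torch.nn.functional as F


def aggregate(models, template):
    """Merge trained dataset-specific networks by summing their parameters."""
    agg = copy.deepcopy(template)
    with torch.no_grad():
        for p_agg, *ps in zip(agg.parameters(), *[m.parameters() for m in models]):
            p_agg.copy_(torch.stack(ps).sum(dim=0))
    return agg


def forget(aggregated, model_to_forget):
    """Selective forgetting: subtraction is the inverse of the sum operator."""
    pruned = copy.deepcopy(aggregated)
    with torch.no_grad():
        for p_out, p_del in zip(pruned.parameters(), model_to_forget.parameters()):
            p_out.sub_(p_del)
    return pruned


def joint_loss(logits, targets, dataset_models, agg_model):
    """Task-specific cross-entropy plus an aggregation term that pulls the
    aggregation network towards the sum of the dataset-specific networks."""
    task = F.cross_entropy(logits, targets)
    agg_term = 0.0
    for p_agg, *ps in zip(agg_model.parameters(),
                          *[m.parameters() for m in dataset_models]):
        agg_term = agg_term + (p_agg - torch.stack(ps).sum(dim=0)).pow(2).mean()
    return task + agg_term


# Hypothetical usage with two dataset-specific networks and one aggregation network:
# merged = aggregate([net_a, net_b], template=net_agg)  # no further training needed
# only_a = forget(merged, net_b)                        # drop dataset B, keep A
```
Because summation is invertible, forgetting one dataset reduces to a parameter subtraction, which matches the selective-forgetting behaviour described in the abstract.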
Related papers
- Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network).
After training this network on a small base model using demonstrations, this network can be seamlessly integrated with other pre-trained models during inference.
We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
arXiv Detail & Related papers (2024-10-28T13:48:43Z)
- Transfer Learning with Reconstruction Loss [12.906500431427716]
This paper proposes a novel approach for model training by adding into the model an additional reconstruction stage associated with a new reconstruction loss.
The proposed approach encourages the learned features to be general and transferable, and therefore can be readily used for efficient transfer learning.
In numerical simulations, three applications are studied: MNIST handwritten digit classification, device-to-device wireless network power allocation, and multiple-input-single-output network downlink beamforming and localization.
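A minimal sketch of the reconstruction-loss idea described above, assuming a shared encoder with a classification head and an auxiliary reconstruction head; the architecture, dimensions, and loss weight below are illustrative assumptions rather than the paper's exact setup.
```python
# Illustrative sketch only (assumed architecture, dimensions and loss weight):
# a shared encoder with a task head and an auxiliary reconstruction head, trained
# with a combined classification + reconstruction loss so that the learned
# features stay general and transferable.
import torch.nn as nn
import torch.nn.functional as F


class EncoderWithReconstruction(nn.Module):
    def __init__(self, in_dim=784, feat_dim=128, num_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.classifier = nn.Linear(feat_dim, num_classes)  # task head
        self.decoder = nn.Linear(feat_dim, in_dim)           # reconstruction head

    def forward(self, x):
        z = self.encoder(x)
        return self.classifier(z), self.decoder(z)


def loss_fn(logits, recon, x, y, lam=0.1):
    # Task loss plus a weighted reconstruction term (lam is a hypothetical weight).
    return F.cross_entropy(logits, y) + lam * F.mse_loss(recon, x)
```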
arXiv Detail & Related papers (2024-03-31T00:22:36Z)
- Adapt & Align: Continual Learning with Generative Models Latent Space Alignment [15.729732755625474]
We introduce Adapt & Align, a method for continual learning of neural networks by aligning latent representations in generative models.
Neural networks suffer from an abrupt loss in performance when retrained with additional data.
We propose a new method that mitigates those problems by employing generative models and splitting the process of their update into two parts.
arXiv Detail & Related papers (2023-12-21T10:02:17Z)
- Negotiated Representations to Prevent Forgetting in Machine Learning Applications [0.0]
Catastrophic forgetting is a significant challenge in the field of machine learning.
We propose a novel method for preventing catastrophic forgetting in machine learning applications.
arXiv Detail & Related papers (2023-11-30T22:43:50Z)
- On Generalizing Beyond Domains in Cross-Domain Continual Learning [91.56748415975683]
Deep neural networks often suffer from catastrophic forgetting of previously learned knowledge after learning a new task.
Our proposed approach learns new tasks under domain shift with accuracy boosts up to 10% on challenging datasets such as DomainNet and OfficeHome.
arXiv Detail & Related papers (2022-03-08T09:57:48Z)
- Deep invariant networks with differentiable augmentation layers [87.22033101185201]
Methods for learning data augmentation policies require held-out data and are based on bilevel optimization problems.
We show that our approach is easier and faster to train than modern automatic data augmentation techniques.
arXiv Detail & Related papers (2022-02-04T14:12:31Z)
- Transfer Learning for Node Regression Applied to Spreading Prediction [0.0]
We explore the utility of the state-of-the-art node representation learners when used to assess the effects of spreading from a given node.
As many real-life networks are topologically similar, we systematically investigate whether the learned models generalize to previously unseen networks.
This is one of the first attempts to evaluate the utility of zero-shot transfer for the task of node regression.
arXiv Detail & Related papers (2021-03-31T20:09:09Z)
- Category-Learning with Context-Augmented Autoencoder [63.05016513788047]
Finding an interpretable non-redundant representation of real-world data is one of the key problems in Machine Learning.
We propose a novel method of using data augmentations when training autoencoders.
We train a Variational Autoencoder in such a way that the transformation outcome is predictable by an auxiliary network.
arXiv Detail & Related papers (2020-10-10T14:04:44Z)
- Unsupervised Transfer Learning for Spatiotemporal Predictive Networks [90.67309545798224]
We study how to transfer knowledge from a zoo of models learned without supervision to another network.
Our motivation is that models are expected to understand complex dynamics from different sources.
Our approach yields significant improvements on three benchmarks for spatiotemporal prediction, and benefits the target even from less relevant ones.
arXiv Detail & Related papers (2020-09-24T15:40:55Z)
- Pre-Trained Models for Heterogeneous Information Networks [57.78194356302626]
We propose a self-supervised pre-training and fine-tuning framework, PF-HIN, to capture the features of a heterogeneous information network.
PF-HIN consistently and significantly outperforms state-of-the-art alternatives on each of the evaluated tasks across four datasets.
arXiv Detail & Related papers (2020-07-07T03:36:28Z)
- Adversarial Incremental Learning [0.0]
Deep learning models can forget previously learned information when trained on new tasks for which the previous data is no longer available.
We propose an adversarial discriminator based method that does not make use of old data at all while training on new tasks.
We are able to outperform other state-of-the-art methods on CIFAR-100, SVHN, and MNIST datasets.
arXiv Detail & Related papers (2020-01-30T02:25:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.