Transfer Learning via Test-Time Neural Networks Aggregation
- URL: http://arxiv.org/abs/2206.13399v1
- Date: Mon, 27 Jun 2022 15:46:05 GMT
- Title: Transfer Learning via Test-Time Neural Networks Aggregation
- Authors: Bruno Casella, Alessio Barbaro Chisari, Sebastiano Battiato, Mario
Valerio Giuffrida
- Abstract summary: It has been demonstrated that deep neural networks outperform traditional machine learning.
Deep networks lack generalisability, that is, they will not perform as well on a new (testing) set drawn from a different distribution.
- Score: 11.42582922543676
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It has been demonstrated that deep neural networks outperform traditional
machine learning. However, deep networks lack generalisability: they will not
perform as well on a new (testing) set drawn from a different distribution,
owing to domain shift. In order to tackle this known issue,
several transfer learning approaches have been proposed, where the knowledge of
a trained model is transferred into another to improve performance with
different data. However, most of these approaches require additional training
steps, or they suffer from catastrophic forgetting, which occurs when a trained
model overwrites previously learnt knowledge. We address both problems
with a novel transfer learning approach that uses network aggregation. We train
dataset-specific networks together with an aggregation network in a unified
framework. The loss function includes two main components: a task-specific loss
(such as cross-entropy) and an aggregation loss. The proposed aggregation loss
allows our model to learn how trained deep network parameters can be aggregated
with an aggregation operator. We demonstrate that the proposed approach learns
model aggregation at test time without any further training step, reducing the
burden of transfer learning to a simple arithmetical operation. The proposed
approach achieves performance comparable to the baseline. Moreover, if the
aggregation operator has an inverse, we show that our model also inherently
allows for selective forgetting, i.e., the aggregated model can
forget one of the datasets it was trained on, retaining information on the
others.
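Below is a minimal sketch of the idea, assuming summation as the invertible aggregation operator and assuming that the aggregation loss simply penalises the distance between the aggregation network's parameters and the summed dataset-specific parameters. The function names and the exact loss form are illustrative assumptions, not the authors' implementation.
```python
# Minimal sketch of test-time aggregation, assuming summation as the invertible
# aggregation operator. Function names and the aggregation-loss form are
# illustrative assumptions, not the authors' exact implementation.
import copy
import torch
import torch.nn.functional as F


def aggregate(models, template):
    """Merge trained dataset-specific networks by summing their parameters."""
    agg = copy.deepcopy(template)
    with torch.no_grad():
        for p_agg, *ps in zip(agg.parameters(), *[m.parameters() for m in models]):
            p_agg.copy_(torch.stack(ps).sum(dim=0))
    return agg


def forget(aggregated, model_to_forget):
    """Selective forgetting: subtraction is the inverse of the sum operator."""
    pruned = copy.deepcopy(aggregated)
    with torch.no_grad():
        for p_out, p_del in zip(pruned.parameters(), model_to_forget.parameters()):
            p_out.sub_(p_del)
    return pruned


def joint_loss(logits, targets, dataset_models, agg_model):
    """Task-specific cross-entropy plus an aggregation term that pulls the
    aggregation network towards the sum of the dataset-specific networks."""
    task = F.cross_entropy(logits, targets)
    agg_term = 0.0
    for p_agg, *ps in zip(agg_model.parameters(),
                          *[m.parameters() for m in dataset_models]):
        agg_term = agg_term + (p_agg - torch.stack(ps).sum(dim=0)).pow(2).mean()
    return task + agg_term


# Hypothetical usage with two dataset-specific networks and one aggregation network:
# merged = aggregate([net_a, net_b], template=net_agg)  # no further training needed
# only_a = forget(merged, net_b)                        # drop dataset B, keep A
```
Because summation is invertible, forgetting one dataset reduces to a parameter subtraction, which matches the selective-forgetting behaviour described in the abstract.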
Related papers
- Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network).
After training this network on a small base model using demonstrations, this network can be seamlessly integrated with other pre-trained models during inference.
We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
arXiv Detail & Related papers (2024-10-28T13:48:43Z)
- Transfer Learning with Reconstruction Loss [12.906500431427716]
This paper proposes a novel approach for model training by adding into the model an additional reconstruction stage associated with a new reconstruction loss.
The proposed approach encourages the learned features to be general and transferable, and therefore can be readily used for efficient transfer learning.
In numerical simulations, three applications are studied: MNIST handwritten digit classification, device-to-device wireless network power allocation, and multiple-input-single-output network downlink beamforming and localization.
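A minimal sketch of the reconstruction-loss idea described above, assuming a shared encoder with a classification head and an auxiliary reconstruction head; the architecture, dimensions, and loss weight below are illustrative assumptions rather than the paper's exact setup.
```python
# Illustrative sketch only (assumed architecture, dimensions and loss weight):
# a shared encoder with a task head and an auxiliary reconstruction head, trained
# with a combined classification + reconstruction loss so that the learned
# features stay general and transferable.
import torch.nn as nn
import torch.nn.functional as F


class EncoderWithReconstruction(nn.Module):
    def __init__(self, in_dim=784, feat_dim=128, num_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.classifier = nn.Linear(feat_dim, num_classes)  # task head
        self.decoder = nn.Linear(feat_dim, in_dim)           # reconstruction head

    def forward(self, x):
        z = self.encoder(x)
        return self.classifier(z), self.decoder(z)


def loss_fn(logits, recon, x, y, lam=0.1):
    # Task loss plus a weighted reconstruction term (lam is a hypothetical weight).
    return F.cross_entropy(logits, y) + lam * F.mse_loss(recon, x)
```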
arXiv Detail & Related papers (2024-03-31T00:22:36Z)
- Adapt & Align: Continual Learning with Generative Models Latent Space Alignment [15.729732755625474]
We introduce Adapt & Align, a method for continual learning of neural networks by aligning latent representations in generative models.
Neural networks suffer from an abrupt loss in performance when retrained with additional data.
We propose a new method that mitigates those problems by employing generative models and splitting the process of their update into two parts.
arXiv Detail & Related papers (2023-12-21T10:02:17Z)
- Negotiated Representations to Prevent Forgetting in Machine Learning Applications [0.0]
Catastrophic forgetting is a significant challenge in the field of machine learning.
We propose a novel method for preventing catastrophic forgetting in machine learning applications.
arXiv Detail & Related papers (2023-11-30T22:43:50Z)
- On Generalizing Beyond Domains in Cross-Domain Continual Learning [91.56748415975683]
Deep neural networks often suffer from catastrophic forgetting of previously learned knowledge after learning a new task.
Our proposed approach learns new tasks under domain shift with accuracy boosts up to 10% on challenging datasets such as DomainNet and OfficeHome.
arXiv Detail & Related papers (2022-03-08T09:57:48Z)
- Deep invariant networks with differentiable augmentation layers [87.22033101185201]
Methods for learning data augmentation policies require held-out data and are based on bilevel optimization problems.
We show that our approach is easier and faster to train than modern automatic data augmentation techniques.
arXiv Detail & Related papers (2022-02-04T14:12:31Z)
- Transfer Learning for Node Regression Applied to Spreading Prediction [0.0]
We explore the utility of the state-of-the-art node representation learners when used to assess the effects of spreading from a given node.
As many real-life networks are topologically similar, we systematically investigate whether the learned models generalize to previously unseen networks.
This is one of the first attempts to evaluate the utility of zero-shot transfer for the task of node regression.
arXiv Detail & Related papers (2021-03-31T20:09:09Z)
- Category-Learning with Context-Augmented Autoencoder [63.05016513788047]
Finding an interpretable non-redundant representation of real-world data is one of the key problems in Machine Learning.
We propose a novel method of using data augmentations when training autoencoders.
We train a Variational Autoencoder in such a way that the transformation outcome is predictable by an auxiliary network.
arXiv Detail & Related papers (2020-10-10T14:04:44Z)
- Unsupervised Transfer Learning for Spatiotemporal Predictive Networks [90.67309545798224]
We study how to transfer knowledge from a zoo of models learned without supervision to another network.
Our motivation is that models are expected to understand complex dynamics from different sources.
Our approach yields significant improvements on three benchmarks for spatiotemporal prediction, and benefits the target even from less relevant ones.
arXiv Detail & Related papers (2020-09-24T15:40:55Z)
- Pre-Trained Models for Heterogeneous Information Networks [57.78194356302626]
We propose a self-supervised pre-training and fine-tuning framework, PF-HIN, to capture the features of a heterogeneous information network.
PF-HIN consistently and significantly outperforms state-of-the-art alternatives on each of the evaluated tasks across four datasets.
arXiv Detail & Related papers (2020-07-07T03:36:28Z)
- Adversarial Incremental Learning [0.0]
Deep learning models can forget previously learned information when trained on new tasks for which the previous data is no longer available.
We propose an adversarial discriminator based method that does not make use of old data at all while training on new tasks.
We are able to outperform other state-of-the-art methods on CIFAR-100, SVHN, and MNIST datasets.
arXiv Detail & Related papers (2020-01-30T02:25:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.