Increasing Depth of Neural Networks for Life-long Learning
- URL: http://arxiv.org/abs/2202.10821v2
- Date: Mon, 8 May 2023 16:17:40 GMT
- Title: Increasing Depth of Neural Networks for Life-long Learning
- Authors: Jędrzej Kozal, Michał Woźniak
- Abstract summary: We propose a novel method for continual learning based on increasing the depth of neural networks.
This work explores whether extending neural network depth may be beneficial in a life-long learning setting.
- Score: 2.0305676256390934
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Purpose: We propose a novel method for continual learning based on increasing
the depth of neural networks. This work explores whether extending neural network
depth may be beneficial in a life-long learning setting.
Methods: We propose a novel approach in which new layers are added on top of
existing ones to enable forward transfer of knowledge and to adapt previously
learned representations. To select the best location in the network for adding
new nodes with trainable parameters, we identify the most similar previously
learned task. This yields a tree-like model in which each node is a set of
neural network parameters dedicated to a specific task. The proposed method is
inspired by Progressive Neural Networks (PNN) and therefore benefits from
dynamic changes in network structure. However, PNN allocates a large amount of
memory for the whole network structure during learning. The proposed method
alleviates this by adding only part of a network for each new task and reusing
a subset of previously trained weights. At the same time, it retains the main
benefit of PNN: forgetting is prevented by design, without the need for a
memory buffer.
Results: Experiments on Split CIFAR and Split Tiny ImageNet show that the
proposed algorithm is on par with other continual learning methods. In a more
challenging setup, where each task is an entire computer vision dataset, our
method outperforms Experience Replay.
Conclusion: The proposed method is compatible with commonly used computer vision
architectures and does not require a custom network structure. Because
adaptation to changing data distributions is achieved by expanding the
architecture, no rehearsal buffer is needed. For this reason, our method can be
used in sensitive applications where data privacy must be considered.
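To make the mechanism concrete, below is a minimal, hypothetical PyTorch sketch (not the authors' code) of a tree of task-specific blocks: each node owns a set of layers, a new task attaches a fresh block and head under an existing node, and all previously trained parameters are frozen so forgetting cannot occur by design. The class name `DepthTreeLearner`, the string-keyed tree bookkeeping, and the external choice of the parent node (which the paper makes via a task-similarity measure) are illustrative assumptions.

```python
import torch.nn as nn

class DepthTreeLearner(nn.Module):
    """Illustrative sketch: a tree of network blocks, one branch per task."""

    def __init__(self, root_block: nn.Module):
        super().__init__()
        self.blocks = nn.ModuleDict({"root": root_block})
        self.parent = {"root": None}   # tree structure over block names
        self.heads = nn.ModuleDict()   # one classifier head per task

    def path(self, node: str):
        # Collect blocks from the given node up to the root, then reverse.
        chain = []
        while node is not None:
            chain.append(self.blocks[node])
            node = self.parent[node]
        return list(reversed(chain))

    def add_task(self, task: str, parent: str, new_block: nn.Module, head: nn.Module):
        # Freeze everything trained so far; only the new block and head will learn,
        # so earlier tasks cannot be forgotten and their weights are reused.
        for p in self.parameters():
            p.requires_grad_(False)
        self.blocks[task] = new_block
        self.parent[task] = parent
        self.heads[task] = head

    def forward(self, x, task: str):
        for block in self.path(task):   # root -> ... -> task-specific block
            x = block(x)
        return self.heads[task](x)
```

For a Split CIFAR-style setup, `root_block` could be the first stage of a standard ResNet and each `new_block` a later stage, so deeper paths correspond to more specialized tasks.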
Related papers
- Simultaneous Weight and Architecture Optimization for Neural Networks [6.2241272327831485]
We introduce a novel neural network training framework that transforms the training process by learning architecture and parameters simultaneously with gradient descent.
Central to our approach is a multi-scale encoder-decoder, in which the encoder embeds pairs of neural networks with similar functionalities close to each other.
Experiments demonstrate that our framework can discover sparse and compact neural networks while maintaining high performance.
arXiv Detail & Related papers (2024-10-10T19:57:36Z)
- Stitching for Neuroevolution: Recombining Deep Neural Networks without Breaking Them [0.0]
Traditional approaches to neuroevolution often start from scratch.
Recombining trained networks is non-trivial because architectures and feature representations typically differ.
We employ stitching, which merges the networks by introducing new layers at crossover points.
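As a rough illustration of the stitching idea (a sketch assuming convolutional parents with matching spatial resolutions, not the paper's actual procedure), the early layers of one trained network can be joined to the later layers of another through a small learned adapter, here a 1x1 convolution; `front_channels` and `back_channels` are hypothetical names for the feature widths at the crossover point.

```python
import torch.nn as nn

def stitch_at_crossover(front: nn.Module, back: nn.Module,
                        front_channels: int, back_channels: int) -> nn.Module:
    """Recombine two trained networks at a crossover point.

    `front` is the early part of parent A, `back` the later part of parent B.
    A 1x1 convolution maps A's feature space into the representation B expects;
    often only this stitching layer needs training to make the hybrid viable.
    """
    stitching_layer = nn.Conv2d(front_channels, back_channels, kernel_size=1)
    return nn.Sequential(front, stitching_layer, back)
```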
arXiv Detail & Related papers (2024-03-21T08:30:44Z)
- Negotiated Representations to Prevent Forgetting in Machine Learning Applications [0.0]
Catastrophic forgetting is a significant challenge in the field of machine learning.
We propose a novel method for preventing catastrophic forgetting in machine learning applications.
arXiv Detail & Related papers (2023-11-30T22:43:50Z)
- An Initialization Schema for Neuronal Networks on Tabular Data [0.9155684383461983]
We show that a binomial neural network can be used effectively on tabular data.
The proposed schema is a simple but effective way of initializing the first hidden layer of a neural network.
We evaluate our approach on multiple public datasets and showcase the improved performance compared to other neural network-based approaches.
arXiv Detail & Related papers (2023-11-07T13:52:35Z)
- StitchNet: Composing Neural Networks from Pre-Trained Fragments [3.638431342539701]
We propose StitchNet, a novel neural network creation paradigm.
It stitches together fragments from multiple pre-trained neural networks.
We show that these fragments can be stitched together to create neural networks with accuracy comparable to that of traditionally trained networks.
arXiv Detail & Related papers (2023-01-05T08:02:30Z)
- Firefly Neural Architecture Descent: a General Approach for Growing Neural Networks [50.684661759340145]
Firefly neural architecture descent is a general framework for progressively and dynamically growing neural networks.
We show that firefly descent can flexibly grow networks both wider and deeper, and can be applied to learn accurate but resource-efficient neural architectures.
In particular, it learns networks that are smaller in size but have higher average accuracy than those learned by the state-of-the-art methods.
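Firefly's actual procedure scores candidate neurons by how much they would reduce the loss; the sketch below only shows the generic, function-preserving growth step that such methods build on, where a hidden layer gains `n_new` units whose outgoing weights start at zero. The function and argument names are hypothetical.

```python
import torch
import torch.nn as nn

def grow_wider(layer_in: nn.Linear, layer_out: nn.Linear, n_new: int):
    """Insert n_new hidden units between two Linear layers.

    New incoming weights are small and new outgoing weights are zero, so the
    network's function is (almost) unchanged at the moment of growth; training
    then decides how to use the extra capacity.
    """
    grown_in = nn.Linear(layer_in.in_features, layer_in.out_features + n_new)
    grown_out = nn.Linear(layer_out.in_features + n_new, layer_out.out_features)
    with torch.no_grad():
        grown_in.weight[: layer_in.out_features] = layer_in.weight
        grown_in.bias[: layer_in.out_features] = layer_in.bias
        grown_in.weight[layer_in.out_features:].normal_(std=1e-3)
        grown_in.bias[layer_in.out_features:].zero_()
        grown_out.weight[:, : layer_out.in_features] = layer_out.weight
        grown_out.weight[:, layer_out.in_features:].zero_()
        grown_out.bias.copy_(layer_out.bias)
    return grown_in, grown_out
```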
arXiv Detail & Related papers (2021-02-17T04:47:18Z)
- Local Critic Training for Model-Parallel Learning of Deep Neural Networks [94.69202357137452]
We propose a novel model-parallel learning method, called local critic training.
We show that the proposed approach successfully decouples the update process of the layer groups for both convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
We also show that networks trained by the proposed method can be used for structural optimization.
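A much-simplified sketch of the decoupling idea follows (an assumption-laden illustration, not the paper's exact training rule: here the local critic is trained directly on the task labels rather than to approximate the downstream model's output). The layer groups, critic, and optimizers are hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical split of a small CNN into two layer groups plus a local critic.
group1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                       nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
group2 = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10))
critic1 = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10))

opt1 = torch.optim.SGD(list(group1.parameters()) + list(critic1.parameters()), lr=0.01)
opt2 = torch.optim.SGD(group2.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

def train_step(x, y):
    h1 = group1(x)
    # Group 1 updates against its local critic's estimate of the task loss,
    # so its update does not wait for gradients from group 2.
    loss1 = loss_fn(critic1(h1), y)
    opt1.zero_grad(); loss1.backward(); opt1.step()

    # Group 2 trains on the true loss, treating group 1's output as a fixed input.
    loss2 = loss_fn(group2(h1.detach()), y)
    opt2.zero_grad(); loss2.backward(); opt2.step()
    return loss1.item(), loss2.item()
```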
arXiv Detail & Related papers (2021-02-03T09:30:45Z)
- Dynamic Graph: Learning Instance-aware Connectivity for Neural Networks [78.65792427542672]
Dynamic Graph Network (DG-Net) is a complete directed acyclic graph, where the nodes represent convolutional blocks and the edges represent connection paths.
Instead of using the same fixed path for every input, DG-Net aggregates features dynamically in each node, which gives the network greater representation ability.
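The gist of instance-aware aggregation can be sketched as below (an assumed, simplified gating design; DG-Net's exact formulation may differ): a small router predicts one gate per incoming edge from globally pooled features, so each sample mixes its predecessors' features with its own weights.

```python
import torch
import torch.nn as nn

class DynamicAggregationNode(nn.Module):
    """Aggregate features from several incoming edges with input-dependent gates."""

    def __init__(self, num_inputs: int, channels: int):
        super().__init__()
        self.router = nn.Linear(num_inputs * channels, num_inputs)
        self.block = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                                   nn.BatchNorm2d(channels), nn.ReLU())

    def forward(self, inputs):
        # inputs: list of (B, C, H, W) tensors coming from predecessor nodes.
        pooled = torch.cat([x.mean(dim=(2, 3)) for x in inputs], dim=1)
        gates = torch.sigmoid(self.router(pooled))          # (B, num_inputs)
        mixed = sum(g.view(-1, 1, 1, 1) * x
                    for g, x in zip(gates.unbind(dim=1), inputs))
        return self.block(mixed)
```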
arXiv Detail & Related papers (2020-10-02T16:50:26Z)
- Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
- Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks [95.51368472949308]
Adaptation can be useful in cases when training data is scarce, or when one wishes to encode priors in the network.
In this paper, we propose a straightforward alternative: side-tuning.
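Side-tuning is simple enough to sketch directly: keep the pre-trained base network frozen, train a small side network, and blend the two outputs additively. The sketch below uses a single learned blending weight passed through a sigmoid; the original paper controls this coefficient with a curriculum, so treat the details here as assumptions.

```python
import torch
import torch.nn as nn

class SideTuned(nn.Module):
    """Blend a frozen base network with a small trainable side network."""

    def __init__(self, base: nn.Module, side: nn.Module):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                   # keep pre-trained knowledge fixed
        self.side = side                              # small network with matching output shape
        self.alpha = nn.Parameter(torch.tensor(0.0))  # learned blending coefficient

    def forward(self, x):
        a = torch.sigmoid(self.alpha)                 # keep the blend weight in (0, 1)
        return a * self.base(x) + (1 - a) * self.side(x)
```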
arXiv Detail & Related papers (2019-12-31T18:52:32Z)