HyperInterval: Hypernetwork approach to training weight interval regions in continual learning
- URL: http://arxiv.org/abs/2405.15444v3
- Date: Mon, 2 Sep 2024 15:09:05 GMT
- Title: HyperInterval: Hypernetwork approach to training weight interval regions in continual learning
- Authors: Patryk Krukowski, Anna Bielawska, Kamil Książek, Paweł Wawrzyński, Paweł Batorski, Przemysław Spurek
- Abstract summary: Interval Continual Learning (InterContiNet) relies on enforcing interval constraints on the neural network parameter space.
We introduce HyperInterval, a technique that employs interval arithmetic within the embedding space.
HyperInterval obtains significantly better results than InterContiNet and achieves SOTA results on several benchmarks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, a new Continual Learning (CL) paradigm was presented to control catastrophic forgetting: Interval Continual Learning (InterContiNet), which relies on enforcing interval constraints on the neural network parameter space. Unfortunately, InterContiNet training is challenging due to the high dimensionality of the weight space, making intervals difficult to manage. To address this issue, we introduce HyperInterval (source code available at https://github.com/gmum/HyperInterval), a technique that employs interval arithmetic within the embedding space and utilizes a hypernetwork to map these intervals to the target network parameter space. We train interval embeddings for consecutive tasks and train a hypernetwork to transform these embeddings into weights of the target network. An embedding for a given task is trained along with the hypernetwork, preserving the response of the target network for the previous task embeddings. Interval arithmetic operates in a more manageable, lower-dimensional embedding space rather than directly preparing intervals in the high-dimensional weight space, which allows faster and more efficient training. Furthermore, HyperInterval maintains the guarantee of not forgetting. At the end of training, we can choose one universal embedding to produce a single network dedicated to all tasks. In such a framework, the hypernetwork is used only for training, so ultimately we can rely on a single set of weights. HyperInterval obtains significantly better results than InterContiNet and achieves SOTA results on several benchmarks.
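The abstract describes the mechanism only at a high level; below is a minimal, hedged sketch of the core idea in PyTorch, assuming interval bound propagation (IBP) through the hypernetwork. The class name `IntervalHypernet`, the layer sizes, and the flattened 1000-parameter target are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
# Hedged sketch: a low-dimensional *interval* task embedding [e - eps, e + eps]
# is pushed through a hypernetwork with interval bound propagation, yielding an
# interval over the target network's (flattened) weights. Reshaping those
# weights into the target architecture and the training loop are omitted.
import torch
import torch.nn as nn

class IntervalHypernet(nn.Module):
    def __init__(self, emb_dim=32, hidden=128, target_params=1000):
        super().__init__()
        self.fc1 = nn.Linear(emb_dim, hidden)
        self.fc2 = nn.Linear(hidden, target_params)

    @staticmethod
    def _ibp_linear(layer, lo, hi):
        # Interval arithmetic for an affine layer: the centre goes through W,
        # the radius through |W|.
        centre, radius = (lo + hi) / 2, (hi - lo) / 2
        new_centre = layer(centre)
        new_radius = radius @ layer.weight.abs().t()
        return new_centre - new_radius, new_centre + new_radius

    def forward(self, emb, eps):
        lo, hi = emb - eps, emb + eps              # interval in embedding space
        lo, hi = self._ibp_linear(self.fc1, lo, hi)
        lo, hi = torch.relu(lo), torch.relu(hi)    # ReLU is monotone, so bounds pass through
        lo, hi = self._ibp_linear(self.fc2, lo, hi)
        return lo, hi                              # interval over target-network weights

# One trainable (embedding, radius) pair per task; the hypernetwork maps it
# to a weight interval for the target network.
hnet = IntervalHypernet()
task_emb = nn.Parameter(torch.zeros(1, 32))
task_eps = nn.Parameter(torch.full((1, 32), 0.1))
w_lo, w_hi = hnet(task_emb, task_eps.abs())
```

Propagating the centre through W and the radius through |W| is the standard tight interval bound for affine layers, and monotone activations such as ReLU pass the bounds through directly; choosing one universal embedding at the end, as the abstract describes, would then map a single point through the hypernetwork and collapse the interval to one weight vector.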
Related papers
- HyperNet Fields: Efficiently Training Hypernetworks without Ground Truth by Learning Weight Trajectories [62.975803165786324]
We propose a method to train hypernetworks, without the need for any per-sample ground truth.
Our key idea is to learn a Hypernetwork Field and estimate the entire trajectory of network weight training instead of simply its converged state.
arXiv Detail & Related papers (2024-12-22T14:37:10Z) - HyperMask: Adaptive Hypernetwork-based Masks for Continual Learning [0.0]
We propose a method called HyperMask, which dynamically filters a target network depending on the CL task.
Due to the lottery ticket hypothesis, we can use a single network with weighted subnetworks dedicated to each task.
arXiv Detail & Related papers (2023-09-29T20:01:11Z) - Dense Network Expansion for Class Incremental Learning [61.00081795200547]
State-of-the-art approaches use a dynamic architecture based on network expansion (NE), in which a task expert is added per task.
A new NE method, dense network expansion (DNE), is proposed to achieve a better trade-off between accuracy and model complexity.
It outperforms the previous SOTA methods by a margin of 4% in terms of accuracy, with similar or even smaller model scale.
arXiv Detail & Related papers (2023-03-22T16:42:26Z) - Low Rank Optimization for Efficient Deep Learning: Making A Balance between Compact Architecture and Fast Training [36.85333789033387]
In this paper, we focus on low-rank optimization for efficient deep learning techniques.
In the space domain, deep neural networks are compressed by low rank approximation of the network parameters.
In the time domain, the network parameters can be trained in a few subspaces, which enables efficient training for fast convergence.
arXiv Detail & Related papers (2023-03-22T03:55:16Z) - Continual Learning with Dependency Preserving Hypernetworks [14.102057320661427]
An effective approach to address continual learning (CL) problems is to use hypernetworks which generate task dependent weights for a target network.
We propose a novel approach that uses a dependency preserving hypernetwork to generate weights for the target network while also maintaining the parameter efficiency.
In addition, we propose novel regularisation and network growth techniques for the RNN based hypernetwork to further improve the continual learning performance.
arXiv Detail & Related papers (2022-09-16T04:42:21Z) - Continual Learning with Guarantees via Weight Interval Constraints [18.791232422083265]
We introduce a new training paradigm that enforces interval constraints on neural network parameter space to control forgetting.
We show how to put bounds on forgetting by reformulating continual learning of a model as a continual contraction of its parameter space (a toy sketch of this contraction idea appears after this list).
arXiv Detail & Related papers (2022-06-16T08:28:37Z) - Learning to Win Lottery Tickets in BERT Transfer via Task-agnostic Mask Training [55.43088293183165]
Recent studies show that pre-trained language models (PLMs) like BERT contain matching subnetworks that have similar transfer learning performance as the original PLM.
In this paper, we find that the BERT subnetworks have even more potential than these studies have shown.
We train binary masks over model weights on the pre-training tasks, with the aim of preserving the universal transferability of the subnetwork.
arXiv Detail & Related papers (2022-04-24T08:42:47Z) - FreeTickets: Accurate, Robust and Efficient Deep Ensemble by Training with Dynamic Sparsity [74.58777701536668]
We introduce the FreeTickets concept, which can boost the performance of sparse convolutional neural networks over their dense network equivalents by a large margin.
We propose two novel efficient ensemble methods with dynamic sparsity, which yield in one shot many diverse and accurate tickets "for free" during the sparse training process.
arXiv Detail & Related papers (2021-06-28T10:48:20Z) - SALA: Soft Assignment Local Aggregation for Parameter Efficient 3D Semantic Segmentation [65.96170587706148]
We focus on designing a point local aggregation function that yields parameter efficient networks for 3D point cloud semantic segmentation.
We explore the idea of using learnable neighbor-to-grid soft assignment in grid-based aggregation functions.
arXiv Detail & Related papers (2020-12-29T20:16:37Z) - Rapid Structural Pruning of Neural Networks with Set-based Task-Adaptive Meta-Pruning [83.59005356327103]
A common limitation of most existing pruning techniques is that they require pre-training of the network at least once before pruning.
We propose STAMP, which task-adaptively prunes a network pretrained on a large reference dataset by generating a pruning mask on it as a function of the target dataset.
We validate STAMP against recent advanced pruning methods on benchmark datasets.
arXiv Detail & Related papers (2020-06-22T10:57:43Z)
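As a rough, hedged illustration of the weight-interval idea from the InterContiNet entry above ("Continual Learning with Guarantees via Weight Interval Constraints"): the feasible parameter region shrinks with every task, so any weights taken from the final interval still satisfy every earlier task's constraint, which is what bounds forgetting. The per-task intervals and the `contract` helper below are toy stand-ins, not the paper's training procedure.

```python
# Toy illustration: continual learning as continual contraction of a
# per-parameter interval. Each new task's interval is intersected with the
# current one, so the regions are nested and earlier constraints stay satisfied.
import numpy as np

def contract(prev_lo, prev_hi, task_lo, task_hi):
    """Intersect the new task's interval with the previous one, per parameter."""
    lo = np.maximum(prev_lo, task_lo)
    hi = np.minimum(prev_hi, task_hi)
    assert np.all(lo <= hi), "empty interval: the new constraint cannot be met"
    return lo, hi

# Start from a broad interval over a toy 4-parameter model, then contract it
# with (made-up) intervals produced while learning three consecutive tasks.
lo, hi = np.full(4, -1.0), np.full(4, 1.0)
for task_lo, task_hi in [(-0.8, 0.9), (-0.5, 0.6), (-0.4, 0.2)]:
    lo, hi = contract(lo, hi, np.full(4, task_lo), np.full(4, task_hi))

weights = (lo + hi) / 2  # any point in [lo, hi] satisfies all tasks' interval constraints
```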