The Effect of Architecture During Continual Learning
- URL: http://arxiv.org/abs/2601.19766v1
- Date: Tue, 27 Jan 2026 16:29:42 GMT
- Title: The Effect of Architecture During Continual Learning
- Authors: Allyson Hahn, Krishnan Raghavan
- Abstract summary: We introduce a mathematical framework that jointly models architecture and weights in a Sobolev space. We prove that learning only model weights is insufficient to mitigate catastrophic forgetting under distribution shifts. Empirical studies across regression and classification problems, including feedforward, convolutional, and graph neural networks, demonstrate that learning the optimal architecture and weights simultaneously yields substantially improved performance.
- Score: 2.1485350418225244
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Continual learning is a challenge for models with static architectures, as they fail to adapt when data distributions evolve across tasks. We introduce a mathematical framework that jointly models architecture and weights in a Sobolev space, enabling a rigorous investigation into the role of neural network architecture in continual learning and its effect on the forgetting loss. We derive necessary conditions for the continual learning solution and prove that learning only model weights is insufficient to mitigate catastrophic forgetting under distribution shifts. Consequently, we prove that by learning the architecture and weights simultaneously at each task, we can reduce catastrophic forgetting. To learn weights and architecture simultaneously, we formulate continual learning as a bilevel optimization problem: the upper level selects an optimal architecture for a given task, while the lower level computes optimal weights via dynamic programming over all tasks. To solve the upper-level problem, we introduce a derivative-free direct search algorithm to determine the optimal architecture. Once it is found, we must transfer knowledge from the current architecture to the optimal one. However, the optimal architecture will have a weight parameter space different from that of the current architecture (i.e., the dimensions of the weight matrices will not match). To bridge this dimensionality gap, we develop a low-rank transfer mechanism that maps knowledge across architectures of mismatched dimensions. Empirical studies across regression and classification problems, including feedforward, convolutional, and graph neural networks, demonstrate that learning the optimal architecture and weights simultaneously yields substantially improved performance (up to two orders of magnitude), reduced forgetting, and enhanced robustness to noise compared with static-architecture approaches.
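The abstract does not include code, but its two key mechanisms lend themselves to a short sketch. The snippet below is a minimal NumPy illustration, not the authors' implementation: `low_rank_transfer` approximates the low-rank mapping across mismatched weight dimensions with a truncated SVD plus zero-padding (one simple choice among many), and `direct_search_width` stands in for the derivative-free upper-level search over a small grid of hidden widths. The callback `train_and_eval` and all names here are hypothetical assumptions.

```python
import numpy as np

def low_rank_transfer(W_old, new_shape, rank=8):
    """Map weights of shape (m, n) onto (m', n') via truncated SVD.

    Hypothetical sketch: keep the top-`rank` singular directions of the
    old weights, then truncate or zero-pad the factors to fit the new
    dimensions. The paper's exact construction may differ.
    """
    m_new, n_new = new_shape
    U, S, Vt = np.linalg.svd(W_old, full_matrices=False)
    r = min(rank, len(S), m_new, n_new)
    U_new = np.zeros((m_new, r))
    rows = min(m_new, U.shape[0])
    U_new[:rows] = U[:rows, :r]          # copy (and truncate) row factors
    V_new = np.zeros((r, n_new))
    cols = min(n_new, Vt.shape[1])
    V_new[:, :cols] = Vt[:r, :cols]      # copy (and truncate) column factors
    return U_new @ np.diag(S[:r]) @ V_new

def direct_search_width(candidate_widths, train_and_eval, W_old):
    """Derivative-free search over a finite set of hidden-layer widths.

    `train_and_eval(W_init)` is a hypothetical callback that fine-tunes
    a model initialized at W_init and returns its validation loss.
    """
    best = (None, np.inf)
    for width in candidate_widths:
        # Transfer knowledge from the current weights into the
        # candidate architecture before evaluating it.
        W_init = low_rank_transfer(W_old, (width, W_old.shape[1]))
        loss = train_and_eval(W_init)
        if loss < best[1]:
            best = (width, loss)
    return best
</code></pre>

```

For example, `direct_search_width([32, 64, 128], train_and_eval, W_old)` would return the candidate width whose transferred initialization fine-tunes to the lowest validation loss; a full implementation would also carry the lower-level dynamic program over all past tasks.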
Related papers
- Growing Tiny Networks: Spotting Expressivity Bottlenecks and Fixing Them Optimally [2.421273972191206]
In machine learning tasks, one searches for an optimal function within a certain functional space. This forces the evolution of the function during training to lie within the realm of what is expressible with the chosen architecture. We show that information about desirable architectural changes due to expressivity bottlenecks can be extracted from backpropagation.
arXiv Detail & Related papers (2024-05-30T08:23:56Z) - Unsupervised Graph Neural Architecture Search with Disentangled
Self-supervision [51.88848982611515]
Unsupervised graph neural architecture search remains unexplored in the literature.
We propose a novel Disentangled Self-supervised Graph Neural Architecture Search model.
Our model achieves state-of-the-art performance against several baseline methods in an unsupervised manner.
arXiv Detail & Related papers (2024-03-08T05:23:55Z) - Differentiable Architecture Pruning for Transfer Learning [6.935731409563879]
We propose a gradient-based approach for extracting sub-architectures from a given large model.
Our architecture-pruning scheme produces transferable new structures that can be successfully retrained to solve different tasks.
We provide theoretical convergence guarantees and validate the proposed transfer-learning strategy on real data.
arXiv Detail & Related papers (2021-07-07T17:44:59Z) - AdaXpert: Adapting Neural Architecture for Growing Data [63.30393509048505]
In real-world applications, data often come in a growing manner, where the data volume and the number of classes may increase dynamically.
Given the increasing data volume or the number of classes, one has to instantaneously adjust the neural model capacity to obtain promising performance.
Existing methods either ignore the growing nature of data or seek to independently search an optimal architecture for a given dataset.
arXiv Detail & Related papers (2021-07-01T07:22:05Z) - iDARTS: Differentiable Architecture Search with Stochastic Implicit
Gradients [75.41173109807735]
Differentiable ARchiTecture Search (DARTS) has recently become the mainstream of neural architecture search (NAS).
We tackle the hypergradient computation in DARTS based on the implicit function theorem.
We show that the architecture optimisation with the proposed method, named iDARTS, is expected to converge to a stationary point.
arXiv Detail & Related papers (2021-06-21T00:44:11Z) - The Nonlinearity Coefficient -- A Practical Guide to Neural Architecture
Design [3.04585143845864]
We develop methods that can predict, without any training, whether an architecture will achieve a relatively high test or training error on a task after training.
We then go on to explain the error in terms of the architecture definition itself and develop tools for modifying the architecture.
Our first major contribution is to show that the 'degree of nonlinearity' of a neural architecture is a key causal driver behind its performance.
arXiv Detail & Related papers (2021-05-25T20:47:43Z) - Disentangling Neural Architectures and Weights: A Case Study in
Supervised Classification [8.976788958300766]
This work investigates the problem of disentangling the role of the neural structure and its edge weights.
We show that well-trained architectures may not need any link-specific fine-tuning of the weights.
We use a novel and computationally efficient method that translates the hard architecture-search problem into a feasible optimization problem.
arXiv Detail & Related papers (2020-09-11T11:22:22Z) - Adversarially Robust Neural Architectures [43.74185132684662]
This paper aims to improve the adversarial robustness of the network from the architecture perspective within a NAS framework.
We explore the relationship among adversarial robustness, Lipschitz constant, and architecture parameters.
Our algorithm empirically achieves the best performance among all the models under various attacks on different datasets.
arXiv Detail & Related papers (2020-09-02T08:52:15Z) - A Semi-Supervised Assessor of Neural Architectures [157.76189339451565]
We employ an auto-encoder to discover meaningful representations of neural architectures.
A graph convolutional neural network is introduced to predict the performance of architectures.
arXiv Detail & Related papers (2020-05-14T09:02:33Z) - Disturbance-immune Weight Sharing for Neural Architecture Search [96.93812980299428]
We propose a disturbance-immune update strategy for model updating.
We theoretically analyze the effectiveness of our strategy in alleviating the performance disturbance risk.
arXiv Detail & Related papers (2020-03-29T17:54:49Z) - RC-DARTS: Resource Constrained Differentiable Architecture Search [162.7199952019152]
We propose the resource constrained differentiable architecture search (RC-DARTS) method to learn architectures that are significantly smaller and faster.
We show that the RC-DARTS method learns lightweight neural architectures which have smaller model size and lower computational complexity.
arXiv Detail & Related papers (2019-12-30T05:02:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.