Differentiable Architecture Pruning for Transfer Learning
- URL: http://arxiv.org/abs/2107.03375v1
- Date: Wed, 7 Jul 2021 17:44:59 GMT
- Title: Differentiable Architecture Pruning for Transfer Learning
- Authors: Nicolo Colombo and Yang Gao
- Abstract summary: We propose a gradient-based approach for extracting sub-architectures from a given large model.
Our architecture-pruning scheme produces transferable new structures that can be successfully retrained to solve different tasks.
We provide theoretical convergence guarantees and validate the proposed transfer-learning strategy on real data.
- Score: 6.935731409563879
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a new gradient-based approach for extracting sub-architectures
from a given large model. Unlike existing pruning methods, which cannot
disentangle the network architecture from the corresponding weights,
our architecture-pruning scheme produces transferable new structures that can
be successfully retrained to solve different tasks. We focus on a
transfer-learning setup where architectures can be trained on a large data set
but very few data points are available for fine-tuning them on new tasks. We
define a new gradient-based algorithm that trains architectures of arbitrarily
low complexity independently from the attached weights. Given a search space
defined by an existing large neural model, we reformulate the architecture
search task as a complexity-penalized subset-selection problem and solve it
through a two-temperature relaxation scheme. We provide theoretical convergence
guarantees and validate the proposed transfer-learning strategy on real data.
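The abstract does not include an implementation, but the gate-based idea can be made concrete. Below is a minimal PyTorch-style sketch of complexity-penalized subset selection with a two-temperature relaxation: each candidate unit gets a gate logit, one temperature relaxes the gates used in the forward pass, and a second, sharper temperature is used only inside the complexity penalty that approximates how many units are kept. All names (GatedLinear, tau_task, tau_pen, lam) and the specific form of the relaxation are illustrative assumptions, not the authors' implementation, which additionally decouples architecture training from the attached weights.

```python
import torch
import torch.nn as nn

class GatedLinear(nn.Module):
    """Linear layer whose output units are masked by relaxed binary gates.

    The gate logits `alpha` act as architecture variables: as the temperature
    shrinks, sigmoid(alpha / tau) approaches a hard 0/1 subset selection over
    the layer's units (illustrative assumption, not the paper's exact scheme).
    """
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.alpha = nn.Parameter(torch.zeros(out_features))  # architecture logits

    def gates(self, tau):
        return torch.sigmoid(self.alpha / tau)

    def forward(self, x, tau):
        return self.gates(tau) * self.linear(x)

def pruning_objective(layers, x, y, criterion, tau_task=0.5, tau_pen=0.05, lam=1e-3):
    """Task loss plus a complexity penalty, using two different temperatures.

    tau_task relaxes the gates in the forward pass; the sharper tau_pen is used
    only inside the penalty, so the penalty tracks an approximate count of the
    units that would survive a hard selection.
    """
    out = x
    for i, layer in enumerate(layers):
        out = layer(out, tau_task)
        if i < len(layers) - 1:
            out = torch.relu(out)
    task_loss = criterion(out, y)
    complexity = sum(layer.gates(tau_pen).sum() for layer in layers)
    return task_loss + lam * complexity

# Hypothetical usage on dummy data: gradients flow into both weights and gates.
layers = nn.ModuleList([GatedLinear(784, 256), GatedLinear(256, 10)])
optimizer = torch.optim.Adam(layers.parameters(), lr=1e-3)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
loss = pruning_objective(layers, x, y, nn.CrossEntropyLoss())
loss.backward()
optimizer.step()
```

In this reading, keeping only the units whose gates saturate near one would give the pruned sub-architecture, which could then be re-initialized and fine-tuned on a new task.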
Related papers
- Mechanistic Design and Scaling of Hybrid Architectures [114.3129802943915]
We identify and test new hybrid architectures constructed from a variety of computational primitives.
We experimentally validate the resulting architectures via an extensive compute-optimal and a new state-optimal scaling law analysis.
We find MAD synthetics to correlate with compute-optimal perplexity, enabling accurate evaluation of new architectures.
arXiv Detail & Related papers (2024-03-26T16:33:12Z)
- Heterogeneous Continual Learning [88.53038822561197]
We propose a novel framework to tackle the continual learning (CL) problem with changing network architectures.
We build on top of the distillation family of techniques and modify it to a new setting where a weaker model takes the role of a teacher.
We also propose Quick Deep Inversion (QDI) to recover prior task visual features to support knowledge transfer.
arXiv Detail & Related papers (2023-06-14T15:54:42Z)
- Equivariant Architectures for Learning in Deep Weight Spaces [54.61765488960555]
We present a novel network architecture for learning in deep weight spaces.
It takes as input a concatenation of weights and biases of a pre-trained MLP.
We show how these layers can be implemented using three basic operations.
arXiv Detail & Related papers (2023-01-30T10:50:33Z)
- Conceptual Expansion Neural Architecture Search (CENAS) [1.3464152928754485]
We present an approach called Conceptual Expansion Neural Architecture Search (CENAS)
It combines a sample-efficient, computational creativity-inspired transfer learning approach with neural architecture search.
It finds models faster than naive architecture search via transferring existing weights to approximate the parameters of the new model.
arXiv Detail & Related papers (2021-10-07T02:29:26Z)
- SIRe-Networks: Skip Connections over Interlaced Multi-Task Learning and Residual Connections for Structure Preserving Object Classification [28.02302915971059]
In this paper, we introduce an interlaced multi-task learning strategy, called SIRe, to reduce the vanishing-gradient problem in the object classification task.
The presented methodology directly improves a convolutional neural network (CNN) by enforcing the input image structure preservation through auto-encoders.
To validate the presented methodology, a simple CNN and various implementations of famous networks are extended via the SIRe strategy and extensively tested on the CIFAR100 dataset.
arXiv Detail & Related papers (2021-10-06T13:54:49Z)
- Elastic Architecture Search for Diverse Tasks with Different Resources [87.23061200971912]
We study a new challenging problem of efficient deployment for diverse tasks with different resources, where the resource constraint and task of interest corresponding to a group of classes are dynamically specified at testing time.
Previous NAS approaches seek to design architectures for all classes simultaneously, which may not be optimal for some individual tasks.
We present a novel and general framework, called Elastic Architecture Search (EAS), permitting instant specializations at runtime for diverse tasks with various resource constraints.
arXiv Detail & Related papers (2021-08-03T00:54:27Z)
- Disentangling Neural Architectures and Weights: A Case Study in Supervised Classification [8.976788958300766]
This work investigates the problem of disentangling the role of the neural structure and its edge weights.
We show that well-trained architectures may not need any link-specific fine-tuning of the weights.
We use a novel and computationally efficient method that translates the hard architecture-search problem into a feasible optimization problem.
arXiv Detail & Related papers (2020-09-11T11:22:22Z)
- Adversarially Robust Neural Architectures [43.74185132684662]
This paper aims to improve the adversarial robustness of the network from the architecture perspective with NAS framework.
We explore the relationship among adversarial robustness, Lipschitz constant, and architecture parameters.
Our algorithm empirically achieves the best performance among all the models under various attacks on different datasets.
arXiv Detail & Related papers (2020-09-02T08:52:15Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
- RC-DARTS: Resource Constrained Differentiable Architecture Search [162.7199952019152]
We propose the resource constrained differentiable architecture search (RC-DARTS) method to learn architectures that are significantly smaller and faster.
We show that the RC-DARTS method learns lightweight neural architectures which have smaller model size and lower computational complexity (a rough illustrative sketch of the resource-penalized search idea follows the list below).
arXiv Detail & Related papers (2019-12-30T05:02:38Z)
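For contrast with the gate-based relaxation sketched above, DARTS-style methods relax a categorical choice over candidate operations on each edge with a softmax; RC-DARTS additionally accounts for resources such as model size. The snippet below is a rough illustration of that general idea only, not the paper's actual constrained algorithm: it penalises the expected parameter count of a softmax-weighted edge, and all names, candidate operations, and the penalty form are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """One DARTS-style edge: a softmax-weighted mixture of candidate ops.

    `cost` holds an illustrative per-candidate resource estimate (here the
    parameter count), so the expected cost of the edge is differentiable in
    the architecture logits and can be penalised directly.
    """
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Identity(),                                # skip connection
            nn.Conv2d(channels, channels, 3, padding=1),  # 3x3 convolution
            nn.Conv2d(channels, channels, 5, padding=2),  # 5x5 convolution
        ])
        self.arch_logits = nn.Parameter(torch.zeros(len(self.ops)))
        costs = [sum(p.numel() for p in op.parameters()) for op in self.ops]
        self.register_buffer("cost", torch.tensor(costs, dtype=torch.float))

    def forward(self, x):
        w = F.softmax(self.arch_logits, dim=0)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

    def expected_cost(self):
        w = F.softmax(self.arch_logits, dim=0)
        return (w * self.cost).sum()

def resource_aware_objective(task_loss, edges, lam=1e-6):
    """Illustrative objective: task loss plus a weighted expected-resource penalty."""
    return task_loss + lam * sum(edge.expected_cost() for edge in edges)
```

A discrete architecture would then be read off by keeping, for each edge, the candidate operation with the largest softmax weight.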
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.