Cooperative Learning for Cost-Adaptive Inference
- URL: http://arxiv.org/abs/2312.08532v2
- Date: Tue, 26 Dec 2023 20:26:54 GMT
- Title: Cooperative Learning for Cost-Adaptive Inference
- Authors: Xingli Fang, Richard Bradford, Jung-Eun Kim
- Abstract summary: The proposed framework is not tied to any specific architecture but can incorporate any existing models/architectures.
It provides accuracy comparable to its full network while making models of various sizes available.
- Score: 3.301728339780329
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We propose a cooperative training framework for deep neural network
architectures that enables the runtime network depths to change to satisfy
dynamic computing resource requirements. In our framework, the number of layers
participating in computation can be chosen dynamically to meet performance-cost
trade-offs at inference runtime. Our method trains two Teammate nets and a
Leader net, and two sets of Teammate sub-networks with various depths through
knowledge distillation. The Teammate nets derive sub-networks and transfer
knowledge to them, and to each other, while the Leader net guides Teammate nets
to ensure accuracy. The approach trains the framework atomically at once
instead of individually training various sizes of models; in a sense, the
various-sized networks are all trained at once, in a "package deal." The
proposed framework is not tied to any specific architecture but can incorporate any existing models/architectures; therefore, it can maintain stable results and is insensitive to the size of a dataset's feature map. Compared with other related approaches, it provides accuracy comparable to its full network while making models of various sizes available.
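Since only the abstract is available here, the following PyTorch sketch is an assumption about how such a cooperative, depth-adaptive scheme could be wired up rather than the authors' implementation: early-exit heads give each Teammate net sub-networks of several depths, each sub-network distills from its full Teammate, the two Teammates distill from each other, and a Leader net supplies an additional teaching signal. The names DepthAdaptiveNet, kd_loss, and train_step, and the particular loss weighting, are hypothetical.

```python
# Hypothetical sketch of cooperative, depth-adaptive training with knowledge
# distillation; the architecture and loss terms are assumptions, not the
# paper's published implementation.
import torch.nn as nn
import torch.nn.functional as F

class DepthAdaptiveNet(nn.Module):
    """Toy backbone whose forward pass can stop after `depth` blocks (early exits)."""
    def __init__(self, in_dim=784, width=64, max_depth=4, num_classes=10):
        super().__init__()
        self.stem = nn.Linear(in_dim, width)
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(width, width), nn.ReLU()) for _ in range(max_depth)
        )
        # One classifier head per exit depth, so shallower sub-networks can predict too.
        self.heads = nn.ModuleList(nn.Linear(width, num_classes) for _ in range(max_depth))

    def forward(self, x, depth=None):
        depth = depth or len(self.blocks)
        h = F.relu(self.stem(x))
        for block in self.blocks[:depth]:
            h = block(h)
        return self.heads[depth - 1](h)

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Standard soft-label distillation loss; the teacher side is detached."""
    return F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits.detach() / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2

def train_step(leader, teammate_a, teammate_b, x, y, depths=(1, 2, 3, 4), opt=None):
    """One joint ("package deal") update over the Leader, both Teammates, and their sub-nets."""
    leader_logits = leader(x)
    loss = F.cross_entropy(leader_logits, y)
    for net, peer in ((teammate_a, teammate_b), (teammate_b, teammate_a)):
        full_logits = net(x)
        loss = loss + F.cross_entropy(full_logits, y)
        loss = loss + kd_loss(full_logits, leader_logits)   # Leader guides the Teammates
        loss = loss + kd_loss(full_logits, peer(x))         # Teammates teach each other
        for d in depths[:-1]:                               # sub-networks of various depths
            sub_logits = net(x, depth=d)
            loss = loss + F.cross_entropy(sub_logits, y)
            loss = loss + kd_loss(sub_logits, full_logits)  # full net teaches its own sub-nets
    if opt is not None:
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss
```

At inference time, a single Teammate net would then be run with whatever depth the current compute budget allows, e.g. `teammate_a(x, depth=2)`.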
Related papers
- OFA$^2$: A Multi-Objective Perspective for the Once-for-All Neural Architecture Search [79.36688444492405]
Once-for-All (OFA) is a Neural Architecture Search (NAS) framework designed to address the problem of searching efficient architectures for devices with different resource constraints.
We aim to go one step further in the search for efficiency by explicitly conceiving the search stage as a multi-objective optimization problem.
arXiv Detail & Related papers (2023-03-23T21:30:29Z)
- Supernet Training for Federated Image Classification under System Heterogeneity [15.2292571922932]
In this work, we propose a novel framework to consider both scenarios, namely Federation of Supernet Training (FedSup).
It is inspired by the observation that averaging parameters in the model aggregation stage of Federated Learning (FL) is similar to weight-sharing in supernet training.
Under our framework, we present an efficient algorithm (E-FedSup) that sends sub-models to clients in the broadcast stage to reduce communication costs and training overhead.
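A minimal sketch of the broadcast-and-aggregate idea described above, under assumed details: the server keeps supernet weights, slices a width-reduced sub-model for each client, and averages the returned updates only over the entries each client actually held, mirroring weight-sharing in supernet training. The helper names (broadcast, aggregate) and the row-slicing scheme are illustrative, not FedSup's actual implementation.

```python
# Rough sketch (assumed details) of broadcasting width-sliced sub-models and
# aggregating them back into shared supernet weights.
import torch

def broadcast(supernet_weight, width_fraction):
    """Slice the leading rows of the supernet weight for one client's sub-model."""
    out_dim = int(supernet_weight.shape[0] * width_fraction)
    return supernet_weight[:out_dim].clone()

def aggregate(supernet_weight, client_weights):
    """Average client updates entry-wise over the clients that hold each row."""
    total = torch.zeros_like(supernet_weight)
    count = torch.zeros(supernet_weight.shape[0])
    for w in client_weights:
        total[: w.shape[0]] += w
        count[: w.shape[0]] += 1
    mask = count > 0
    supernet_weight[mask] = total[mask] / count[mask].unsqueeze(-1)
    return supernet_weight

# One communication round with three clients holding 50%, 75%, and 100% widths.
supernet = torch.randn(8, 4)
clients = [broadcast(supernet, f) for f in (0.5, 0.75, 1.0)]
clients = [w + 0.01 * torch.randn_like(w) for w in clients]  # stand-in for local training
supernet = aggregate(supernet, clients)
```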
arXiv Detail & Related papers (2022-06-03T02:21:01Z)
- Unsupervised Domain-adaptive Hash for Networks [81.49184987430333]
Domain-adaptive hash learning has enjoyed considerable success in the computer vision community.
We develop an unsupervised domain-adaptive hash learning method for networks, dubbed UDAH.
arXiv Detail & Related papers (2021-08-20T12:09:38Z)
- Distributed Learning for Time-varying Networks: A Scalable Design [13.657740129012804]
We propose a distributed learning framework based on a scalable deep neural network (DNN) design.
By exploiting the permutation equivariance and invariance properties of the learning tasks, DNNs of different scales can be built up for different clients.
Model aggregation can also be conducted based on these two sub-matrices to improve the learning convergence and performance.
arXiv Detail & Related papers (2021-07-31T12:44:28Z)
- Differentiable Architecture Pruning for Transfer Learning [6.935731409563879]
We propose a gradient-based approach for extracting sub-architectures from a given large model.
Our architecture-pruning scheme produces transferable new structures that can be successfully retrained to solve different tasks.
We provide theoretical convergence guarantees and validate the proposed transfer-learning strategy on real data.
arXiv Detail & Related papers (2021-07-07T17:44:59Z)
- MutualNet: Adaptive ConvNet via Mutual Learning from Different Model Configurations [51.85020143716815]
We propose MutualNet to train a single network that can run at a diverse set of resource constraints.
Our method trains a cohort of model configurations with various network widths and input resolutions.
MutualNet is a general training methodology that can be applied to various network structures.
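As a rough illustration of the width-and-resolution training just described (a sketch under simplifying assumptions, not the MutualNet code): one set of shared weights is evaluated at several width fractions, narrower widths see down-scaled inputs, and the reduced configurations learn from the full-width output. SlimmableLinear, SlimmableConvNet, and the specific width/resolution pairs are hypothetical.

```python
# Simplified sketch (assumed details): shared weights evaluated at several width
# fractions; narrower configurations train on down-scaled inputs and learn from
# the full-width output. Not the authors' implementation.
import torch.nn as nn
import torch.nn.functional as F

class SlimmableLinear(nn.Linear):
    """Linear layer that can run on a leading slice of its input features."""
    def forward(self, x, in_frac=1.0):
        in_dim = int(self.in_features * in_frac)
        return F.linear(x[..., :in_dim], self.weight[:, :in_dim], self.bias)

class SlimmableConvNet(nn.Module):
    """One shared conv layer whose leading channels form the narrower configurations."""
    def __init__(self, width=64, num_classes=10):
        super().__init__()
        self.conv = nn.Conv2d(3, width, 3, padding=1)
        self.head = SlimmableLinear(width, num_classes)

    def forward(self, x, width_frac=1.0):
        c = int(self.conv.out_channels * width_frac)
        h = F.conv2d(x, self.conv.weight[:c], self.conv.bias[:c], padding=1)
        h = F.adaptive_avg_pool2d(F.relu(h), 1).flatten(1)  # works at any input resolution
        return self.head(h, in_frac=width_frac)

def train_step(net, x, y, widths=(1.0, 0.75, 0.5), resolutions=(32, 24, 16), opt=None):
    full_logits = net(x, width_frac=widths[0])
    loss = F.cross_entropy(full_logits, y)               # full width learns from labels
    soft = F.softmax(full_logits.detach(), dim=1)
    for w, r in zip(widths[1:], resolutions[1:]):
        x_r = F.interpolate(x, size=(r, r), mode="bilinear", align_corners=False)
        sub_logits = net(x_r, width_frac=w)              # narrower width, lower resolution
        loss = loss + F.kl_div(F.log_softmax(sub_logits, dim=1), soft, reduction="batchmean")
    if opt is not None:
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss
```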
arXiv Detail & Related papers (2021-05-14T22:30:13Z)
- Embedded Knowledge Distillation in Depth-level Dynamic Neural Network [8.207403859762044]
We propose an elegant Depth-level Dynamic Neural Network (DDNN) that integrates different-depth sub-nets of similar architectures.
In this article, we design the Embedded-Knowledge-Distillation (EKD) training mechanism for the DDNN to implement semantic knowledge transfer from the teacher (full) net to multiple sub-nets.
Experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets demonstrate that sub-nets in DDNN with EKD training achieve better performance than depth-level pruning or individual training.
arXiv Detail & Related papers (2021-03-01T06:35:31Z)
- Dynamic Graph: Learning Instance-aware Connectivity for Neural Networks [78.65792427542672]
Dynamic Graph Network (DG-Net) is a complete directed acyclic graph, where the nodes represent convolutional blocks and the edges represent connection paths.
Instead of using a fixed path through the network, DG-Net aggregates features dynamically at each node, which gives the network greater representational ability.
arXiv Detail & Related papers (2020-10-02T16:50:26Z)
- Neural networks adapting to datasets: learning network size and topology [77.34726150561087]
We introduce a flexible setup allowing for a neural network to learn both its size and topology during the course of a gradient-based training.
The resulting network has the structure of a graph tailored to the particular learning task and dataset.
arXiv Detail & Related papers (2020-06-22T12:46:44Z)
- Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks.
With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
arXiv Detail & Related papers (2020-04-17T19:12:39Z)
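To make the last entry above concrete, here is a rough sketch (assumptions, not the paper's code) of a graph convolutional predictor fitted to the measured accuracies of sampled sub-networks, which can then rank unseen candidates; GCNLayer, AccuracyPredictor, and the dummy tensors are illustrative only.

```python
# Illustrative sketch, not the paper's implementation: a minimal GCN "performance
# predictor" mapping an architecture graph (adjacency + per-node operation
# features) to a predicted accuracy, fitted on sampled sub-networks.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, adj, h):
        # Symmetrically normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2
        a_hat = adj + torch.eye(adj.size(-1), device=adj.device)
        deg = a_hat.sum(-1)
        d_inv_sqrt = deg.clamp(min=1e-6).pow(-0.5)
        a_norm = d_inv_sqrt.unsqueeze(-1) * a_hat * d_inv_sqrt.unsqueeze(-2)
        return torch.relu(self.lin(a_norm @ h))

class AccuracyPredictor(nn.Module):
    """Predicts the accuracy of a sampled sub-network from its graph encoding."""
    def __init__(self, op_dim, hidden=64):
        super().__init__()
        self.gcn1 = GCNLayer(op_dim, hidden)
        self.gcn2 = GCNLayer(hidden, hidden)
        self.out = nn.Linear(hidden, 1)

    def forward(self, adj, ops):
        h = self.gcn2(adj, self.gcn1(adj, ops))
        return torch.sigmoid(self.out(h.mean(dim=-2))).squeeze(-1)  # accuracy in [0, 1]

# Fit on (architecture, measured accuracy) pairs sampled from the supernet, then
# rank unseen candidates by predicted accuracy (dummy batch shown here).
predictor = AccuracyPredictor(op_dim=8)
opt = torch.optim.Adam(predictor.parameters(), lr=1e-3)
adj, ops, acc = torch.rand(32, 7, 7), torch.rand(32, 7, 8), torch.rand(32)
loss = F.mse_loss(predictor(adj, ops), acc)
opt.zero_grad()
loss.backward()
opt.step()
```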
This list is automatically generated from the titles and abstracts of the papers on this site.