MutualNet: Adaptive ConvNet via Mutual Learning from Different Model
Configurations
- URL: http://arxiv.org/abs/2105.07085v1
- Date: Fri, 14 May 2021 22:30:13 GMT
- Title: MutualNet: Adaptive ConvNet via Mutual Learning from Different Model
Configurations
- Authors: Taojiannan Yang, Sijie Zhu, Matias Mendieta, Pu Wang, Ravikumar
Balakrishnan, Minwoo Lee, Tao Han, Mubarak Shah, Chen Chen
- Abstract summary: We propose MutualNet to train a single network that can run at a diverse set of resource constraints.
Our method trains a cohort of model configurations with various network widths and input resolutions.
MutualNet is a general training methodology that can be applied to various network structures.
- Score: 51.85020143716815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most existing deep neural networks are static, which means they can only do
inference at a fixed complexity. But the resource budget can vary substantially
across different devices. Even on a single device, the affordable budget can
change with different scenarios, and repeatedly training networks for each
required budget would be incredibly expensive. Therefore, in this work, we
propose a general method called MutualNet to train a single network that can
run at a diverse set of resource constraints. Our method trains a cohort of
model configurations with various network widths and input resolutions. This
mutual learning scheme not only allows the model to run at different
width-resolution configurations but also transfers the unique knowledge among
these configurations, helping the model to learn stronger representations
overall. MutualNet is a general training methodology that can be applied to
various network structures (e.g., 2D networks: MobileNets, ResNet, 3D networks:
SlowFast, X3D) and various tasks (e.g., image classification, object detection,
segmentation, and action recognition), and is demonstrated to achieve
consistent improvements on a variety of datasets. Since we only train the model
once, it also greatly reduces the training cost compared to independently
training several models. Surprisingly, MutualNet can also be used to
significantly boost the performance of a single network if dynamic resource
constraints are not a concern. In summary, MutualNet is a unified method for both
static and adaptive, 2D and 3D networks. Code and pre-trained models are
available at https://github.com/taoyang1122/MutualNet.
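To make the training scheme concrete, below is a minimal PyTorch-style sketch of one mutual-learning step as described in the abstract: the full-width network is trained on the ground-truth labels, while sub-networks at sampled width-resolution configurations learn from its soft predictions (one common way to realize the knowledge transfer the abstract describes). The `set_width` helper, the width multipliers, and the resolutions are illustrative assumptions, not the authors' implementation; the released code uses switchable-width layers.

```python
# Minimal sketch of a width-resolution mutual-learning step. `set_width` is a
# hypothetical stand-in for switchable-width layers; values are illustrative.
import random
import torch
import torch.nn.functional as F

def set_width(model, width_mult):
    """Hypothetical helper: configure the model to use a fraction of its channels."""
    for m in model.modules():
        if hasattr(m, "width_mult"):
            m.width_mult = width_mult

def mutual_learning_step(model, images, labels, optimizer,
                         widths=(1.0, 0.75, 0.5, 0.25),
                         resolutions=(224, 192, 160, 128)):
    optimizer.zero_grad()
    # The full-width configuration is trained on the ground-truth labels;
    # its detached predictions supervise the sub-configurations.
    set_width(model, widths[0])
    full_logits = model(images)
    F.cross_entropy(full_logits, labels).backward()
    soft_targets = full_logits.detach().softmax(dim=1)

    # Sub-networks are trained at sampled width-resolution pairs, distilling
    # from the full configuration while sharing the same weights.
    for width in widths[1:]:
        set_width(model, width)
        res = random.choice(resolutions)
        sub_images = F.interpolate(images, size=res, mode="bilinear", align_corners=False)
        sub_logits = model(sub_images)
        F.kl_div(sub_logits.log_softmax(dim=1), soft_targets,
                 reduction="batchmean").backward()

    optimizer.step()
```

Because all configurations share one set of weights, the single trained model can later be executed at whichever width-resolution pair fits a device's budget.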
Related papers
- Network Fission Ensembles for Low-Cost Self-Ensembles [20.103367702014474]
We propose Network Fission Ensembles (NFE), a low-cost ensemble learning and inference method.
We first prune some of the weights to reduce the training burden.
We then group the remaining weights into several sets and create multiple auxiliary paths using each set to construct multi-exits.
arXiv Detail & Related papers (2024-08-05T08:23:59Z)
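For the NFE entry above, here is a loose sketch of the multi-exit ensembling idea under a simplifying assumption: instead of pruning and grouping weights inside the backbone as the paper does, disjoint channel groups of a shared feature vector each drive an auxiliary exit, and the exits are averaged as a low-cost self-ensemble. The class and dimensions are illustrative only.

```python
# Loose sketch: disjoint feature groups (standing in for the paper's grouped
# weight sets) each feed an auxiliary exit; exits are averaged as an ensemble.
import torch
import torch.nn as nn

class MultiExitHead(nn.Module):
    def __init__(self, feat_dim=512, num_classes=10, num_exits=3):
        super().__init__()
        assert feat_dim % num_exits == 0
        self.chunk = feat_dim // num_exits
        self.heads = nn.ModuleList(
            [nn.Linear(self.chunk, num_classes) for _ in range(num_exits)]
        )

    def forward(self, features):
        # Each exit sees only its own slice of the feature vector.
        groups = torch.split(features, self.chunk, dim=1)
        logits = [head(g) for head, g in zip(self.heads, groups)]
        return torch.stack(logits).mean(dim=0)  # ensemble by averaging

features = torch.randn(8, 512)      # backbone output for a batch of 8
head = MultiExitHead()
print(head(features).shape)         # torch.Size([8, 10])
```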
- Dr$^2$Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning [81.0108753452546]
We propose Dynamic Reversible Dual-Residual Networks, or Dr$^2$Net, to finetune a pretrained model with substantially reduced memory consumption.
Dr$^2$Net contains two types of residual connections: one maintains the residual structure of the pretrained model, and the other makes the network reversible.
We show that Dr$^2$Net can reach performance comparable to conventional finetuning with significantly less memory usage.
arXiv Detail & Related papers (2024-01-08T18:59:31Z)
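For the Dr$^2$Net entry above, the sketch below shows a generic RevNet-style reversible coupling rather than the paper's exact dual-residual formulation; it only illustrates why reversibility reduces memory: inputs can be recomputed from outputs, so intermediate activations need not be cached for backpropagation.

```python
# Generic reversible coupling (not Dr^2Net's exact connections): the inverse
# recovers the inputs from the outputs, so activations can be recomputed.
import torch
import torch.nn as nn

class ReversibleCoupling(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.g = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x1, x2):
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return y1, y2

    def inverse(self, y1, y2):
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return x1, x2

block = ReversibleCoupling()
x1, x2 = torch.randn(2, 64), torch.randn(2, 64)
with torch.no_grad():
    y1, y2 = block(x1, x2)
    r1, r2 = block.inverse(y1, y2)
print(torch.allclose(x1, r1, atol=1e-6), torch.allclose(x2, r2, atol=1e-6))  # True True
```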
- Cooperative Learning for Cost-Adaptive Inference [3.301728339780329]
The proposed framework is not tied to any specific architecture and can incorporate any existing model or architecture.
It provides accuracy comparable to its full network while making models of various sizes available.
arXiv Detail & Related papers (2023-12-13T21:42:27Z)
- SortedNet: A Scalable and Generalized Framework for Training Modular Deep Neural Networks [30.069353400127046]
We propose SortedNet to harness the inherent modularity of deep neural networks (DNNs).
SortedNet enables the training of sub-models simultaneously along with the training of the main model.
It is able to train 160 sub-models at once, achieving at least 96% of the original model's performance.
arXiv Detail & Related papers (2023-09-01T05:12:25Z)
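For the SortedNet entry above, a toy sketch of the sorted sub-model idea: a sub-model is the prefix (first n units) of each layer, so nested sub-models share one set of weights and can be trained jointly with the full model. How sub-models are sampled and weighted here is illustrative, not the paper's schedule.

```python
# Toy "sorted sub-model": sub-models use only the first fraction of hidden
# units, sharing weights with the full model; both get a loss each step.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class SortedMLP(nn.Module):
    def __init__(self, in_dim=784, hidden=256, num_classes=10):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, num_classes)

    def forward(self, x, keep=1.0):
        n = max(1, int(self.fc1.out_features * keep))
        h = F.relu(F.linear(x, self.fc1.weight[:n], self.fc1.bias[:n]))
        return F.linear(h, self.fc2.weight[:, :n], self.fc2.bias)

model = SortedMLP()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))

opt.zero_grad()
loss = F.cross_entropy(model(x), y)                                           # full model
loss += F.cross_entropy(model(x, keep=random.choice([0.25, 0.5, 0.75])), y)   # a sampled sub-model
loss.backward()
opt.step()
```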
- On Optimizing the Communication of Model Parallelism [74.15423270435949]
We study a novel and important communication pattern in large-scale model-parallel deep learning (DL): cross-mesh resharding.
In cross-mesh resharding, a sharded tensor needs to be sent from a source device mesh to a destination device mesh.
We propose two contributions to address cross-mesh resharding: an efficient broadcast-based communication system, and an "overlapping-friendly" pipeline schedule.
arXiv Detail & Related papers (2022-11-10T03:56:48Z)
- Simultaneous Training of Partially Masked Neural Networks [67.19481956584465]
We show that it is possible to train neural networks in such a way that a predefined 'core' subnetwork can be split off from the trained full network with remarkably good performance.
We show that training a Transformer with a low-rank core gives a low-rank model with better performance than training the low-rank model alone.
arXiv Detail & Related papers (2021-06-16T15:57:51Z)
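For the entry above on partially masked networks, a single-layer toy of training so that a predefined low-rank 'core' can later be split off: the weight is a low-rank core plus a residual, and both the full layer and the core alone receive a loss. The paper applies this to Transformers; the layer, dimensions, and rank here are made up for illustration.

```python
# Toy "core" training: weight = low-rank core U @ V + residual R; the full
# layer and the core-only layer are both supervised, so the core splits off.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoreLinear(nn.Module):
    def __init__(self, in_dim=128, out_dim=128, rank=8):
        super().__init__()
        self.U = nn.Parameter(torch.randn(out_dim, rank) * 0.02)
        self.V = nn.Parameter(torch.randn(rank, in_dim) * 0.02)
        self.R = nn.Parameter(torch.zeros(out_dim, in_dim))  # full-rank residual

    def forward(self, x, core_only=False):
        w = self.U @ self.V
        if not core_only:
            w = w + self.R
        return F.linear(x, w)

layer = CoreLinear()
x, target = torch.randn(16, 128), torch.randn(16, 128)
loss = F.mse_loss(layer(x), target) + F.mse_loss(layer(x, core_only=True), target)
loss.backward()  # the full layer and its split-off core are trained together
```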
- Dynamic Graph: Learning Instance-aware Connectivity for Neural Networks [78.65792427542672]
Dynamic Graph Network (DG-Net) is a complete directed acyclic graph, where the nodes represent convolutional blocks and the edges represent connection paths.
Instead of using the same fixed path for every input, DG-Net aggregates features dynamically at each node, which gives the network more representation ability.
arXiv Detail & Related papers (2020-10-02T16:50:26Z)
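For the DG-Net entry above, a hedged sketch of instance-aware aggregation at one node: edge weights over the node's predecessors are predicted from the inputs themselves, so the effective connectivity changes per sample. The gating design (global pooling plus a linear gate) is an assumption for illustration, not the paper's exact module.

```python
# One node of a dynamic DAG: per-sample softmax weights over predecessor
# features, predicted from globally pooled inputs, before a conv block.
import torch
import torch.nn as nn

class DynamicNode(nn.Module):
    def __init__(self, channels=64, num_predecessors=3):
        super().__init__()
        self.gate = nn.Linear(channels * num_predecessors, num_predecessors)
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, predecessor_feats):  # list of (B, C, H, W) tensors
        pooled = torch.cat([f.mean(dim=(2, 3)) for f in predecessor_feats], dim=1)
        weights = torch.softmax(self.gate(pooled), dim=1)   # (B, num_predecessors)
        stacked = torch.stack(predecessor_feats, dim=1)     # (B, P, C, H, W)
        agg = (weights[:, :, None, None, None] * stacked).sum(dim=1)
        return self.block(agg)

node = DynamicNode()
feats = [torch.randn(2, 64, 16, 16) for _ in range(3)]
print(node(feats).shape)  # torch.Size([2, 64, 16, 16])
```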
- Multi-channel U-Net for Music Source Separation [3.814858728853163]
Conditioned U-Net (C-U-Net) uses a control mechanism to train a single model for multi-source separation.
We propose a multi-channel U-Net (M-U-Net) trained using a weighted multi-task loss.
arXiv Detail & Related papers (2020-03-23T17:42:35Z)
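For the M-U-Net entry above, a minimal sketch of a weighted multi-task separation loss: one network predicts all sources at once and each source's reconstruction error receives its own weight. The fixed weights below are purely illustrative; the paper derives its own weighting scheme.

```python
# Weighted multi-task loss over jointly predicted sources; weights are
# illustrative constants, not the paper's weighting scheme.
import torch
import torch.nn.functional as F

def weighted_multitask_loss(pred_sources, true_sources, weights):
    """pred_sources, true_sources: (batch, num_sources, time); weights: (num_sources,)."""
    per_source = F.l1_loss(pred_sources, true_sources, reduction="none").mean(dim=(0, 2))
    return (weights * per_source).sum()

pred = torch.randn(4, 4, 44100)          # e.g. vocals, drums, bass, other
true = torch.randn(4, 4, 44100)
w = torch.tensor([2.0, 1.0, 1.0, 1.0])   # illustrative: emphasize vocals
print(weighted_multitask_loss(pred, true, w))
```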
- Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
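For the model-fusion entry above, a simplified sketch of layer-wise fusion by neuron alignment: neurons of one model are matched to those of another before their weights are averaged. A hard Hungarian assignment is used here as a stand-in for the optimal-transport coupling in the paper.

```python
# Layer-wise fusion by neuron alignment; Hungarian matching stands in for the
# optimal-transport coupling used in the paper.
import numpy as np
from scipy.optimize import linear_sum_assignment

def fuse_layer(weight_a, weight_b):
    """weight_*: (out_dim, in_dim) weight matrices of the same layer in two models."""
    # Cost of matching neuron i of A to neuron j of B = distance between weight rows.
    cost = np.linalg.norm(weight_a[:, None, :] - weight_b[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    aligned_b = weight_b[cols]             # permute B's neurons to line up with A's
    return 0.5 * (weight_a + aligned_b)    # average the aligned layers

wa, wb = np.random.randn(64, 32), np.random.randn(64, 32)
print(fuse_layer(wa, wb).shape)  # (64, 32)
```

In a full fusion, the same permutation would also have to be applied to the next layer's input dimensions so that all layers stay consistent.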
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.