CompOFA: Compound Once-For-All Networks for Faster Multi-Platform Deployment
- URL: http://arxiv.org/abs/2104.12642v1
- Date: Mon, 26 Apr 2021 15:10:48 GMT
- Title: CompOFA: Compound Once-For-All Networks for Faster Multi-Platform Deployment
- Authors: Manas Sahni, Shreya Varshini, Alind Khare, Alexey Tumanov
- Abstract summary: CompOFA constrains search to models close to the accuracy-latency frontier.
We demonstrate that even with simple heuristics we can achieve a 2x reduction in training time and 216x speedup in model search/extraction time.
- Score: 1.433758865948252
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The emergence of CNNs in mainstream deployment has necessitated methods to
design and train efficient architectures tailored to maximize the accuracy
under diverse hardware & latency constraints. To scale these resource-intensive
tasks with an increasing number of deployment targets, Once-For-All (OFA)
proposed an approach to jointly train several models at once with a constant
training cost. However, this cost remains as high as 40-50 GPU days and also
suffers from a combinatorial explosion of sub-optimal model configurations. We
seek to reduce this search space -- and hence the training budget -- by
constraining search to models close to the accuracy-latency Pareto frontier. We
incorporate insights of compound relationships between model dimensions to
build CompOFA, a design space smaller by several orders of magnitude. Through
experiments on ImageNet, we demonstrate that even with simple heuristics we can
achieve a 2x reduction in training time and 216x speedup in model
search/extraction time compared to the state of the art, without loss of Pareto
optimality! We also show that this smaller design space is dense enough to
support equally accurate models for a similar diversity of hardware and latency
targets, while also reducing the complexity of the training and subsequent
extraction algorithms.
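To make the compound design-space idea concrete, here is a minimal illustrative Python sketch (not the paper's code; the depth/kernel/expand ranges and the coupling rule below are assumptions). It counts a toy once-for-all space in which every layer picks its kernel size and width independently, versus a compound space in which the coupling rule ties width and kernel to depth, so a unit is fully described by its depth alone.

from itertools import product

# Toy once-for-all search space: 5 units, each choosing a depth (number of layers);
# every layer additionally has a kernel size and a width-expansion ratio.
UNITS, DEPTHS, KERNELS, EXPANDS = 5, (2, 3, 4), (3, 5, 7), (3, 4, 6)

# Independent (OFA-style) space: every layer picks kernel and expand freely.
per_unit_free = sum((len(KERNELS) * len(EXPANDS)) ** d for d in DEPTHS)
ofa_configs = per_unit_free ** UNITS

# Compound (CompOFA-style) space: a coupling rule ties width and kernel to depth,
# so a unit is fully described by its depth alone. The rule below is a toy
# stand-in ("deeper units also get wider layers"), not the paper's heuristic.
couple = {2: dict(kernel=3, expand=3), 3: dict(kernel=5, expand=4), 4: dict(kernel=7, expand=6)}
compound_configs = [[dict(depth=d, **couple[d]) for d in depths]
                    for depths in product(DEPTHS, repeat=UNITS)]

print(f"independent configs: {ofa_configs:.2e}")        # combinatorial explosion (~1e19)
print(f"compound configs:    {len(compound_configs)}")  # 3 ** 5 = 243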
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion [53.33473557562837]
Solving multi-objective optimization problems for large deep neural networks is a challenging task due to the complexity of the loss landscape and the expensive computational cost.
We propose a practical and scalable approach to solve this problem via mixture of experts (MoE) based model fusion.
By ensembling the weights of specialized single-task models, the MoE module can effectively capture the trade-offs between multiple objectives.
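As a rough illustration of the weight-ensembling idea, the PyTorch sketch below blends the parameters of task-specialized models with preference-dependent gating weights; the plain softmax gate and the name fuse_experts are assumptions for illustration, not the authors' learned MoE router.

import torch

def fuse_experts(expert_state_dicts, preference):
    """Blend the parameters of task-specialized 'expert' models.

    expert_state_dicts: state_dicts of models trained on single objectives
    (all sharing one architecture); preference: per-objective importance scores.
    Returns a state_dict on a simple convex combination of the experts.
    """
    gates = torch.softmax(preference, dim=0)  # gating weights, sum to 1
    return {name: sum(g * sd[name] for g, sd in zip(gates, expert_state_dicts))
            for name in expert_state_dicts[0]}

# Usage: two single-task models of the same architecture, fused toward objective 0.
model_a, model_b = torch.nn.Linear(8, 4), torch.nn.Linear(8, 4)
fused = torch.nn.Linear(8, 4)
fused.load_state_dict(fuse_experts([model_a.state_dict(), model_b.state_dict()],
                                   preference=torch.tensor([2.0, 1.0])))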
arXiv Detail & Related papers (2024-06-14T07:16:18Z)
- Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch [72.26822499434446]
Auto-Train-Once (ATO) is an innovative network pruning algorithm designed to automatically reduce the computational and storage costs of DNNs.
We provide a comprehensive convergence analysis as well as extensive experiments, and the results show that our approach achieves state-of-the-art performance across various model architectures.
arXiv Detail & Related papers (2024-03-21T02:33:37Z)
- Unifying Synergies between Self-supervised Learning and Dynamic Computation [53.66628188936682]
We present a novel perspective on the interplay between SSL and DC paradigms.
We show that it is feasible to simultaneously learn a dense and a gated sub-network from scratch in an SSL setting.
The co-evolution of the dense and gated encoders during pre-training offers a good accuracy-efficiency trade-off.
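For illustration only, a generic channel-gating block of the kind such dense/gated setups build on might look like the following PyTorch sketch (an assumed toy, not the paper's architecture):

import torch
import torch.nn as nn

class GatedBlock(nn.Module):
    """Conv block that can run dense or with a learned per-channel gate.

    Generic channel-gating sketch only: gate logits are trained jointly with
    the weights; at deployment, channels with near-zero gates could be pruned
    to obtain the cheaper sub-network.
    """
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.gate_logits = nn.Parameter(torch.zeros(out_ch))

    def forward(self, x, gated=True):
        y = torch.relu(self.bn(self.conv(x)))
        if not gated:                           # dense path
            return y
        gate = torch.sigmoid(self.gate_logits)  # soft per-channel gate
        return y * gate.view(1, -1, 1, 1)       # gated path

block = GatedBlock(3, 16)
x = torch.randn(2, 3, 32, 32)
dense_out, gated_out = block(x, gated=False), block(x, gated=True)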
arXiv Detail & Related papers (2023-01-22T17:12:58Z)
- Rethinking Lightweight Salient Object Detection via Network Depth-Width Tradeoff [26.566339984225756]
Existing salient object detection methods often adopt deeper and wider networks for better performance.
We propose a novel trilateral decoder framework by decoupling the U-shape structure into three complementary branches.
We show that our method achieves a better efficiency-accuracy balance across five benchmarks.
arXiv Detail & Related papers (2023-01-17T03:43:25Z)
- FreeREA: Training-Free Evolution-based Architecture Search [17.202375422110553]
FreeREA is a custom cell-based evolution NAS algorithm that exploits an optimised combination of training-free metrics to rank architectures.
Our experiments, carried out on the common benchmarks NAS-Bench-101 and NATS-Bench, demonstrate that FreeREA is a fast, efficient, and effective search method for automatic model design.
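The training-free ranking loop can be sketched in a few lines of Python; the aging-evolution structure below is generic, and the proxy score is a labeled placeholder rather than FreeREA's actual metrics:

import random

OPS = ["skip", "conv3x3", "conv5x5", "sep_conv", "max_pool"]
GENOME_LEN = 8  # toy cell description: one op choice per edge

def training_free_score(genome):
    """Placeholder for a training-free proxy (e.g. a score computed on an
    untrained network); here just a toy stand-in for illustration."""
    return sum(len(op) for op in genome) + random.random()

def mutate(genome):
    child = list(genome)
    child[random.randrange(GENOME_LEN)] = random.choice(OPS)
    return child

def evolve(generations=50, pop_size=16, tournament=4):
    pop = [[random.choice(OPS) for _ in range(GENOME_LEN)] for _ in range(pop_size)]
    for _ in range(generations):
        parent = max(random.sample(pop, tournament), key=training_free_score)
        pop.append(mutate(parent))   # no training anywhere in the loop
        pop.pop(0)                   # age-based removal, as in regularized evolution
    return max(pop, key=training_free_score)

print(evolve())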
arXiv Detail & Related papers (2022-06-17T11:16:28Z)
- Online Convolutional Re-parameterization [51.97831675242173]
We present online convolutional re-parameterization (OREPA), a two-stage pipeline that aims to reduce the huge training overhead by squeezing the complex training-time block into a single convolution.
Compared with the state-of-the-art re-param models, OREPA is able to save the training-time memory cost by about 70% and accelerate the training speed by around 2x.
We also conduct experiments on object detection and semantic segmentation and show consistent improvements on the downstream tasks.
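The underlying re-parameterization trick of collapsing parallel training-time branches into one convolution can be illustrated with a generic fold of a 3x3 and a 1x1 branch in PyTorch (an assumed simplification; OREPA's online, block-level squeezing is more involved):

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiBranchBlock(nn.Module):
    """Training-time block: parallel 3x3 and 1x1 branches summed together."""
    def __init__(self, ch):
        super().__init__()
        self.conv3 = nn.Conv2d(ch, ch, 3, padding=1, bias=True)
        self.conv1 = nn.Conv2d(ch, ch, 1, bias=True)

    def forward(self, x):
        return self.conv3(x) + self.conv1(x)

    def reparameterize(self):
        """Fold both branches into a single 3x3 convolution for deployment."""
        fused = nn.Conv2d(self.conv3.in_channels, self.conv3.out_channels, 3,
                          padding=1, bias=True)
        with torch.no_grad():
            # Pad the 1x1 kernel to 3x3 so the two weight tensors can be added.
            w1_padded = F.pad(self.conv1.weight, [1, 1, 1, 1])
            fused.weight.copy_(self.conv3.weight + w1_padded)
            fused.bias.copy_(self.conv3.bias + self.conv1.bias)
        return fused

block = MultiBranchBlock(8).eval()
single_conv = block.reparameterize()
x = torch.randn(1, 8, 16, 16)
assert torch.allclose(block(x), single_conv(x), atol=1e-5)  # same function, one conv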
arXiv Detail & Related papers (2022-04-02T09:50:19Z)
- Pruning In Time (PIT): A Lightweight Network Architecture Optimizer for Temporal Convolutional Networks [20.943095081056857]
Temporal Convolutional Networks (TCNs) are promising Deep Learning models for time-series processing tasks.
We propose an automatic dilation optimizer that tackles the problem as weight pruning along the time axis and learns dilation factors together with the weights in a single training.
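One way to picture pruning on the time axis is a learnable gate per temporal tap of the kernel, as in the PyTorch sketch below (an illustrative assumption, not PIT's actual optimizer): taps driven to zero by a sparsity penalty leave a dilated pattern of surviving taps.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TapPrunedConv1d(nn.Module):
    """1-D convolution whose time-axis taps can be pruned away.

    Sketch of the general idea only: each kernel tap gets a learnable gate, and
    taps whose gate is pushed to zero effectively increase the dilation of the
    remaining taps.
    """
    def __init__(self, ch_in, ch_out, kernel_size=9):
        super().__init__()
        self.conv = nn.Conv1d(ch_in, ch_out, kernel_size, padding=kernel_size // 2)
        self.tap_gates = nn.Parameter(torch.ones(kernel_size))

    def forward(self, x):
        gates = torch.sigmoid(self.tap_gates)        # soft mask over time taps
        w = self.conv.weight * gates.view(1, 1, -1)  # gate taps, keep channels intact
        return F.conv1d(x, w, self.conv.bias, padding=self.conv.padding[0])

    def sparsity_loss(self):
        return torch.sigmoid(self.tap_gates).sum()   # add to the task loss to prune taps

layer = TapPrunedConv1d(4, 8)
y = layer(torch.randn(2, 4, 100))                    # (batch, channels, time)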
arXiv Detail & Related papers (2022-03-28T14:03:16Z)
- Deep Generative Models that Solve PDEs: Distributed Computing for Training Large Data-Free Models [25.33147292369218]
Recent progress in scientific machine learning (SciML) has opened up the possibility of training novel neural network architectures that solve complex partial differential equations (PDEs).
Here we report on a software framework for data parallel distributed deep learning that resolves the twin challenges of training these large SciML models.
Our framework provides several out-of-the-box features, including (a) loss integrity independent of the number of processes, (b) synchronized batch normalization, and (c) distributed higher-order optimization methods.
arXiv Detail & Related papers (2020-07-24T22:42:35Z)
- Toward fast and accurate human pose estimation via soft-gated skip connections [97.06882200076096]
This paper is on highly accurate and highly efficient human pose estimation.
We re-analyze the design of skip connections in the context of improving both accuracy and efficiency over the state of the art.
Our model achieves state-of-the-art results on the MPII and LSP datasets.
arXiv Detail & Related papers (2020-02-25T18:51:51Z)