Making DensePose fast and light
- URL: http://arxiv.org/abs/2006.15190v3
- Date: Thu, 9 Jul 2020 11:33:27 GMT
- Title: Making DensePose fast and light
- Authors: Ruslan Rakhimov, Emil Bogomolov, Alexandr Notchenko, Fung Mao, Alexey Artemov, Denis Zorin, Evgeny Burnaev
- Abstract summary: Existing neural network models capable of solving the DensePose estimation task are heavily parameterized.
To enable Dense Pose inference on the end device with current models, one needs to support an expensive server-side infrastructure and have a stable internet connection.
In this work, we target the problem of redesigning the DensePose R-CNN model's architecture so that the final network retains most of its accuracy but becomes more light-weight and fast.
- Score: 78.49552144907513
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The DensePose estimation task is a significant step forward for enhancing
the user experience in computer vision applications ranging from augmented reality to cloth
fitting. Existing neural network models capable of solving this task are
heavily parameterized and a long way from being transferred to an embedded or
mobile device. To enable Dense Pose inference on the end device with current
models, one needs to support an expensive server-side infrastructure and have a
stable internet connection. To make things worse, mobile and embedded devices
do not always have a powerful GPU inside. In this work, we target the problem
of redesigning the DensePose R-CNN model's architecture so that the final
network retains most of its accuracy but becomes more light-weight and fast. To
achieve that, we tested and incorporated many deep learning innovations from
recent years, specifically performing an ablation study on 23 efficient
backbone architectures, multiple two-stage detection pipeline modifications,
and custom model quantization methods. As a result, we achieved $17\times$
model size reduction and $2\times$ latency improvement compared to the baseline
model.
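The abstract does not say how the custom quantization methods work, so the following is only a generic illustration of why quantization shrinks a model: a minimal sketch of 8-bit affine quantization in plain Python. It is not the authors' method. The function names `quantize_uint8` and `dequantize` and the sample weights are hypothetical. Storing each 32-bit float weight as one 8-bit code gives a 4x reduction on its own, which is only part of the $17\times$ the abstract reports for the full set of changes.

```python
import struct


def quantize_uint8(weights):
    """Map floats onto 256 evenly spaced levels between min and max."""
    lo, hi = min(weights), max(weights)
    # Guard against a constant tensor, where the range would be zero.
    scale = (hi - lo) / 255.0 or 1.0
    # Clamp so that rounding noise can never leave the 0..255 range.
    codes = [min(255, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Recover approximate float values from the 8-bit codes."""
    return [c * scale + lo for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [-0.51, -0.12, 0.0, 0.07, 0.33, 0.49]
codes, scale, offset = quantize_uint8(weights)
restored = dequantize(codes, scale, offset)

# Storage cost: 4 bytes per float32 weight versus 1 byte per code.
fp32_bytes = len(struct.pack(f"{len(weights)}f", *weights))
uint8_bytes = len(bytes(codes))
max_err = max(abs(a - b) for a, b in zip(weights, restored))

print(f"size ratio: {fp32_bytes / uint8_bytes:.1f}x, max error: {max_err:.5f}")
```

The reconstruction error of each weight is bounded by half a quantization step (`scale / 2`), which is the accuracy cost traded for the smaller size.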
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- Dr$^2$Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning [81.0108753452546]
We propose Dynamic Reversible Dual-Residual Networks, or Dr$^2$Net, to finetune a pretrained model with substantially reduced memory consumption.
Dr$^2$Net contains two types of residual connections, one maintaining the residual structure in the pretrained models, and the other making the network reversible.
We show that Dr$^2$Net can reach comparable performance to conventional finetuning but with significantly less memory usage.
arXiv Detail & Related papers (2024-01-08T18:59:31Z)
- Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity [0.0]
Exploiting sparsity in the network's feature maps is one of the ways to reduce its inference latency.
We propose a solution to induce semi-structured activation sparsity exploitable through minor runtime modifications.
Our approach yields a speed improvement of $1.25\times$ with a minimal accuracy drop of $1.1\%$ for the ResNet18 model on the ImageNet dataset.
arXiv Detail & Related papers (2023-09-12T22:28:53Z)
- Synaptic metaplasticity with multi-level memristive devices [1.5598974049838272]
We propose a memristor-based hardware solution for implementing metaplasticity during both inference and training.
We show that a two-layer perceptron achieves 97% and 86% accuracy on consecutive training of MNIST and Fashion-MNIST.
Our architecture is compatible with the limited endurance of memristors and achieves a 15x reduction in memory.
arXiv Detail & Related papers (2023-06-21T09:40:25Z)
- Efficient Deep Learning Methods for Identification of Defective Casting Products [0.0]
In this paper, we have compared and contrasted various pre-trained and custom-built AI architectures.
Our results show that custom architectures are more efficient than pre-trained mobile architectures.
Augmentation experimentations have also been carried out on the custom architectures to make the models more robust and generalizable.
arXiv Detail & Related papers (2022-05-14T19:35:05Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using around 40% of the available hardware resources in total.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- The Untapped Potential of Off-the-Shelf Convolutional Neural Networks [29.205446247063673]
We show that existing off-the-shelf models like ResNet-50 are capable of over 95% accuracy on ImageNet.
This level of performance currently exceeds that of models with over 20x more parameters and significantly more complex training procedures.
arXiv Detail & Related papers (2021-03-17T20:04:46Z)
- Toward Accurate Platform-Aware Performance Modeling for Deep Neural Networks [0.17499351967216337]
We provide a machine learning-based method, PerfNetV2, which improves the accuracy of our previous work for modeling the neural network performance on a variety of GPU accelerators.
Given an application, the proposed method can be used to predict the inference time and training time of the convolutional neural networks used in the application.
Our case studies show that PerfNetV2 yields a mean absolute percentage error within 13.1% on LeNet, AlexNet, and VGG16 on NVIDIA GTX-1080Ti, while the error rate on a previous work published in ICBD 2018 could be as large as 200%.
arXiv Detail & Related papers (2020-12-01T01:42:23Z)
- An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices [58.62801151916888]
We introduce a new sparsity dimension, namely pattern-based sparsity, which comprises pattern and connectivity sparsity and is both highly accurate and hardware friendly.
Our approach on the new pattern-based sparsity naturally fits into compiler optimization for highly efficient DNN execution on mobile platforms.
arXiv Detail & Related papers (2020-01-20T16:17:36Z)
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to regain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.