Rethinking Deconvolution for 2D Human Pose Estimation: Light yet Accurate Model for Real-time Edge Computing
- URL: http://arxiv.org/abs/2111.04226v1
- Date: Mon, 8 Nov 2021 01:44:46 GMT
- Title: Rethinking Deconvolution for 2D Human Pose Estimation: Light yet Accurate Model for Real-time Edge Computing
- Authors: Masayuki Yamazaki, Eigo Mori
- Abstract summary: The model reaches 94.5% of the accuracy of the SOTA HRNet 256x192 at a small fraction of its computational cost.
Our model adopts an encoder-decoder architecture and is carefully downsized to improve its efficiency.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this study, we present a pragmatic lightweight pose estimation model. Our
model can achieve real-time predictions using low-power embedded devices. This
system is highly accurate, reaching 94.5% of the accuracy of the SOTA
HRNet 256x192 while using only 3.8% of its computational cost on the COCO test dataset. Our
model adopts an encoder-decoder architecture and is carefully downsized to
improve its efficiency. We focused in particular on optimizing the deconvolution
layers and observed that reducing their channel counts cuts computational
resource consumption significantly without degrading accuracy. We also
incorporated recent model-agnostic techniques such as DarkPose and distillation training to
maximize the efficiency of our model. Furthermore, we applied model
quantization to exploit multi/mixed-precision features. Our FP16 model (COCO
AP 70.0) runs at roughly 60 fps on an NVIDIA Jetson AGX Xavier and roughly 200 fps on an
NVIDIA Quadro RTX6000.
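As a rough illustration of the channel-reduction idea in the abstract, the sketch below builds a SimpleBaseline-style deconvolution head in PyTorch with a slimmed channel width and casts it to FP16 for inference. The backbone width, input resolution, keypoint count, and channel choices (256 vs. 64) are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch (not the authors' code): a SimpleBaseline-style deconvolution
# head where the deconvolution channel width is reduced to cut computation,
# plus an FP16 cast of the kind used for Jetson-class inference.
# All sizes (backbone width 1280, 256x192 input, 17 COCO keypoints) are assumed.
import torch
import torch.nn as nn

def deconv_head(in_ch: int, deconv_ch: int, num_joints: int) -> nn.Sequential:
    """Three stride-2 deconvolutions (8x upsampling) followed by a 1x1 heatmap layer."""
    layers, ch = [], in_ch
    for _ in range(3):
        layers += [
            nn.ConvTranspose2d(ch, deconv_ch, kernel_size=4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(deconv_ch),
            nn.ReLU(inplace=True),
        ]
        ch = deconv_ch
    layers.append(nn.Conv2d(deconv_ch, num_joints, kernel_size=1))
    return nn.Sequential(*layers)

# Compare a wide head (256 deconv channels) with a slimmed one (64 channels).
heavy = deconv_head(in_ch=1280, deconv_ch=256, num_joints=17)
light = deconv_head(in_ch=1280, deconv_ch=64, num_joints=17)
print(sum(p.numel() for p in heavy.parameters()), "vs", sum(p.numel() for p in light.parameters()))

# FP16 inference sketch: half precision is normally exercised on a CUDA device
# (e.g. a Jetson); on CPU this falls back to FP32 so the example stays runnable.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
model = light.to(device=device, dtype=dtype).eval()
x = torch.randn(1, 1280, 8, 6, device=device, dtype=dtype)  # stride-32 features of a 256x192 crop
with torch.no_grad():
    heatmaps = model(x)  # -> (1, 17, 64, 48) keypoint heatmaps
print(tuple(heatmaps.shape))
```

The printed parameter counts make the effect of the deconvolution width concrete; the accuracy and speed figures quoted in the abstract come from the paper's own architecture and training, not from this sketch.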
Related papers
- Trimming the Fat: Efficient Compression of 3D Gaussian Splats through Pruning [17.097742540845672]
"Trimming the fat" is a post-hoc gradient-informed iterative pruning technique to eliminate redundant information encoded in the model.
Our approach achieves around 50x compression while preserving performance similar to the baseline model, and speeds up computation to as much as 600 FPS.
arXiv Detail & Related papers (2024-06-26T09:57:55Z)
- Network architecture search of X-ray based scientific applications [4.8287663496299755]
X-ray and electron diffraction-based microscopy use Bragg peak detection and ptychography to perform 3D imaging at atomic resolution.
Recently, the use of deep neural networks has improved the existing state-of-the-art approaches.
arXiv Detail & Related papers (2024-04-16T16:09:38Z)
- Post-training Model Quantization Using GANs for Synthetic Data Generation [57.40733249681334]
We investigate the use of synthetic data as a substitute for real data when calibrating the quantization method.
We compare the performance of models quantized using data generated by StyleGAN2-ADA and our pre-trained DiStyleGAN, with quantization using real data and an alternative data generation method based on fractal images.
arXiv Detail & Related papers (2023-05-10T11:10:09Z)
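As a concrete but heavily simplified reading of the synthetic-data calibration idea in the entry above, the sketch below runs eager-mode post-training static quantization in PyTorch with random tensors standing in for GAN-generated samples; the tiny CNN, image size, and calibration batch counts are placeholders, not the paper's setup.

```python
# Simplified sketch: eager-mode post-training static quantization calibrated
# with synthetic images. Random tensors stand in for GAN-generated samples
# (StyleGAN2-ADA / DiStyleGAN are not used here); the tiny CNN is a placeholder.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # float -> quantized at the input
        self.dequant = torch.quantization.DeQuantStub()  # quantized -> float at the output
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        x = self.quant(x)
        x = self.features(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return self.dequant(x)

def synthetic_batch(batch_size: int = 8) -> torch.Tensor:
    """Stand-in for a trained generator: here just random 32x32 'images'."""
    return torch.randn(batch_size, 3, 32, 32)

model = TinyCNN().eval()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")  # x86 backend
prepared = torch.quantization.prepare(model)                      # insert observers

with torch.no_grad():                                              # calibration on synthetic data
    for _ in range(32):
        prepared(synthetic_batch())

quantized = torch.quantization.convert(prepared)                   # int8 weights and activations
print(quantized(synthetic_batch(1)).shape)
```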
- EdgeYOLO: An Edge-Real-Time Object Detector [69.41688769991482]
This paper proposes an efficient, low-complexity and anchor-free object detector based on the state-of-the-art YOLO framework.
We develop an enhanced data augmentation method to effectively suppress overfitting during training, and design a hybrid random loss function to improve the detection accuracy of small objects.
Our baseline model reaches 50.6% AP50:95 and 69.8% AP50 on the MS COCO 2017 dataset and 26.4% AP50:95 and 44.8% AP50 on the VisDrone 2019-DET dataset, and it meets real-time requirements (FPS >= 30) on an Nvidia edge-computing device.
arXiv Detail & Related papers (2023-02-15T06:05:14Z)
- SmoothNets: Optimizing CNN architecture design for differentially private deep learning [69.10072367807095]
DP-SGD requires clipping and noising of per-sample gradients, which reduces model utility compared to non-private training.
We distilled a new model architecture termed SmoothNet, which is characterised by increased robustness to the challenges of DP-SGD training.
arXiv Detail & Related papers (2022-05-09T07:51:54Z)
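The "clipping and noising of per-sample gradients" that the SmoothNets entry refers to is the core DP-SGD step. Below is a minimal, generic sketch of that step (microbatches of size one, placeholder clip norm and noise multiplier); it is not the SmoothNets training code and includes no privacy accounting.

```python
# Minimal, generic DP-SGD step (illustration only, no privacy accounting):
# each sample's gradient is clipped to norm <= clip_norm, the clipped gradients
# are summed, Gaussian noise scaled by clip_norm * noise_multiplier is added,
# and the averaged noisy gradient is applied. All hyperparameters are placeholders.
import torch
import torch.nn as nn

def dp_sgd_step(model, loss_fn, xb, yb, lr=0.1, clip_norm=1.0, noise_multiplier=1.0):
    summed = [torch.zeros_like(p) for p in model.parameters()]
    for x, y in zip(xb, yb):                                  # microbatches of size one
        model.zero_grad()
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        grads = [p.grad.detach().clone() for p in model.parameters()]
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(clip_norm / (total_norm + 1e-6), max=1.0)  # per-sample clipping
        for s, g in zip(summed, grads):
            s.add_(g * scale)
    with torch.no_grad():
        for p, s in zip(model.parameters(), summed):
            noise = torch.randn_like(s) * clip_norm * noise_multiplier  # Gaussian noise
            p.add_(-lr * (s + noise) / len(xb))               # noised, averaged update

model = nn.Linear(4, 2)
xb, yb = torch.randn(16, 4), torch.randint(0, 2, (16,))
dp_sgd_step(model, nn.CrossEntropyLoss(), xb, yb)
print([tuple(p.shape) for p in model.parameters()])
```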
- LCS: Learning Compressible Subspaces for Adaptive Network Compression at Inference Time [57.52251547365967]
We propose a method for training a "compressible subspace" of neural networks that contains a fine-grained spectrum of models.
We present results for achieving arbitrarily fine-grained accuracy-efficiency trade-offs at inference time for structured and unstructured sparsity.
Our algorithm extends to quantization at variable bit widths, achieving accuracy on par with individually trained networks.
arXiv Detail & Related papers (2021-10-08T17:03:34Z)
- ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
For evaluation, we compare the estimation accuracy and fidelity of the generated mixed models, the statistical models, the roofline model, and a refined roofline model.
arXiv Detail & Related papers (2021-05-07T11:39:05Z)
- Towards Practical Lipreading with Distilled and Efficient Models [57.41253104365274]
Lipreading has witnessed a lot of progress due to the resurgence of neural networks.
Recent works have placed emphasis on aspects such as improving performance by finding the optimal architecture or improving generalization.
There is still a significant gap between the current methodologies and the requirements for an effective deployment of lipreading in practical scenarios.
We propose a series of innovations that significantly bridge that gap: first, we raise the state-of-the-art performance by a wide margin on LRW and LRW-1000, to 88.5% and 46.6%, respectively, using self-distillation.
arXiv Detail & Related papers (2020-07-13T16:56:27Z)
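The self-distillation mentioned in the lipreading entry reduces to adding a temperature-scaled soft-target term to the training loss. The sketch below shows a generic distillation loss with placeholder temperature, weighting, and class count (in self-distillation the teacher is simply a previously trained copy of the same architecture).

```python
# Generic knowledge-distillation loss (illustrative hyperparameters): cross-entropy
# on the labels plus a temperature-scaled KL term pulling the student's logits
# toward the teacher's. The class count (500, as in LRW) is an assumption.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=2.0, alpha=0.5):
    hard = F.cross_entropy(student_logits, targets)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                                  # rescale so the soft term's gradients keep their weight
    return alpha * soft + (1.0 - alpha) * hard

student_logits = torch.randn(8, 500, requires_grad=True)
teacher_logits = torch.randn(8, 500)
targets = torch.randint(0, 500, (8,))
loss = distillation_loss(student_logits, teacher_logits, targets)
loss.backward()
print(float(loss))
```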
- Generative Multi-Stream Architecture For American Sign Language Recognition [15.717424753251674]
Training on datasets with low feature richness for complex applications limits optimal convergence below human performance.
We propose a generative multi-stream architecture that eliminates the need for additional hardware, with the intent of improving feature convergence without risking impracticability.
Our methods have achieved 95.62% validation accuracy with a variance of 1.42% from training, outperforming past models by 0.45% in validation accuracy and 5.53% in variance.
arXiv Detail & Related papers (2020-03-09T21:04:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.