SSP-Net: Scalable Sequential Pyramid Networks for Real-Time 3D Human
Pose Regression
- URL: http://arxiv.org/abs/2009.01998v1
- Date: Fri, 4 Sep 2020 03:43:24 GMT
- Title: SSP-Net: Scalable Sequential Pyramid Networks for Real-Time 3D Human
Pose Regression
- Authors: Diogo Luvizon and Hedi Tabia and David Picard
- Abstract summary: We propose a highly scalable convolutional neural network, end-to-end trainable, for real-time 3D human pose regression from still RGB images.
Our network requires a single training procedure and is capable of producing its best predictions at 120 frames per second.
- Score: 27.85790535227085
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we propose a highly scalable convolutional neural network,
end-to-end trainable, for real-time 3D human pose regression from still RGB
images. We call this approach the Scalable Sequential Pyramid Networks
(SSP-Net) as it is trained with refined supervision at multiple scales in a
sequential manner. Our network requires a single training procedure and is
capable of producing its best predictions at 120 frames per second (FPS), or
acceptable predictions at more than 200 FPS when cut at test time. We show that
the proposed regression approach is invariant to the size of feature maps,
allowing our method to perform multi-resolution intermediate supervisions and
reaching results comparable to the state-of-the-art with very low resolution
feature maps. We demonstrate the accuracy and the effectiveness of our method
by providing extensive experiments on two of the most important publicly
available datasets for 3D pose estimation, Human3.6M and MPI-INF-3DHP.
Additionally, we provide relevant insights about our decisions on the network
architecture and show its flexibility to meet the best precision-speed
compromise.
Related papers
- GLEAM: Greedy Learning for Large-Scale Accelerated MRI Reconstruction [50.248694764703714]
Unrolled neural networks have recently achieved state-of-the-art accelerated MRI reconstruction.
These networks unroll iterative optimization algorithms by alternating between physics-based consistency and neural-network based regularization.
We propose Greedy LEarning for Accelerated MRI reconstruction, an efficient training strategy for high-dimensional imaging settings.
arXiv Detail & Related papers (2022-07-18T06:01:29Z) - Back to MLP: A Simple Baseline for Human Motion Prediction [59.18776744541904]
This paper tackles the problem of human motion prediction, consisting in forecasting future body poses from historically observed sequences.
We show that the performance of these approaches can be surpassed by a light-weight and purely architectural architecture with only 0.14M parameters.
An exhaustive evaluation on Human3.6M, AMASS and 3DPW datasets shows that our method, which we dub siMLPe, consistently outperforms all other approaches.
arXiv Detail & Related papers (2022-07-04T16:35:58Z) - Learned Vertex Descent: A New Direction for 3D Human Model Fitting [64.04726230507258]
We propose a novel optimization-based paradigm for 3D human model fitting on images and scans.
Our approach is able to capture the underlying body of clothed people with very different body shapes, achieving a significant improvement compared to state-of-the-art.
LVD is also applicable to 3D model fitting of humans and hands, for which we show a significant improvement to the SOTA with a much simpler and faster method.
arXiv Detail & Related papers (2022-05-12T17:55:51Z) - Multi-initialization Optimization Network for Accurate 3D Human Pose and
Shape Estimation [75.44912541912252]
We propose a three-stage framework named Multi-Initialization Optimization Network (MION)
In the first stage, we strategically select different coarse 3D reconstruction candidates which are compatible with the 2D keypoints of input sample.
In the second stage, we design a mesh refinement transformer (MRT) to respectively refine each coarse reconstruction result via a self-attention mechanism.
Finally, a Consistency Estimation Network (CEN) is proposed to find the best result from mutiple candidates by evaluating if the visual evidence in RGB image matches a given 3D reconstruction.
arXiv Detail & Related papers (2021-12-24T02:43:58Z) - Sparse Depth Completion with Semantic Mesh Deformation Optimization [4.03103540543081]
We propose a neural network with post-optimization, which takes an RGB image and sparse depth samples as input and predicts the complete depth map.
Our evaluation results outperform the existing work consistently on both indoor and outdoor datasets.
arXiv Detail & Related papers (2021-12-10T13:01:06Z) - Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv Detail & Related papers (2020-11-26T04:04:21Z) - Multi-Scale Networks for 3D Human Pose Estimation with Inference Stage
Optimization [33.02708860641971]
Estimating 3D human poses from a monocular video is still a challenging task.
Many existing methods drop when the target person is cluded by other objects, or the motion is too fast/slow relative to the scale and speed of the training data.
We introduce atemporal-temporal network for robust 3D human pose estimation.
arXiv Detail & Related papers (2020-10-13T15:24:28Z) - Human Body Model Fitting by Learned Gradient Descent [48.79414884222403]
We propose a novel algorithm for the fitting of 3D human shape to images.
We show that this algorithm is fast (avg. 120ms convergence), robust to dataset, and achieves state-of-the-art results on public evaluation datasets.
arXiv Detail & Related papers (2020-08-19T14:26:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.