Beyond Static Features for Temporally Consistent 3D Human Pose and Shape from a Video
- URL: http://arxiv.org/abs/2011.08627v4
- Date: Tue, 27 Apr 2021 06:54:19 GMT
- Title: Beyond Static Features for Temporally Consistent 3D Human Pose and Shape from a Video
- Authors: Hongsuk Choi, Gyeongsik Moon, Ju Yong Chang, Kyoung Mu Lee
- Abstract summary: We present a temporally consistent mesh recovery system (TCMR).
It focuses on temporal information from past and future frames without being dominated by the static feature of the current frame.
TCMR significantly outperforms previous video-based methods in temporal consistency while achieving better per-frame 3D pose and shape accuracy.
- Score: 68.4542008229477
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Despite the recent success of single image-based 3D human pose and shape
estimation methods, recovering temporally consistent and smooth 3D human motion
from a video is still challenging. Several video-based methods have been
proposed; however, they fail to resolve the temporal inconsistency of single
image-based methods because they depend strongly on the static feature of the
current frame. In this regard, we present a temporally consistent mesh recovery
system (TCMR). It effectively focuses on temporal information from past and
future frames without being dominated by the current static feature. TCMR
significantly outperforms previous video-based methods in temporal consistency
while achieving better per-frame 3D pose and shape accuracy. We also release
the code. For the demo video, see https://youtu.be/WB3nTnSQDII. For the code,
see https://github.com/hongsukchoi/TCMR_RELEASE.
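
A minimal PyTorch-style sketch of the idea in the abstract, assuming per-frame static features from a CNN backbone: recurrent encoders summarize all frames, past-only frames, and future-only frames, and a small attention layer fuses the three branches so the current frame's static feature cannot dominate. Module names and dimensions here are illustrative assumptions, not the authors' implementation (see the released code for that).

```python
import torch
import torch.nn as nn

class TemporalIntegration(nn.Module):
    """Fuse temporal features from all, past-only, and future-only frames."""

    def __init__(self, feat_dim=2048, hidden=1024):
        super().__init__()
        self.enc_all = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.enc_past = nn.GRU(feat_dim, hidden, batch_first=True)
        self.enc_future = nn.GRU(feat_dim, hidden, batch_first=True)
        self.reduce_all = nn.Linear(2 * hidden, hidden)
        # One attention weight per temporal branch, normalized by softmax.
        self.attend = nn.Sequential(nn.Linear(3 * hidden, 3), nn.Softmax(dim=-1))

    def forward(self, feats):
        # feats: (B, T, feat_dim) static per-frame features; the middle frame
        # is the target whose mesh is regressed.
        mid = feats.size(1) // 2
        h_all = self.reduce_all(self.enc_all(feats)[0][:, mid])           # (B, hidden)
        h_past = self.enc_past(feats[:, :mid])[0][:, -1]                  # ends just before the target
        h_future = self.enc_future(feats[:, mid + 1:].flip(1))[0][:, -1]  # runs backward toward the target
        branches = torch.stack([h_all, h_past, h_future], dim=1)         # (B, 3, hidden)
        w = self.attend(branches.flatten(1)).unsqueeze(-1)                # (B, 3, 1)
        return (w * branches).sum(dim=1)  # fused feature for a mesh-parameter regressor

fused = TemporalIntegration()(torch.randn(2, 16, 2048))  # -> (2, 1024)
```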
Related papers
- ARTS: Semi-Analytical Regressor using Disentangled Skeletal Representations for Human Mesh Recovery from Videos [18.685856290041283]
ARTS surpasses existing state-of-the-art video-based methods in both per-frame accuracy and temporal consistency on popular benchmarks.
A skeleton estimation and disentanglement module estimates 3D skeletons from a video.
A semi-analytical regressor then maps the disentangled skeletal representation to mesh parameters; it consists of three modules: Temporal Inverse Kinematics (TIK), Bone-guided Shape Fitting (BSF), and Motion-Centric Refinement (MCR), whose data flow is sketched after this entry.
arXiv Detail & Related papers (2024-10-21T02:06:43Z)
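
A purely illustrative sketch of the three-stage data flow named above. The module internals are placeholders (simple linear layers), not ARTS's actual layers; the 85-dimensional output follows a common pose + shape + camera parameterization, which is an assumption here.

```python
import torch
import torch.nn as nn

class ARTSRegressor(nn.Module):
    """Illustrative data flow: disentangled skeleton -> pose, shape, refinement."""

    def __init__(self, skel_dim=256, param_dim=85):  # 85 ~ pose + shape + camera (assumed)
        super().__init__()
        self.tik = nn.Linear(skel_dim, param_dim)    # Temporal Inverse Kinematics (placeholder)
        self.bsf = nn.Linear(skel_dim, param_dim)    # Bone-guided Shape Fitting (placeholder)
        self.mcr = nn.Linear(param_dim, param_dim)   # Motion-Centric Refinement (placeholder)

    def forward(self, skel_feats):                   # (B, skel_dim) skeleton features
        return self.mcr(self.tik(skel_feats) + self.bsf(skel_feats))

params = ARTSRegressor()(torch.randn(4, 256))        # -> (4, 85)
```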
- STAF: 3D Human Mesh Recovery from Video with Spatio-Temporal Alignment Fusion [35.42718669331158]
Existing models usually ignore spatial and temporal information, which can lead to mesh-image misalignment and temporal discontinuity.
As a video-based model, STAF leverages coherence cues from human motion through an attention-based Temporal Coherence Fusion Module.
In addition, an Average Pooling Module (APM) allows the model to attend to the entire input sequence rather than only the target frame (see the sketch after this entry).
arXiv Detail & Related papers (2024-01-03T13:07:14Z)
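
A minimal sketch in the spirit of the APM described above, assuming per-frame backbone features: a sequence-wide average is blended into the target frame's feature so the model sees the whole clip. This is a hedged illustration of the basic idea, not STAF's implementation.

```python
import torch
import torch.nn as nn

class AveragePoolingModule(nn.Module):
    """Blend a sequence-wide average into the target frame's feature."""

    def __init__(self, feat_dim=2048):
        super().__init__()
        self.fuse = nn.Linear(2 * feat_dim, feat_dim)

    def forward(self, feats):                     # feats: (B, T, feat_dim)
        target = feats[:, feats.size(1) // 2]     # mid-sequence target frame
        context = feats.mean(dim=1)               # summary of the entire sequence
        return self.fuse(torch.cat([target, context], dim=-1))

out = AveragePoolingModule()(torch.randn(2, 16, 2048))  # -> (2, 2048)
```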
- Co-Evolution of Pose and Mesh for 3D Human Body Estimation from Video [23.93644678238666]
We propose a Pose and Mesh Co-Evolution network (PMCE) to recover 3D human motion from a video.
The proposed PMCE outperforms previous state-of-the-art methods in terms of both per-frame accuracy and temporal consistency.
arXiv Detail & Related papers (2023-08-20T16:03:21Z)
- TAPE: Temporal Attention-based Probabilistic human pose and shape Estimation [7.22614468437919]
Existing methods ignore the ambiguities of the reconstruction and provide a single deterministic estimate for the 3D pose.
We present a Temporal Attention based Probabilistic human pose and shape Estimation method (TAPE) that operates on an RGB video (a sketch of the probabilistic output idea follows this entry).
We show that TAPE outperforms state-of-the-art methods on standard benchmarks.
arXiv Detail & Related papers (2023-04-29T06:08:43Z)
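
The probabilistic output described above can be illustrated by a head that regresses a distribution over pose parameters instead of a point estimate. A hedged sketch, not TAPE's actual architecture; the 72-dimensional pose follows the common SMPL axis-angle convention, assumed here.

```python
import torch
import torch.nn as nn

class ProbabilisticPoseHead(nn.Module):
    """Regress a Gaussian over pose parameters instead of a single estimate."""

    def __init__(self, feat_dim=2048, pose_dim=72):   # 72 = 24 SMPL joints x 3 axis-angle
        super().__init__()
        self.mean = nn.Linear(feat_dim, pose_dim)
        self.log_var = nn.Linear(feat_dim, pose_dim)

    def forward(self, feat):                          # feat: (B, feat_dim)
        mu = self.mean(feat)
        std = (0.5 * self.log_var(feat)).exp()        # keep std positive
        return torch.distributions.Normal(mu, std)    # sample, or use .mean

dist = ProbabilisticPoseHead()(torch.randn(2, 2048))
samples = dist.rsample()                              # (2, 72), differentiable
```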
- HQ3DAvatar: High Quality Controllable 3D Head Avatar [65.70885416855782]
This paper presents a novel approach to building highly photorealistic digital head avatars.
Our method learns a canonical space via an implicit function parameterized by a neural network (one common form of such a function is sketched after this entry).
At test time, our method is driven by a monocular RGB video.
arXiv Detail & Related papers (2023-03-25T13:56:33Z)
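
One common form of "an implicit function parameterized by a neural network" is an MLP that maps a 3D point in the canonical space to color and density, NeRF-style. A generic sketch under that assumption, not the paper's actual network.

```python
import torch
import torch.nn as nn

class CanonicalField(nn.Module):
    """Generic implicit function: canonical 3D point -> (RGB, density)."""

    def __init__(self, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),               # 3 color channels + 1 density
        )

    def forward(self, points):                  # points: (N, 3) in canonical space
        rgb_sigma = self.mlp(points)
        return torch.sigmoid(rgb_sigma[:, :3]), torch.relu(rgb_sigma[:, 3])

rgb, sigma = CanonicalField()(torch.rand(1024, 3))
```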
- Self-Attentive 3D Human Pose and Shape Estimation from Videos [82.63503361008607]
We present a video-based learning algorithm for 3D human pose and shape estimation.
We exploit temporal information in videos with a self-attention module (sketched after this entry).
We evaluate our method on the 3DPW, MPI-INF-3DHP, and Human3.6M datasets.
arXiv Detail & Related papers (2021-03-26T00:02:19Z)
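
A minimal sketch of a self-attention module over per-frame features, the mechanism named above: every frame attends to every other frame before mesh regression. Dimensions and the residual design are assumptions, not the paper's exact module.

```python
import torch
import torch.nn as nn

class FrameSelfAttention(nn.Module):
    """Let each frame's feature attend to all other frames in the clip."""

    def __init__(self, feat_dim=2048, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)

    def forward(self, feats):                          # feats: (B, T, feat_dim)
        attended, _ = self.attn(feats, feats, feats)   # self-attention across time
        return self.norm(feats + attended)             # residual keeps per-frame detail

out = FrameSelfAttention()(torch.randn(2, 16, 2048))   # -> (2, 16, 2048)
```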
- Human Mesh Recovery from Multiple Shots [85.18244937708356]
We propose a framework for improved 3D reconstruction and mining of long sequences with pseudo ground truth 3D human mesh.
We show that the resulting data is beneficial in the training of various human mesh recovery models.
The tools we develop open the door to processing and analyzing, in 3D, content from a large library of edited media.
arXiv Detail & Related papers (2020-12-17T18:58:02Z)
- Appearance-Preserving 3D Convolution for Video-based Person Re-identification [61.677153482995564]
We propose Appearance-Preserving 3D Convolution (AP3D), which is composed of two components: an Appearance-Preserving Module (APM) and a 3D convolution kernel.
AP3D is easy to combine with existing 3D ConvNets by simply replacing the original 3D convolution kernels with AP3Ds (see the sketch after this entry).
arXiv Detail & Related papers (2020-07-16T16:21:34Z)
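
The "simply replace the kernels" claim above suggests a drop-in wrapper. A hypothetical sketch of that idea: walk an existing 3D ConvNet and wrap each Conv3d with an appearance-preserving stage. AP3DConv and the identity placeholder are stand-ins, not the paper's implementation.

```python
import torch.nn as nn

class AP3DConv(nn.Module):
    """Wrap an existing 3D convolution with an alignment stage."""

    def __init__(self, conv: nn.Conv3d):
        super().__init__()
        self.align = nn.Identity()   # placeholder for the Appearance-Preserving Module
        self.conv = conv             # reuse the original kernel and its weights

    def forward(self, x):            # x: (B, C, T, H, W)
        return self.conv(self.align(x))

def replace_conv3d(module: nn.Module) -> nn.Module:
    """Recursively swap every Conv3d in a model for an AP3DConv."""
    for name, child in module.named_children():
        if isinstance(child, nn.Conv3d):
            setattr(module, name, AP3DConv(child))
        else:
            replace_conv3d(child)
    return module
```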
- A Real-time Action Representation with Temporal Encoding and Deep Compression [115.3739774920845]
We propose a new real-time convolutional architecture, called Temporal Convolutional 3D Network (T-C3D), for action representation.
T-C3D learns video action representations in a hierarchical multi-granularity manner while maintaining high processing speed.
On the UCF101 action recognition benchmark, our method improves accuracy by 5.4% over state-of-the-art real-time methods, runs twice as fast at inference, and requires less than 5 MB of model storage.
arXiv Detail & Related papers (2020-06-17T06:30:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.