Beyond Static Features for Temporally Consistent 3D Human Pose and Shape
from a Video
- URL: http://arxiv.org/abs/2011.08627v4
- Date: Tue, 27 Apr 2021 06:54:19 GMT
- Title: Beyond Static Features for Temporally Consistent 3D Human Pose and Shape
from a Video
- Authors: Hongsuk Choi, Gyeongsik Moon, Ju Yong Chang, Kyoung Mu Lee
- Abstract summary: We present a temporally consistent mesh recovery system (TCMR).
It effectively focuses on temporal information from past and future frames without being dominated by the static feature of the current frame.
It significantly outperforms previous video-based methods in temporal consistency with better per-frame 3D pose and shape accuracy.
- Score: 68.4542008229477
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Despite the recent success of single image-based 3D human pose and shape
estimation methods, recovering temporally consistent and smooth 3D human motion
from a video is still challenging. Several video-based methods have been
proposed; however, they fail to resolve the single image-based methods'
temporal inconsistency issue due to a strong dependency on a static feature of
the current frame. In this regard, we present a temporally consistent mesh
recovery system (TCMR). It effectively focuses on the past and future frames'
temporal information without being dominated by the current static feature. Our
TCMR significantly outperforms previous video-based methods in temporal
consistency with better per-frame 3D pose and shape accuracy. We also release
the codes. For the demo video, see https://youtu.be/WB3nTnSQDII. For the codes,
see https://github.com/hongsukchoi/TCMR_RELEASE.
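The core idea above can be illustrated with a toy sketch: pool features from past and future frames separately, then blend them with the current frame's static feature using attention weights that favor temporal context. Note this is a minimal illustration, not the actual TCMR architecture (which uses recurrent encoders and a learned attention network); the fixed attention scores and function names below are placeholders.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def temporal_feature(frame_feats, t):
    """Toy sketch of temporally-weighted feature fusion.

    frame_feats: (T, D) array of per-frame static features.
    t: index of the current frame (0 < t < T - 1).
    Real TCMR extracts the past/future summaries with GRUs and
    learns the attention scores; here both are placeholders.
    """
    past = frame_feats[:t].mean(axis=0)        # summary of past frames
    future = frame_feats[t + 1:].mean(axis=0)  # summary of future frames
    current = frame_feats[t]                   # static feature of current frame
    # Placeholder attention scores that down-weight the current static
    # feature relative to temporal context (learned in the real model).
    weights = softmax(np.array([1.0, 1.0, 0.2]))
    return weights[0] * past + weights[1] * future + weights[2] * current

# usage: T = 5 frames, 4-dim features
feats = np.arange(20, dtype=float).reshape(5, 4)
fused = temporal_feature(feats, t=2)
print(fused.shape)  # (4,)
```

The blend keeps the output a convex combination of the three summaries, so the current frame still contributes, but temporal context dominates, which is the intuition behind the temporal-consistency gains described above.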
Related papers
- Predicting 4D Hand Trajectory from Monocular Videos [63.842530566039606]
HaPTIC is an approach that infers coherent 4D hand trajectories from monocular videos.
It significantly outperforms existing methods in global trajectory accuracy.
It is comparable to the state-of-the-art in single-image pose estimation.
arXiv Detail & Related papers (2025-01-14T18:59:05Z)
- Coherent3D: Coherent 3D Portrait Video Reconstruction via Triplane Fusion [22.185551913099598]
Single-image 3D portrait reconstruction has enabled telepresence systems to stream 3D portrait videos from a single camera in real-time.
However, per-frame 3D reconstruction exhibits temporal inconsistency and forgets the user's appearance.
We propose a new fusion-based method that takes the best of both worlds by fusing a canonical 3D prior from a reference view with dynamic appearance from per-frame input views.
arXiv Detail & Related papers (2024-12-11T18:57:24Z)
- STAF: 3D Human Mesh Recovery from Video with Spatio-Temporal Alignment Fusion [35.42718669331158]
Existing models usually ignore spatial and temporal information, which can lead to mesh-image misalignment and temporal discontinuity.
As a video-based model, STAF leverages coherence cues from human motion via an attention-based Temporal Coherence Fusion Module.
In addition, we propose an Average Pooling Module (APM) to allow the model to focus on the entire input sequence rather than just the target frame.
arXiv Detail & Related papers (2024-01-03T13:07:14Z)
- Co-Evolution of Pose and Mesh for 3D Human Body Estimation from Video [23.93644678238666]
We propose a Pose and Mesh Co-Evolution network (PMCE) to recover 3D human motion from a video.
The proposed PMCE outperforms previous state-of-the-art methods in terms of both per-frame accuracy and temporal consistency.
arXiv Detail & Related papers (2023-08-20T16:03:21Z)
- TAPE: Temporal Attention-based Probabilistic human pose and shape Estimation [7.22614468437919]
Existing methods ignore the ambiguities of the reconstruction and provide a single deterministic estimate for the 3D pose.
We present a Temporal Attention based Probabilistic human pose and shape Estimation method (TAPE) that operates on an RGB video.
We show that TAPE outperforms state-of-the-art methods in standard benchmarks.
arXiv Detail & Related papers (2023-04-29T06:08:43Z)
- HQ3DAvatar: High Quality Controllable 3D Head Avatar [65.70885416855782]
This paper presents a novel approach to building highly photorealistic digital head avatars.
Our method learns a canonical space via an implicit function parameterized by a neural network.
At test time, our method is driven by a monocular RGB video.
arXiv Detail & Related papers (2023-03-25T13:56:33Z)
- Self-Attentive 3D Human Pose and Shape Estimation from Videos [82.63503361008607]
We present a video-based learning algorithm for 3D human pose and shape estimation.
We exploit temporal information in videos and propose a self-attention module.
We evaluate our method on the 3DPW, MPI-INF-3DHP, and Human3.6M datasets.
arXiv Detail & Related papers (2021-03-26T00:02:19Z)
- Human Mesh Recovery from Multiple Shots [85.18244937708356]
We propose a framework for improved 3D reconstruction and mining of long sequences with pseudo ground truth 3D human mesh.
We show that the resulting data is beneficial in the training of various human mesh recovery models.
The tools we develop open the door to processing and analyzing 3D content from a large library of edited media.
arXiv Detail & Related papers (2020-12-17T18:58:02Z)
- Appearance-Preserving 3D Convolution for Video-based Person Re-identification [61.677153482995564]
We propose Appearance-Preserving 3D Convolution (AP3D), which is composed of two components: an Appearance-Preserving Module (APM) and a 3D convolution kernel.
It is easy to combine AP3D with existing 3D ConvNets by simply replacing the original 3D convolution kernels with AP3Ds.
arXiv Detail & Related papers (2020-07-16T16:21:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.