Beyond Static Features for Temporally Consistent 3D Human Pose and Shape from a Video
- URL: http://arxiv.org/abs/2011.08627v4
- Date: Tue, 27 Apr 2021 06:54:19 GMT
- Title: Beyond Static Features for Temporally Consistent 3D Human Pose and Shape from a Video
- Authors: Hongsuk Choi, Gyeongsik Moon, Ju Yong Chang, Kyoung Mu Lee
- Abstract summary: We present a temporally consistent mesh recovery system (TCMR).
It focuses on temporal information from past and future frames without being dominated by the static feature of the current frame.
TCMR significantly outperforms previous video-based methods in temporal consistency while achieving better per-frame 3D pose and shape accuracy.
- Score: 68.4542008229477
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Despite the recent success of single image-based 3D human pose and shape
estimation methods, recovering temporally consistent and smooth 3D human motion
from a video is still challenging. Several video-based methods have been
proposed; however, they fail to resolve the temporal inconsistency of single
image-based methods because they depend strongly on the static feature of the
current frame. In this regard, we present a temporally consistent mesh recovery
system (TCMR). It effectively focuses on temporal information from past and
future frames without being dominated by the current static feature. TCMR
significantly outperforms previous video-based methods in temporal consistency
while achieving better per-frame 3D pose and shape accuracy. We also release
the code. For the demo video, see https://youtu.be/WB3nTnSQDII. For the code,
see https://github.com/hongsukchoi/TCMR_RELEASE.
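
A minimal PyTorch-style sketch of the idea in the abstract, assuming per-frame static features from a CNN backbone: recurrent encoders summarize all frames, past-only frames, and future-only frames, and a small attention layer fuses the three branches so the current frame's static feature cannot dominate. Module names and dimensions here are illustrative assumptions, not the authors' implementation (see the released code for that).

```python
import torch
import torch.nn as nn

class TemporalIntegration(nn.Module):
    """Fuse temporal features from all, past-only, and future-only frames."""

    def __init__(self, feat_dim=2048, hidden=1024):
        super().__init__()
        self.enc_all = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.enc_past = nn.GRU(feat_dim, hidden, batch_first=True)
        self.enc_future = nn.GRU(feat_dim, hidden, batch_first=True)
        self.reduce_all = nn.Linear(2 * hidden, hidden)
        # One attention weight per temporal branch, normalized by softmax.
        self.attend = nn.Sequential(nn.Linear(3 * hidden, 3), nn.Softmax(dim=-1))

    def forward(self, feats):
        # feats: (B, T, feat_dim) static per-frame features; the middle frame
        # is the target whose mesh is regressed.
        mid = feats.size(1) // 2
        h_all = self.reduce_all(self.enc_all(feats)[0][:, mid])           # (B, hidden)
        h_past = self.enc_past(feats[:, :mid])[0][:, -1]                  # ends just before the target
        h_future = self.enc_future(feats[:, mid + 1:].flip(1))[0][:, -1]  # runs backward toward the target
        branches = torch.stack([h_all, h_past, h_future], dim=1)         # (B, 3, hidden)
        w = self.attend(branches.flatten(1)).unsqueeze(-1)                # (B, 3, 1)
        return (w * branches).sum(dim=1)  # fused feature for a mesh-parameter regressor

fused = TemporalIntegration()(torch.randn(2, 16, 2048))  # -> (2, 1024)
```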
Related papers
- ARTS: Semi-Analytical Regressor using Disentangled Skeletal Representations for Human Mesh Recovery from Videos [18.685856290041283]
ARTS surpasses existing state-of-the-art video-based methods in both per-frame accuracy and temporal consistency on popular benchmarks.
A skeleton estimation and disentanglement module estimates 3D skeletons from a video.
A semi-analytical regressor then maps the disentangled skeletal representation to mesh parameters; it consists of three modules: Temporal Inverse Kinematics (TIK), Bone-guided Shape Fitting (BSF), and Motion-Centric Refinement (MCR), whose data flow is sketched after this entry.
arXiv Detail & Related papers (2024-10-21T02:06:43Z)
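
A purely illustrative sketch of the three-stage data flow named above. The module internals are placeholders (simple linear layers), not ARTS's actual layers; the 85-dimensional output follows a common pose + shape + camera parameterization, which is an assumption here.

```python
import torch
import torch.nn as nn

class ARTSRegressor(nn.Module):
    """Illustrative data flow: disentangled skeleton -> pose, shape, refinement."""

    def __init__(self, skel_dim=256, param_dim=85):  # 85 ~ pose + shape + camera (assumed)
        super().__init__()
        self.tik = nn.Linear(skel_dim, param_dim)    # Temporal Inverse Kinematics (placeholder)
        self.bsf = nn.Linear(skel_dim, param_dim)    # Bone-guided Shape Fitting (placeholder)
        self.mcr = nn.Linear(param_dim, param_dim)   # Motion-Centric Refinement (placeholder)

    def forward(self, skel_feats):                   # (B, skel_dim) skeleton features
        return self.mcr(self.tik(skel_feats) + self.bsf(skel_feats))

params = ARTSRegressor()(torch.randn(4, 256))        # -> (4, 85)
```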
- STAF: 3D Human Mesh Recovery from Video with Spatio-Temporal Alignment Fusion [35.42718669331158]
Existing models usually ignore spatial and temporal information, which can lead to mesh-image misalignment and temporal discontinuity.
As a video-based model, STAF leverages coherence cues from human motion through an attention-based Temporal Coherence Fusion Module.
In addition, an Average Pooling Module (APM) allows the model to attend to the entire input sequence rather than only the target frame (see the sketch after this entry).
arXiv Detail & Related papers (2024-01-03T13:07:14Z)
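
A minimal sketch in the spirit of the APM described above, assuming per-frame backbone features: a sequence-wide average is blended into the target frame's feature so the model sees the whole clip. This is a hedged illustration of the basic idea, not STAF's implementation.

```python
import torch
import torch.nn as nn

class AveragePoolingModule(nn.Module):
    """Blend a sequence-wide average into the target frame's feature."""

    def __init__(self, feat_dim=2048):
        super().__init__()
        self.fuse = nn.Linear(2 * feat_dim, feat_dim)

    def forward(self, feats):                     # feats: (B, T, feat_dim)
        target = feats[:, feats.size(1) // 2]     # mid-sequence target frame
        context = feats.mean(dim=1)               # summary of the entire sequence
        return self.fuse(torch.cat([target, context], dim=-1))

out = AveragePoolingModule()(torch.randn(2, 16, 2048))  # -> (2, 2048)
```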
- Co-Evolution of Pose and Mesh for 3D Human Body Estimation from Video [23.93644678238666]
We propose a Pose and Mesh Co-Evolution network (PMCE) to recover 3D human motion from a video.
The proposed PMCE outperforms previous state-of-the-art methods in terms of both per-frame accuracy and temporal consistency.
arXiv Detail & Related papers (2023-08-20T16:03:21Z)
- TAPE: Temporal Attention-based Probabilistic human pose and shape Estimation [7.22614468437919]
Existing methods ignore the ambiguities of the reconstruction and provide a single deterministic estimate for the 3D pose.
We present a Temporal Attention based Probabilistic human pose and shape Estimation method (TAPE) that operates on an RGB video (a sketch of the probabilistic output idea follows this entry).
We show that TAPE outperforms state-of-the-art methods on standard benchmarks.
arXiv Detail & Related papers (2023-04-29T06:08:43Z)
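
The probabilistic output described above can be illustrated by a head that regresses a distribution over pose parameters instead of a point estimate. A hedged sketch, not TAPE's actual architecture; the 72-dimensional pose follows the common SMPL axis-angle convention, assumed here.

```python
import torch
import torch.nn as nn

class ProbabilisticPoseHead(nn.Module):
    """Regress a Gaussian over pose parameters instead of a single estimate."""

    def __init__(self, feat_dim=2048, pose_dim=72):   # 72 = 24 SMPL joints x 3 axis-angle
        super().__init__()
        self.mean = nn.Linear(feat_dim, pose_dim)
        self.log_var = nn.Linear(feat_dim, pose_dim)

    def forward(self, feat):                          # feat: (B, feat_dim)
        mu = self.mean(feat)
        std = (0.5 * self.log_var(feat)).exp()        # keep std positive
        return torch.distributions.Normal(mu, std)    # sample, or use .mean

dist = ProbabilisticPoseHead()(torch.randn(2, 2048))
samples = dist.rsample()                              # (2, 72), differentiable
```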
- HQ3DAvatar: High Quality Controllable 3D Head Avatar [65.70885416855782]
This paper presents a novel approach to building highly photorealistic digital head avatars.
Our method learns a canonical space via an implicit function parameterized by a neural network (one common form of such a function is sketched after this entry).
At test time, our method is driven by a monocular RGB video.
arXiv Detail & Related papers (2023-03-25T13:56:33Z)
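
One common form of "an implicit function parameterized by a neural network" is an MLP that maps a 3D point in the canonical space to color and density, NeRF-style. A generic sketch under that assumption, not the paper's actual network.

```python
import torch
import torch.nn as nn

class CanonicalField(nn.Module):
    """Generic implicit function: canonical 3D point -> (RGB, density)."""

    def __init__(self, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),               # 3 color channels + 1 density
        )

    def forward(self, points):                  # points: (N, 3) in canonical space
        rgb_sigma = self.mlp(points)
        return torch.sigmoid(rgb_sigma[:, :3]), torch.relu(rgb_sigma[:, 3])

rgb, sigma = CanonicalField()(torch.rand(1024, 3))
```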
- Self-Attentive 3D Human Pose and Shape Estimation from Videos [82.63503361008607]
We present a video-based learning algorithm for 3D human pose and shape estimation.
We exploit temporal information in videos with a self-attention module (sketched after this entry).
We evaluate our method on the 3DPW, MPI-INF-3DHP, and Human3.6M datasets.
arXiv Detail & Related papers (2021-03-26T00:02:19Z)
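
A minimal sketch of a self-attention module over per-frame features, the mechanism named above: every frame attends to every other frame before mesh regression. Dimensions and the residual design are assumptions, not the paper's exact module.

```python
import torch
import torch.nn as nn

class FrameSelfAttention(nn.Module):
    """Let each frame's feature attend to all other frames in the clip."""

    def __init__(self, feat_dim=2048, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)

    def forward(self, feats):                          # feats: (B, T, feat_dim)
        attended, _ = self.attn(feats, feats, feats)   # self-attention across time
        return self.norm(feats + attended)             # residual keeps per-frame detail

out = FrameSelfAttention()(torch.randn(2, 16, 2048))   # -> (2, 16, 2048)
```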
- Human Mesh Recovery from Multiple Shots [85.18244937708356]
We propose a framework for improved 3D reconstruction and mining of long sequences with pseudo ground truth 3D human mesh.
We show that the resulting data is beneficial in the training of various human mesh recovery models.
The tools we develop open the door to processing and analyzing, in 3D, content from a large library of edited media.
arXiv Detail & Related papers (2020-12-17T18:58:02Z)
- Appearance-Preserving 3D Convolution for Video-based Person Re-identification [61.677153482995564]
We propose Appearance-Preserving 3D Convolution (AP3D), which is composed of two components: an Appearance-Preserving Module (APM) and a 3D convolution kernel.
AP3D is easy to combine with existing 3D ConvNets by simply replacing the original 3D convolution kernels with AP3Ds (see the sketch after this entry).
arXiv Detail & Related papers (2020-07-16T16:21:34Z)
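
The "simply replace the kernels" claim above suggests a drop-in wrapper. A hypothetical sketch of that idea: walk an existing 3D ConvNet and wrap each Conv3d with an appearance-preserving stage. AP3DConv and the identity placeholder are stand-ins, not the paper's implementation.

```python
import torch.nn as nn

class AP3DConv(nn.Module):
    """Wrap an existing 3D convolution with an alignment stage."""

    def __init__(self, conv: nn.Conv3d):
        super().__init__()
        self.align = nn.Identity()   # placeholder for the Appearance-Preserving Module
        self.conv = conv             # reuse the original kernel and its weights

    def forward(self, x):            # x: (B, C, T, H, W)
        return self.conv(self.align(x))

def replace_conv3d(module: nn.Module) -> nn.Module:
    """Recursively swap every Conv3d in a model for an AP3DConv."""
    for name, child in module.named_children():
        if isinstance(child, nn.Conv3d):
            setattr(module, name, AP3DConv(child))
        else:
            replace_conv3d(child)
    return module
```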
- A Real-time Action Representation with Temporal Encoding and Deep Compression [115.3739774920845]
We propose a new real-time convolutional architecture, called Temporal Convolutional 3D Network (T-C3D), for action representation.
T-C3D learns video action representations in a hierarchical multi-granularity manner while maintaining high processing speed.
On the UCF101 action recognition benchmark, our method improves accuracy by 5.4% over state-of-the-art real-time methods, runs twice as fast at inference, and requires less than 5 MB of model storage.
arXiv Detail & Related papers (2020-06-17T06:30:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.