DiffMesh: A Motion-aware Diffusion Framework for Human Mesh Recovery from Videos
- URL: http://arxiv.org/abs/2303.13397v4
- Date: Tue, 23 Jul 2024 01:44:15 GMT
- Title: DiffMesh: A Motion-aware Diffusion Framework for Human Mesh Recovery from Videos
- Authors: Ce Zheng, Xianpeng Liu, Qucheng Peng, Tianfu Wu, Pu Wang, Chen Chen
- Abstract summary: Human mesh recovery (HMR) provides rich human body information for various real-world applications.
Video-based approaches leverage temporal information to mitigate the temporal inconsistencies of image-based methods.
We present DiffMesh, an innovative motion-aware Diffusion-like framework for video-based HMR.
- Score: 20.895221536570627
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human mesh recovery (HMR) provides rich human body information for various real-world applications. While image-based HMR methods have achieved impressive results, they often struggle to recover humans in dynamic scenarios, leading to temporal inconsistencies and non-smooth 3D motion predictions due to the absence of human motion modeling. In contrast, video-based approaches leverage temporal information to mitigate this issue. In this paper, we present DiffMesh, an innovative motion-aware Diffusion-like framework for video-based HMR. DiffMesh establishes a bridge between diffusion models and human motion, efficiently generating accurate and smooth output mesh sequences by incorporating human motion within the forward and reverse processes of the diffusion model. Extensive experiments on the widely used Human3.6M and 3DPW datasets demonstrate the effectiveness and efficiency of DiffMesh. Visual comparisons in real-world scenarios further highlight DiffMesh's suitability for practical applications.
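For readers unfamiliar with the forward and reverse processes the abstract refers to, the sketch below shows a generic DDPM applied to a per-frame mesh parameter sequence. It is a minimal illustration under assumed tensor shapes, a standard linear noise schedule, and a hypothetical `denoiser` network; the function names `forward_diffuse` and `reverse_denoise` are invented for exposition and this is not DiffMesh's motion-aware formulation, which additionally injects human motion into both processes.

```python
import torch

# Generic DDPM-style forward noising and reverse denoising over a mesh
# parameter sequence (e.g. per-frame SMPL pose/shape vectors). Shapes,
# schedule, and the `denoiser` callable are illustrative assumptions.

T = 1000                                          # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)             # linear noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)         # cumulative products of alphas

def forward_diffuse(x0, t):
    """Sample x_t ~ q(x_t | x_0) by adding Gaussian noise at step t."""
    noise = torch.randn_like(x0)
    a_bar = alpha_bars[t]
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    return x_t, noise

@torch.no_grad()
def reverse_denoise(denoiser, shape):
    """Ancestral sampling: start from pure noise and denoise step by step."""
    x = torch.randn(shape)                        # (frames, num_params)
    for t in reversed(range(T)):
        eps_hat = denoiser(x, torch.tensor(t))    # predicted noise at step t
        a, a_bar = alphas[t], alpha_bars[t]
        x = (x - (1.0 - a) / (1.0 - a_bar).sqrt() * eps_hat) / a.sqrt()
        if t > 0:
            # add sampling noise at every step except the last
            x = x + betas[t].sqrt() * torch.randn_like(x)
    return x
```

In this reading, a video-based method would train the denoiser on noisy mesh sequences so that the reverse process yields temporally smooth outputs; how motion information is injected is the part specific to DiffMesh and is not shown here.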
Related papers
- RoHM: Robust Human Motion Reconstruction via Diffusion [58.63706638272891]
RoHM is an approach for robust 3D human motion reconstruction from monocular RGB(-D) videos.
Conditioned on noisy and occluded input data, it reconstructs complete, plausible motions in consistent global coordinates.
Our method outperforms state-of-the-art approaches qualitatively and quantitatively, while being faster at test time.
arXiv Detail & Related papers (2024-01-16T18:57:50Z) - HMP: Hand Motion Priors for Pose and Shape Estimation from Video [52.39020275278984]
We develop a generative motion prior specific to hands, trained on the AMASS dataset, which features diverse and high-quality hand motions.
Our integration of a robust motion prior significantly enhances performance, especially in occluded scenarios.
We demonstrate our method's efficacy via qualitative and quantitative evaluations on the HO3D and DexYCB datasets.
arXiv Detail & Related papers (2023-12-27T22:35:33Z) - Realistic Human Motion Generation with Cross-Diffusion Models [34.67728249559236]
The Cross Human Motion Diffusion Model (CrossDiff) integrates 3D and 2D information using a shared transformer network within the training of the diffusion model.
CrossDiff effectively combines the strengths of both representations to generate more realistic motion sequences.
arXiv Detail & Related papers (2023-12-18T07:44:40Z) - EMDM: Efficient Motion Diffusion Model for Fast and High-Quality Motion Generation [57.539634387672656]
Current state-of-the-art generative diffusion models have produced impressive results but struggle to achieve fast generation without sacrificing quality.
We introduce Efficient Motion Diffusion Model (EMDM) for fast and high-quality human motion generation.
arXiv Detail & Related papers (2023-12-04T18:58:38Z) - Distribution-Aligned Diffusion for Human Mesh Recovery [16.64567393672489]
We propose a diffusion-based approach for human mesh recovery: the Human Mesh Diffusion (HMDiff) framework, which frames mesh recovery as a reverse diffusion process.
Our method achieves state-of-the-art performance on three widely used datasets.
arXiv Detail & Related papers (2023-08-25T13:29:31Z) - Modiff: Action-Conditioned 3D Motion Generation with Denoising Diffusion Probabilistic Models [58.357180353368896]
We propose a conditional paradigm that benefits from the denoising diffusion probabilistic model (DDPM) to tackle the problem of realistic and diverse action-conditioned 3D skeleton-based motion generation.
Ours is a pioneering attempt to use DDPM to synthesize a variable number of motion sequences conditioned on a categorical action.
arXiv Detail & Related papers (2023-01-10T13:15:42Z) - Executing your Commands via Motion Diffusion in Latent Space [51.64652463205012]
We propose a Motion Latent-based Diffusion model (MLD) to produce vivid motion sequences conforming to the given conditional inputs.
Our MLD achieves significant improvements over state-of-the-art methods across a range of human motion generation tasks.
arXiv Detail & Related papers (2022-12-08T03:07:00Z) - Human Motion Diffusion Model [35.05219668478535]
Motion Diffusion Model (MDM) is a transformer-based generative model for the human motion domain.
We show that our model is trained with lightweight resources and yet achieves state-of-the-art results on leading benchmarks for text-to-motion and action-to-motion.
arXiv Detail & Related papers (2022-09-29T16:27:53Z) - Learning Local Recurrent Models for Human Mesh Recovery [50.85467243778406]
We present a new method for video mesh recovery that divides the human mesh into several local parts following the standard skeletal model.
We then model the dynamics of each local part with separate recurrent models, with each model conditioned appropriately based on the known kinematic structure of the human body.
This results in a structure-informed local recurrent learning architecture that can be trained in an end-to-end fashion with available annotations.
arXiv Detail & Related papers (2021-07-27T14:30:33Z)