Related papers: Motion-DVAE: Unsupervised learning for fast human motion denoising

URL: http://arxiv.org/abs/2306.05846v2
Date: Thu, 30 Nov 2023 07:42:04 GMT
Title: Motion-DVAE: Unsupervised learning for fast human motion denoising
Authors: Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda, Renaud Séguier
Abstract summary: We introduce Motion-DVAE, a motion prior to capture the short-term dependencies of human motion. Together with Motion-DVAE, we introduce an unsupervised learned denoising method unifying regression- and optimization-based approaches.
Score: 18.432026846779372
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Pose and motion priors are crucial for recovering realistic and accurate human motion from noisy observations. Substantial progress has been made on pose and shape estimation from images, and recent works showed impressive results using priors to refine frame-wise predictions. However, a lot of motion priors only model transitions between consecutive poses and are used in time-consuming optimization procedures, which is problematic for many applications requiring real-time motion capture. We introduce Motion-DVAE, a motion prior to capture the short-term dependencies of human motion. As part of the dynamical variational autoencoder (DVAE) models family, Motion-DVAE combines the generative capability of VAE models and the temporal modeling of recurrent architectures. Together with Motion-DVAE, we introduce an unsupervised learned denoising method unifying regression- and optimization-based approaches in a single framework for real-time 3D human pose estimation. Experiments show that the proposed approach reaches competitive performance with state-of-the-art methods while being much faster.

Related papers

GeoMotion: Rethinking Motion Segmentation via Latent 4D Geometry [61.24189040578178]
We propose a fully learning-based approach that directly infers moving objects from latent feature representations via attention mechanisms. Our key insight is to bypass explicit correspondence estimation and instead let the model learn to implicitly disentangle object and camera motion. Our approach achieves state-of-the-art motion segmentation performance with high efficiency.
arXiv Detail & Related papers (2026-02-25T11:36:33Z)
Masked Modeling for Human Motion Recovery Under Occlusions [21.05382087890133]
MoRo is an end-to-end generative framework that formulates motion reconstruction as a video-conditioned task. MoRo achieves real-time inference at 70 FPS on a single H200 GPU.
arXiv Detail & Related papers (2026-01-22T16:22:20Z)
Diffusion-based 3D Hand Motion Recovery with Intuitive Physics [29.784542628690794]
We present a novel 3D hand motion recovery framework that enhances image-based reconstructions. Our model captures the distribution of refined motion estimates conditioned on initial ones, generating improved sequences. We identify valuable intuitive physics knowledge during hand-object interactions, including key motion states and their associated motion constraints.
arXiv Detail & Related papers (2025-08-03T16:44:24Z)
GENMO: A GENeralist Model for Human MOtion [64.16188966024542]
We present GENMO, a unified Generalist Model for Human Motion that bridges motion estimation and generation in a single framework. Our key insight is to reformulate motion estimation as constrained motion generation, where the output motion must precisely satisfy observed conditioning signals. Our novel architecture handles variable-length motions and mixed multimodal conditions (text, audio, video) at different time intervals, offering flexible control.
arXiv Detail & Related papers (2025-05-02T17:59:55Z)
REWIND: Real-Time Egocentric Whole-Body Motion Diffusion with Exemplar-Based Identity Conditioning [95.07708090428814]
We present REWIND, a one-step diffusion model for real-time, high-fidelity human motion estimation from egocentric image inputs. We introduce cascaded body-hand denoising diffusion, which effectively models the correlation between egocentric body and hand motions. We also propose a novel identity conditioning method based on a small set of pose exemplars of the target identity, which further enhances motion estimation quality.
arXiv Detail & Related papers (2025-04-07T11:44:11Z)
ReMP: Reusable Motion Prior for Multi-domain 3D Human Pose Estimation and Motion Inbetweening [10.813269931915364]
We learn a rich motion prior from sequences of complete parametric models of human body shape. Our prior can easily estimate poses in missing frames or noisy measurements. ReMP consistently outperforms the baseline method on diverse and practical 3D motion data.
arXiv Detail & Related papers (2024-11-13T02:42:07Z)
MoManifold: Learning to Measure 3D Human Motion via Decoupled Joint Acceleration Manifolds [20.83684434910106]
We present MoManifold, a novel human motion prior, which models plausible human motion in continuous high-dimensional motion space. Specifically, we propose novel decoupled joint acceleration to model human dynamics from existing limited motion data. Extensive experiments demonstrate that MoManifold outperforms existing SOTAs as a prior in several downstream tasks.
arXiv Detail & Related papers (2024-09-01T15:00:16Z)
COIN: Control-Inpainting Diffusion Prior for Human and Camera Motion Estimation [98.05046790227561]
COIN is a control-inpainting motion diffusion prior that enables fine-grained control to disentangle human and camera motions. COIN outperforms the state-of-the-art methods in terms of global human motion estimation and camera motion estimation.
arXiv Detail & Related papers (2024-08-29T10:36:29Z)
Motion Flow Matching for Human Motion Synthesis and Editing [75.13665467944314]
We propose Motion Flow Matching, a novel generative model for human motion generation featuring efficient sampling and effectiveness in motion editing applications. Our method reduces the sampling complexity from a thousand steps in previous diffusion models to just ten steps, while achieving comparable performance in text-to-motion and action-to-motion generation benchmarks.
arXiv Detail & Related papers (2023-12-14T12:57:35Z)
Learning Variational Motion Prior for Video-based Motion Capture [31.79649766268877]
We present a novel variational motion prior (VMP) learning approach for video-based motion capture. Our framework can effectively reduce temporal jittering and failure modes in frame-wise pose estimation. Experiments over both public datasets and in-the-wild videos have demonstrated the efficacy and generalization capability of our framework.
arXiv Detail & Related papers (2022-10-27T02:45:48Z)
Transformer Inertial Poser: Attention-based Real-time Human Motion Reconstruction from Sparse IMUs [79.72586714047199]
We propose an attention-based deep learning method to reconstruct full-body motion from six IMU sensors in real-time. Our method achieves new state-of-the-art results both quantitatively and qualitatively, while being simple to implement and smaller in size.
arXiv Detail & Related papers (2022-03-29T16:24:52Z)
Investigating Pose Representations and Motion Contexts Modeling for 3D Motion Prediction [63.62263239934777]
We conduct an in-depth study on various pose representations with a focus on their effects on the motion prediction task. We propose a novel RNN architecture termed AHMR (Attentive Hierarchical Motion Recurrent network) for motion prediction. Our approach outperforms the state-of-the-art methods in short-term prediction and achieves much enhanced long-term prediction proficiency.
arXiv Detail & Related papers (2021-12-30T10:45:22Z)
Conditional Temporal Variational AutoEncoder for Action Video Prediction [66.63038712306606]
ACT-VAE predicts pose sequences for an action clip from a single input image. When connected with a plug-and-play Pose-to-Image (P2I) network, ACT-VAE can synthesize image sequences.
arXiv Detail & Related papers (2021-08-12T10:59:23Z)
HuMoR: 3D Human Motion Model for Robust Pose Estimation [100.55369985297797]
HuMoR is a 3D Human Motion Model for Robust Estimation of temporal pose and shape. We introduce a conditional variational autoencoder, which learns a distribution of the change in pose at each step of a motion sequence. We demonstrate that our model generalizes to diverse motions and body shapes after training on a large motion capture dataset.
arXiv Detail & Related papers (2021-05-10T21:04:55Z)
Learning a Generative Motion Model from Image Sequences based on a Latent Motion Matrix [8.774604259603302]
We learn a probabilistic motion model from a sequence of images for spatio-temporal registration. We show improved registration accuracy and spatio-temporally smoother deformations compared to three state-of-the-art registration algorithms. We also demonstrate the model's applicability for motion analysis, simulation and super-resolution by an improved motion reconstruction from sequences with missing frames.
arXiv Detail & Related papers (2020-11-03T14:44:09Z)

This list is automatically generated from the titles and abstracts of the papers on this site.