Related papers: OfCaM: Global Human Mesh Recovery via Optimization-free Camera Motion Scale Calibration

URL: http://arxiv.org/abs/2407.00574v1
Date: Sun, 30 Jun 2024 03:31:21 GMT
Title: OfCaM: Global Human Mesh Recovery via Optimization-free Camera Motion Scale Calibration
Authors: Fengyuan Yang, Kerui Gu, Ha Linh Nguyen, Angela Yao
Abstract summary: This paper presents a novel framework that utilizes prior knowledge from human mesh recovery (HMR) models to directly calibrate the unknown scale factor. Our method sets a new standard for global human mesh estimation tasks, reducing global human motion error by 60% over the prior SOTA.
Score: 32.69343215997592
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Accurate camera motion estimation is critical to estimate human motion in the global space. A standard and widely used method for estimating camera motion is Simultaneous Localization and Mapping (SLAM). However, SLAM only provides a trajectory up to an unknown scale factor. Different from previous attempts that optimize the scale factor, this paper presents Optimization-free Camera Motion Scale Calibration (OfCaM), a novel framework that utilizes prior knowledge from human mesh recovery (HMR) models to directly calibrate the unknown scale factor. Specifically, OfCaM leverages the absolute depth of human-background contact joints from HMR predictions as a calibration reference, enabling the precise recovery of SLAM camera trajectory scale in global space. With this correctly scaled camera motion and HMR's local motion predictions, we achieve more accurate global human motion estimation. To compensate for scenes where we detect SLAM failure, we adopt a local-to-global motion mapping to fuse with previously derived motion to enhance robustness. Simple yet powerful, our method sets a new standard for global human mesh estimation tasks, reducing global human motion error by 60% over the prior SOTA while also demanding orders of magnitude less inference time compared with optimization-based methods.
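The calibration idea in the abstract can be illustrated with a minimal sketch. It assumes that, at each detected human-background contact joint, we have both a metric depth from the HMR prediction and an up-to-scale depth from SLAM for the same point. Taking the median of the per-joint depth ratios is an assumption made here for robustness, not a detail confirmed by the abstract, and the function names `calibrate_scale` and `scale_trajectory` are hypothetical.

```python
from statistics import median


def calibrate_scale(hmr_depths, slam_depths, eps=1e-6):
    """Estimate the metric scale factor of an up-to-scale SLAM reconstruction.

    hmr_depths:  metric depths of contact joints taken from HMR predictions.
    slam_depths: up-to-scale SLAM depths at the same contact points.
    The median ratio is an illustrative choice (assumption), used so that a
    few bad contact detections do not dominate the estimate.
    """
    ratios = [h / s for h, s in zip(hmr_depths, slam_depths) if s > eps]
    if not ratios:
        raise ValueError("no valid depth pairs to calibrate against")
    return median(ratios)


def scale_trajectory(translations, scale):
    """Apply the calibrated scale to every SLAM camera translation."""
    return [tuple(scale * c for c in t) for t in translations]


# Toy data (made up for illustration only).
hmr_depths = [4.0, 4.2, 3.9]      # metres, from HMR contact joints
slam_depths = [2.0, 2.0, 2.0]     # arbitrary SLAM units
slam_translations = [(0, 0, 0), (0.5, 0, 0.25)]

scale = calibrate_scale(hmr_depths, slam_depths)
metric_translations = scale_trajectory(slam_translations, scale)
```

Because the scale is computed in closed form from depth ratios, no iterative optimization is needed, which is consistent with the "optimization-free" framing in the abstract.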

Related papers

UniSH: Unifying Scene and Human Reconstruction in a Feed-Forward Pass [83.7071371474926]
UniSH is a unified, feed-forward framework for joint metric-scale 3D scene and human reconstruction. Our framework bridges strong, disparate priors from scene reconstruction and HMR. Our model achieves state-of-the-art performance on human-centric scene reconstruction.
arXiv Detail & Related papers (2026-01-03T16:06:27Z)
Human3R: Everyone Everywhere All at Once [69.16576238974876]
We present Human3R, a feed-forward framework for online 4D human-scene reconstruction from monocular videos. Human3R is a unified model that eliminates heavy dependencies and iterative refinement. It delivers superior performance across tasks, including global human motion estimation, local human mesh recovery, video depth estimation, and camera pose estimation.
arXiv Detail & Related papers (2025-10-07T17:59:52Z)
Reconstructing People, Places, and Cameras [57.81696692335401]
"Humans and Structure from Motion" (HSfM) is a method for jointly reconstructing multiple human meshes, scene point clouds, and camera parameters in a metric world coordinate system. Our results show that incorporating human data into the SfM pipeline improves camera pose estimation.
arXiv Detail & Related papers (2024-12-23T18:58:34Z)
Estimating Body and Hand Motion in an Ego-sensed World [62.61989004520802]
We present EgoAllo, a system for human motion estimation from a head-mounted device. Using only egocentric SLAM poses and images, EgoAllo guides sampling from a conditional diffusion model to estimate 3D body pose, height, and hand parameters.
arXiv Detail & Related papers (2024-10-04T17:59:57Z)
World-Grounded Human Motion Recovery via Gravity-View Coordinates [60.618543026949226]
We propose estimating human poses in a novel Gravity-View coordinate system. The proposed GV system is naturally gravity-aligned and uniquely defined for each video frame. Our method recovers more realistic motion in both the camera space and world-grounded settings, outperforming state-of-the-art methods in both accuracy and speed.
arXiv Detail & Related papers (2024-09-10T17:25:47Z)
COIN: Control-Inpainting Diffusion Prior for Human and Camera Motion Estimation [98.05046790227561]
COIN is a control-inpainting motion diffusion prior that enables fine-grained control to disentangle human and camera motions. COIN outperforms the state-of-the-art methods in terms of global human motion estimation and camera motion estimation.
arXiv Detail & Related papers (2024-08-29T10:36:29Z)
Aligning Human Motion Generation with Human Perceptions [51.831338643012444]
We propose a data-driven approach to bridge the gap by introducing a large-scale human perceptual evaluation dataset, MotionPercept, and a human motion critic model, MotionCritic. Our critic model offers a more accurate metric for assessing motion quality and could be readily integrated into the motion generation pipeline.
arXiv Detail & Related papers (2024-07-02T14:01:59Z)
WHAC: World-grounded Humans and Cameras [37.877565981937586]
We aim to recover expressive parametric human models (i.e., SMPL-X) and corresponding camera poses jointly. We introduce a novel framework, referred to as WHAC, to facilitate world-grounded expressive human pose and shape estimation. We present a new synthetic dataset, WHAC-A-Mole, which includes accurately annotated humans and cameras.
arXiv Detail & Related papers (2024-03-19T17:58:02Z)
SISMIK for brain MRI: Deep-learning-based motion estimation and model-based motion correction in k-space [0.0]
We propose a retrospective method for motion estimation and correction for 2D Spin-Echo scans of the brain. The method leverages the power of deep neural networks to estimate motion parameters in k-space. It uses a model-based approach to restore degraded images to avoid "hallucinations".
arXiv Detail & Related papers (2023-12-20T17:38:56Z)
WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion [43.95997922499137]
WHAM (World-grounded Humans with Accurate Motion) reconstructs 3D human motion in a global coordinate system from video. It uses camera angular velocity estimated from a SLAM method together with human motion to estimate the body's global trajectory. It outperforms all existing 3D human motion recovery methods across multiple in-the-wild benchmarks.
arXiv Detail & Related papers (2023-12-12T18:57:46Z)
W-HMR: Monocular Human Mesh Recovery in World Space with Weak-Supervised Calibration [57.37135310143126]
Previous methods for 3D motion recovery from monocular images often fall short due to reliance on camera coordinates. We introduce W-HMR, a weak-supervised calibration method that predicts "reasonable" focal lengths based on body distortion information. We also present the OrientCorrect module, which corrects body orientation for plausible reconstructions in world space.
arXiv Detail & Related papers (2023-11-29T09:02:07Z)
PACE: Human and Camera Motion Estimation from in-the-wild Videos [113.76041632912577]
We present a method to estimate human motion in a global scene from moving cameras. This is a highly challenging task due to the coupling of human and camera motions in the video. We propose a joint optimization framework that disentangles human and camera motions using both foreground human motion priors and background scene features.
arXiv Detail & Related papers (2023-10-20T19:04:14Z)
A Simple Method to Boost Human Pose Estimation Accuracy by Correcting the Joint Regressor for the Human3.6m Dataset [21.096409769550387]
We show that the most widely used SMPL-to-joint linear layer (joint regressor) is inaccurate. To achieve a more accurate joint regressor, we propose a method to create pseudo-ground-truth SMPL poses. We show that our regressor leads to improved pose estimation results on the test set without any need for retraining.
arXiv Detail & Related papers (2022-04-29T20:42:48Z)
GLAMR: Global Occlusion-Aware Human Mesh Recovery with Dynamic Cameras [99.07219478953982]
We present an approach for 3D global human mesh recovery from monocular videos recorded with dynamic cameras. We first propose a deep generative motion infiller, which autoregressively infills the body motions of occluded humans based on visible motions. In contrast to prior work, our approach reconstructs human meshes in consistent global coordinates even with dynamic cameras.
arXiv Detail & Related papers (2021-12-02T18:59:54Z)
Estimating Egocentric 3D Human Pose in Global Space [70.7272154474722]
We present a new method for egocentric global 3D body pose estimation using a single head-mounted fisheye camera. Our approach outperforms state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-04-27T20:01:57Z)

This list is automatically generated from the titles and abstracts of the papers on this site.