3D Human Mesh Estimation from Virtual Markers
- URL: http://arxiv.org/abs/2303.11726v4
- Date: Mon, 1 Jul 2024 05:20:37 GMT
- Title: 3D Human Mesh Estimation from Virtual Markers
- Authors: Xiaoxuan Ma, Jiajun Su, Chunyu Wang, Wentao Zhu, Yizhou Wang
- Abstract summary: We present an intermediate representation, named virtual markers, consisting of 64 landmark keypoints learned on the body surface.
Our approach outperforms the state-of-the-art methods on three datasets.
- Score: 34.703241940871635
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Inspired by the success of volumetric 3D pose estimation, some recent human mesh estimators propose to estimate 3D skeletons as intermediate representations, from which dense 3D meshes are regressed by exploiting the mesh topology. However, body shape information is lost in extracting skeletons, leading to mediocre performance. Advanced motion capture systems solve the problem by placing dense physical markers on the body surface, which makes it possible to extract realistic meshes from their non-rigid motions. However, they cannot be applied to wild images without markers. In this work, we present an intermediate representation, named virtual markers, which learns 64 landmark keypoints on the body surface from large-scale mocap data in a generative style, mimicking the effects of physical markers. The virtual markers can be accurately detected from wild images and can reconstruct intact meshes with realistic shapes by simple interpolation. Our approach outperforms the state-of-the-art methods on three datasets. In particular, it surpasses the existing methods by a notable margin on the SURREAL dataset, which has diverse body shapes. Code is available at https://github.com/ShirleyMaxx/VirtualMarker
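Since the abstract states that intact meshes are recovered from the detected markers by simple interpolation, the readout step reduces to a linear map from sparse markers to dense vertices. Below is a minimal sketch of that step, assuming the SMPL vertex count and a precomputed interpolation matrix; the function name and the random placeholders are illustrative, not the authors' code (see the linked repository for the actual implementation).

```python
import numpy as np

N_MARKERS = 64      # virtual markers learned on the body surface
N_VERTICES = 6890   # SMPL-resolution mesh (an assumption for this sketch)

def mesh_from_virtual_markers(markers_3d, W):
    """Recover dense mesh vertices from detected 3D virtual markers.

    markers_3d : (64, 3) marker positions detected from the image.
    W          : (6890, 64) interpolation weights learned from mocap data.
    Returns      (6890, 3) vertex positions.
    """
    assert markers_3d.shape == (N_MARKERS, 3)
    assert W.shape == (N_VERTICES, N_MARKERS)
    return W @ markers_3d   # each vertex is a weighted combination of markers

# Toy usage with random placeholders standing in for the learned quantities.
rng = np.random.default_rng(0)
W = rng.random((N_VERTICES, N_MARKERS))
W /= W.sum(axis=1, keepdims=True)   # convex weights keep vertices near the markers
markers = rng.standard_normal((N_MARKERS, 3))
print(mesh_from_virtual_markers(markers, W).shape)   # (6890, 3)
```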
Related papers
- FAMOUS: High-Fidelity Monocular 3D Human Digitization Using View Synthesis [51.193297565630886]
The challenge of accurately inferring texture remains, particularly in obscured areas such as the back of a person in frontal-view images.
This limitation in texture prediction largely stems from the scarcity of large-scale and diverse 3D datasets.
We propose leveraging extensive 2D fashion datasets to enhance both texture and shape prediction in 3D human digitization.
arXiv Detail & Related papers (2024-10-13T01:25:05Z) - PokeFlex: A Real-World Dataset of Deformable Objects for Robotics [17.533143584534155]
PokeFlex is a dataset featuring real-world paired and annotated multimodal data that includes 3D textured meshes, point clouds, RGB images, and depth maps.
Such data can be leveraged for several downstream tasks such as online 3D mesh reconstruction.
We demonstrate a use case for the PokeFlex dataset in online 3D mesh reconstruction.
arXiv Detail & Related papers (2024-10-10T07:54:17Z) - Decaf: Monocular Deformation Capture for Face and Hand Interactions [77.75726740605748]
This paper introduces the first method that allows tracking human hands interacting with human faces in 3D from single monocular RGB videos.
We model hands as articulated objects inducing non-rigid face deformations during an active interaction.
Our method relies on a new hand-face motion and interaction capture dataset with realistic face deformations acquired with a markerless multi-view camera system.
arXiv Detail & Related papers (2023-09-28T17:59:51Z) - Sampling is Matter: Point-guided 3D Human Mesh Reconstruction [0.0]
This paper presents a simple yet powerful method for 3D human mesh reconstruction from a single RGB image.
Experimental results on benchmark datasets show that the proposed method efficiently improves the performance of 3D human mesh reconstruction.
arXiv Detail & Related papers (2023-04-19T08:45:26Z) - MoDA: Modeling Deformable 3D Objects from Casual Videos [84.29654142118018]
We propose neural dual quaternion blend skinning (NeuDBS) to achieve 3D point deformation without skin-collapsing artifacts (the classical DQB it builds on is sketched below).
To register 2D pixels across different frames, we establish correspondences between canonical feature embeddings that encode 3D points within the canonical space.
Our approach can reconstruct 3D models for humans and animals with better qualitative and quantitative performance than state-of-the-art methods.
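NeuDBS builds on classical dual quaternion blend skinning (DQB), which blends per-bone rigid transforms in dual quaternion space so the blended result stays rigid, avoiding the collapsing artifacts of linear blend skinning. The sketch below implements vanilla DQB for a single point in plain NumPy; it is the classical algorithm, not the paper's neural variant, and all names are illustrative.

```python
import numpy as np

def qmul(a, b):
    """Hamilton product of quaternions in (w, x, y, z) order."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def qconj(q):
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def rigid_to_dq(q_rot, t):
    """Rotation quaternion + translation -> unit dual quaternion (q_r, q_d)."""
    return q_rot, 0.5 * qmul(np.array([0.0, *t]), q_rot)

def dqb_point(point, weights, dqs):
    """Skin one point against B bones given as unit dual quaternions."""
    br, bd = np.zeros(4), np.zeros(4)
    ref = dqs[0][0]
    for w, (qr, qd) in zip(weights, dqs):
        s = 1.0 if ref @ qr >= 0 else -1.0   # keep quaternions in one hemisphere
        br += w * s * qr
        bd += w * s * qd
    n = np.linalg.norm(br)                    # normalizing keeps the blend rigid
    br, bd = br / n, bd / n
    t = 2.0 * qmul(bd, qconj(br))[1:]         # translation of the blended transform
    rotated = qmul(qmul(br, np.array([0.0, *point])), qconj(br))[1:]
    return rotated + t

# Blend the identity with a 90-degree rotation about z, half weight each.
dq_a = rigid_to_dq(np.array([1.0, 0.0, 0.0, 0.0]), np.zeros(3))
dq_b = rigid_to_dq(np.array([np.cos(np.pi/4), 0.0, 0.0, np.sin(np.pi/4)]),
                   np.array([0.0, 0.2, 0.0]))
print(dqb_point(np.array([1.0, 0.0, 0.0]), [0.5, 0.5], [dq_a, dq_b]))
```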
arXiv Detail & Related papers (2023-04-17T13:49:04Z) - Gait Recognition in the Wild with Dense 3D Representations and A Benchmark [86.68648536257588]
Existing studies of gait recognition are dominated by 2D representations, such as silhouettes or skeletons of the human body, captured in constrained scenes.
This paper aims to explore dense 3D representations for gait recognition in the wild.
We build the first large-scale 3D representation-based gait recognition dataset, named Gait3D.
arXiv Detail & Related papers (2022-04-06T03:54:06Z) - Tracking People with 3D Representations [78.97070307547283]
We present a novel approach for tracking multiple people in video.
Unlike past approaches, which employ 2D representations, we employ 3D representations of people located in three-dimensional space.
We find that 3D representations are more effective than 2D representations for tracking in these settings.
arXiv Detail & Related papers (2021-11-15T16:15:21Z) - SOMA: Solving Optical Marker-Based MoCap Automatically [56.59083192247637]
We train a novel neural network called SOMA, which takes raw mocap point clouds with varying numbers of points and labels them at scale.
SOMA exploits an architecture with stacked self-attention elements to learn the spatial structure of the 3D body (a generic sketch of one such element follows).
We automatically label over 8 hours of archival mocap data across 4 different datasets.
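Self-attention suits raw mocap point clouds because it is permutation-equivariant and indifferent to how many points arrive in a frame. The block below is a generic PyTorch sketch of one such self-attention element, not SOMA's actual architecture; the embedding width and head count are arbitrary choices.

```python
import torch
import torch.nn as nn

class PointSelfAttention(nn.Module):
    """One self-attention block over an unordered set of mocap points.

    Illustrative stand-in for a stacked self-attention element: attention
    is permutation-equivariant and accepts any number of points.
    """
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(3, d_model)          # lift raw 3D coordinates
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, points):                      # points: (B, N, 3), N may vary
        x = self.embed(points)
        a, _ = self.attn(x, x, x)                   # every point attends to all others
        return self.norm(x + a)                     # residual + norm, one "element"

# Varying point counts are handled naturally: just change N.
block = PointSelfAttention()
for n in (37, 90):
    print(block(torch.randn(1, n, 3)).shape)        # (1, 37, 64) then (1, 90, 64)
```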
arXiv Detail & Related papers (2021-10-09T02:27:27Z) - Topologically Consistent Multi-View Face Inference Using Volumetric Sampling [25.001398662643986]
ToFu is a geometry inference framework that can produce topologically consistent meshes across identities and expressions.
A novel progressive mesh generation network embeds the topological structure of the face in a feature volume.
These high-quality assets are readily usable by production studios for avatar creation, animation and physically-based skin rendering.
arXiv Detail & Related papers (2021-10-06T17:55:08Z) - Deep Virtual Markers for Articulated 3D Shapes [14.986945006208849]
We propose a framework that maps 3D points of articulated models, such as humans, to virtual marker labels.
We adopt a sparse convolutional neural network to classify 3D points of an articulated model into virtual marker labels, as sketched below.
We show additional applications using the estimated virtual markers, such as non-rigid registration, texture transfer, and real-time dense marker prediction from depth maps.
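Framed this way, the task is per-point classification: for every 3D point of the model, predict a distribution over virtual marker labels. The sketch below uses a point-wise MLP as a stand-in for the paper's sparse convolutional network, together with one simple way to decode soft labels into marker positions; all names and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

K = 64  # number of virtual marker classes (an assumed value)

# Point-wise MLP stand-in for the paper's sparse CNN, shown only to
# illustrate the per-point classification interface.
classifier = nn.Sequential(
    nn.Linear(3, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, K),
)

points = torch.randn(2048, 3)                  # points sampled from a scanned body
probs = classifier(points).softmax(dim=-1)     # (2048, K) soft marker labels

# Soft-assignment readout: each marker's position is the probability-weighted
# mean of the points assigned to it (one simple way to decode dense labels).
weights = probs / probs.sum(dim=0, keepdim=True)   # normalize over points
markers = weights.T @ points                        # (K, 3) estimated marker positions
print(markers.shape)
```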
arXiv Detail & Related papers (2021-08-20T04:55:23Z) - HybrIK: A Hybrid Analytical-Neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation [39.67289969828706]
We propose a novel hybrid inverse kinematics solution (HybrIK) to bridge the gap between body mesh estimation and 3D keypoint estimation.
HybrIK directly transforms accurate 3D joints to relative body-part rotations for 3D body mesh reconstruction (the analytical alignment step is sketched below).
We show that HybrIK preserves both the accuracy of 3D pose and the realistic body structure of the parametric human model.
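The analytically solvable half of such a hybrid solution is the swing computation: the minimal rotation aligning a rest-pose bone direction with the direction implied by the estimated 3D joints, leaving only the twist about the bone axis for the network. A generic NumPy sketch of that step via Rodrigues' formula follows; it illustrates the idea, not HybrIK's exact implementation.

```python
import numpy as np

def swing_rotation(template_bone, target_bone):
    """Minimal rotation aligning a rest-pose bone direction to a target direction.

    This is the analytically solvable 'swing' of a twist-and-swing
    decomposition; the residual 'twist' about the bone axis is what a
    network must still predict.
    """
    u = template_bone / np.linalg.norm(template_bone)
    v = target_bone / np.linalg.norm(target_bone)
    axis = np.cross(u, v)
    s, c = np.linalg.norm(axis), float(u @ v)   # sin and cos of the swing angle
    if s < 1e-8:
        if c > 0:
            return np.eye(3)                     # bones already aligned
        # Antiparallel: rotate 180 degrees about any axis perpendicular to u.
        p = np.array([1.0, 0, 0]) if abs(u[0]) < 0.9 else np.array([0, 1.0, 0])
        k = np.cross(u, p); k /= np.linalg.norm(k)
        return 2.0 * np.outer(k, k) - np.eye(3)
    k = axis / s
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + s * K + (1.0 - c) * (K @ K)   # Rodrigues' formula

R = swing_rotation(np.array([0.0, 1.0, 0.0]), np.array([0.3, 0.9, 0.1]))
print(R @ np.array([0.0, 1.0, 0.0]))   # parallel to the normalized target bone
```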
arXiv Detail & Related papers (2020-11-30T10:32:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.