SOMA: Solving Optical Marker-Based MoCap Automatically
- URL: http://arxiv.org/abs/2110.04431v1
- Date: Sat, 9 Oct 2021 02:27:27 GMT
- Title: SOMA: Solving Optical Marker-Based MoCap Automatically
- Authors: Nima Ghorbani and Michael J. Black
- Abstract summary: We train a novel neural network called SOMA, which takes raw mocap point clouds with varying numbers of points and labels them at scale.
SOMA exploits an architecture with stacked self-attention elements to learn the spatial structure of the 3D body.
We automatically label over 8 hours of archival mocap data across 4 different datasets.
- Score: 56.59083192247637
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Marker-based optical motion capture (mocap) is the "gold standard" method for
acquiring accurate 3D human motion in computer vision, medicine, and graphics.
The raw output of these systems is noisy and incomplete 3D points or short
tracklets of points. To be useful, one must associate these points with
corresponding markers on the captured subject; i.e. "labeling". Given these
labels, one can then "solve" for the 3D skeleton or body surface mesh.
Commercial auto-labeling tools require a specific calibration procedure at
capture time, which is not possible for archival data. Here we train a novel
neural network called SOMA, which takes raw mocap point clouds with varying
numbers of points, labels them at scale without any calibration data,
independent of the capture technology, and requiring only minimal human
intervention. Our key insight is that, while labeling point clouds is highly
ambiguous, the 3D body provides strong constraints on the solution that can be
exploited by a learning-based method. To enable learning, we generate massive
training sets of simulated noisy and ground truth mocap markers animated by 3D
bodies from AMASS. SOMA exploits an architecture with stacked self-attention
elements to learn the spatial structure of the 3D body and an optimal transport
layer to constrain the assignment (labeling) problem while rejecting outliers.
We extensively evaluate SOMA both quantitatively and qualitatively. SOMA is
more accurate and robust than existing state of the art research methods and
can be applied where commercial systems cannot. We automatically label over 8
hours of archival mocap data across 4 different datasets captured using various
technologies and output SMPL-X body models. The model and data are released for
research purposes at https://soma.is.tue.mpg.de/.
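The abstract describes an optimal transport layer that constrains the point-to-label assignment while rejecting outliers. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes a Sinkhorn-style normalization over a score matrix augmented with "dustbin" entries, and the names `sinkhorn_with_dustbin`, `assign_labels`, `dustbin_score`, and the toy scores are all hypothetical.

```python
import math


def sinkhorn_with_dustbin(scores, dustbin_score=0.0, n_iters=50):
    """Return a soft assignment plan of shape (n + 1) x (m + 1).

    scores[i][j] is a hypothetical affinity between point i and label j.
    The extra last row and column are "dustbins" that absorb unmatched
    labels and outlier points, respectively.
    """
    n, m = len(scores), len(scores[0])
    # Append the dustbin column to every row, then the dustbin row.
    aug = [list(row) + [dustbin_score] for row in scores]
    aug.append([dustbin_score] * (m + 1))
    # Positive kernel; assumes moderate scores so exp() does not overflow.
    kernel = [[math.exp(s) for s in row] for row in aug]
    # Each real point and label carries unit mass; each dustbin can absorb
    # everything on the other side.
    row_mass = [1.0] * n + [float(m)]
    col_mass = [1.0] * m + [float(n)]
    u = [1.0] * (n + 1)
    v = [1.0] * (m + 1)
    for _ in range(n_iters):
        # Alternately rescale rows and columns toward the target masses.
        u = [row_mass[i] / sum(kernel[i][j] * v[j] for j in range(m + 1))
             for i in range(n + 1)]
        v = [col_mass[j] / sum(kernel[i][j] * u[i] for i in range(n + 1))
             for j in range(m + 1)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m + 1)]
            for i in range(n + 1)]


def assign_labels(plan):
    """Pick the best column per real point; None means 'outlier'."""
    n, m = len(plan) - 1, len(plan[0]) - 1
    labels = []
    for i in range(n):
        best = max(range(m + 1), key=lambda j: plan[i][j])
        labels.append(None if best == m else best)
    return labels


# Toy input: two points match one label each, the third matches neither.
toy_scores = [[5.0, 0.0], [0.0, 5.0], [-3.0, -3.0]]
toy_plan = sinkhorn_with_dustbin(toy_scores)
toy_labels = assign_labels(toy_plan)
print(toy_labels)
```

In this toy case the third point has low affinity to both labels, so its mass flows to the dustbin column and it is reported as an outlier. In a learned system the scores would presumably come from the network's features rather than being set by hand.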
Related papers
- Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model [52.27297680947337]
Multimodal language models (MLLMs) are increasingly being implemented in real-world environments.
Despite their potential, current top models within our community still fall short in adequately understanding spatial and temporal dimensions.
We introduce Coarse Correspondence, a training-free, effective, and general-purpose visual prompting method to elicit 3D and temporal understanding.
arXiv Detail & Related papers (2024-08-01T17:57:12Z)
- Hardness-Aware Scene Synthesis for Semi-Supervised 3D Object Detection [59.33188668341604]
3D object detection serves as the fundamental task of autonomous driving perception.
It is costly to obtain high-quality annotations for point cloud data.
We propose a hardness-aware scene synthesis (HASS) method to generate adaptive synthetic scenes.
arXiv Detail & Related papers (2024-05-27T17:59:23Z)
- SpATr: MoCap 3D Human Action Recognition based on Spiral Auto-encoder and Transformer Network [1.4732811715354455]
We introduce a novel approach for 3D human action recognition, denoted as SpATr (Spiral Auto-encoder and Transformer Network).
A lightweight auto-encoder, based on spiral convolutions, is employed to extract spatial geometrical features from each 3D mesh.
The proposed method is evaluated on three prominent 3D human action datasets: Babel, MoVi, and BMLrub.
arXiv Detail & Related papers (2023-06-30T11:49:00Z)
- 3D Human Mesh Estimation from Virtual Markers [34.703241940871635]
We present an intermediate representation, named virtual markers, which learns 64 landmark keypoints on the body surface.
Our approach outperforms the state-of-the-art methods on three datasets.
arXiv Detail & Related papers (2023-03-21T10:30:43Z)
- Learning 3D Human Pose Estimation from Dozens of Datasets using a Geometry-Aware Autoencoder to Bridge Between Skeleton Formats [80.12253291709673]
We propose a novel affine-combining autoencoder (ACAE) method to perform dimensionality reduction on the number of landmarks.
Our approach scales to an extreme multi-dataset regime, where we use 28 3D human pose datasets to supervise one model.
arXiv Detail & Related papers (2022-12-29T22:22:49Z)
- An Empirical Study of Pseudo-Labeling for Image-based 3D Object Detection [72.30883544352918]
We investigate whether pseudo-labels can provide effective supervision for the baseline models under varying settings.
We achieve 20.23 AP for moderate level on the KITTI-3D testing set without bells and whistles, improving the baseline model by 6.03 AP.
We hope this work can provide insights for the image-based 3D detection community under a semi-supervised setting.
arXiv Detail & Related papers (2022-08-15T12:17:46Z)
- Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training [56.81809311892475]
Masked Autoencoders (MAE) have shown great potential in self-supervised pre-training for language and 2D image transformers.
We propose Point-M2AE, a strong Multi-scale MAE pre-training framework for hierarchical self-supervised learning of 3D point clouds.
arXiv Detail & Related papers (2022-05-28T11:22:53Z)
- Semi-supervised 3D Object Detection via Adaptive Pseudo-Labeling [18.209409027211404]
3D object detection is an important task in computer vision.
Most existing methods require a large number of high-quality 3D annotations, which are expensive to collect.
We propose a novel semi-supervised framework based on pseudo-labeling for outdoor 3D object detection tasks.
arXiv Detail & Related papers (2021-08-15T02:58:43Z)
- labelCloud: A Lightweight Domain-Independent Labeling Tool for 3D Object Detection in Point Clouds [0.0]
We propose a novel tool for 3D object detection in point clouds to address shortcomings of existing tools.
We show that the tool can be used to label 3D bounding boxes around target objects the ML model should later automatically identify, e.g., pedestrians for autonomous driving or cancer cells within radiography.
arXiv Detail & Related papers (2021-03-05T09:32:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.