Related papers: SOMA: Solving Optical Marker-Based MoCap Automatically

SOMA: Solving Optical Marker-Based MoCap Automatically

URL: http://arxiv.org/abs/2110.04431v1
Date: Sat, 9 Oct 2021 02:27:27 GMT
Title: SOMA: Solving Optical Marker-Based MoCap Automatically
Authors: Nima Ghorbani and Michael J. Black
Abstract summary: We train a novel neural network called SOMA, which takes raw mocap point clouds with varying numbers of points and labels them at scale. Soma exploits an architecture with stacked self-attention elements to learn the spatial structure of the 3D body. We automatically label over 8 hours of archival mocap data across 4 different datasets.
Score: 56.59083192247637
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Marker-based optical motion capture (mocap) is the "gold standard" method for acquiring accurate 3D human motion in computer vision, medicine, and graphics. The raw output of these systems are noisy and incomplete 3D points or short tracklets of points. To be useful, one must associate these points with corresponding markers on the captured subject; i.e. "labelling". Given these labels, one can then "solve" for the 3D skeleton or body surface mesh. Commercial auto-labeling tools require a specific calibration procedure at capture time, which is not possible for archival data. Here we train a novel neural network called SOMA, which takes raw mocap point clouds with varying numbers of points, labels them at scale without any calibration data, independent of the capture technology, and requiring only minimal human intervention. Our key insight is that, while labeling point clouds is highly ambiguous, the 3D body provides strong constraints on the solution that can be exploited by a learning-based method. To enable learning, we generate massive training sets of simulated noisy and ground truth mocap markers animated by 3D bodies from AMASS. SOMA exploits an architecture with stacked self-attention elements to learn the spatial structure of the 3D body and an optimal transport layer to constrain the assignment (labeling) problem while rejecting outliers. We extensively evaluate SOMA both quantitatively and qualitatively. SOMA is more accurate and robust than existing state of the art research methods and can be applied where commercial systems cannot. We automatically label over 8 hours of archival mocap data across 4 different datasets captured using various technologies and output SMPL-X body models. The model and data is released for research purposes at https://soma.is.tue.mpg.de/.

Related papers

PLOT: Pseudo-Labeling via Video Object Tracking for Scalable Monocular 3D Object Detection [35.524943073010675]
Monocular 3D object detection (M3OD) has long faced challenges due to data scarcity caused by high annotation costs and inherent 2D-to-3D ambiguity.<n>We propose a novel pseudo-labeling framework that uses only video data and is more robust to occlusion, without requiring a multi-view setup, additional sensors, camera poses, or domain-specific training.
arXiv Detail & Related papers (2025-07-03T07:46:39Z)
MAMMA: Markerless & Automatic Multi-Person Motion Action Capture [37.06717786024836]
MAMMA is a markerless motion-capture pipeline that recovers SMPL-X parameters from multi-view video of two-person interaction sequences.<n>We introduce a method that predicts dense 2D surface landmarks conditioned on segmentation masks.<n>We demonstrate that our approach can handle complex person--person interaction and offers greater accuracy than existing methods.
arXiv Detail & Related papers (2025-06-16T02:04:51Z)
Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D [68.23391872643268]
LOCATE 3D is a model for localizing objects in 3D scenes from referring expressions like "the small coffee table between the sofa and the lamp" It operates directly on sensor observation streams (posed RGB-D frames), enabling real-world deployment on robots and AR devices.
arXiv Detail & Related papers (2025-04-19T02:51:24Z)
DINeMo: Learning Neural Mesh Models with no 3D Annotations [7.21992608540601]
Category-level 3D/6D pose estimation is a crucial step towards comprehensive 3D scene understanding. Recent works explored neural mesh models that approach a range of 2D and 3D tasks from an analysis-by-synthesis perspective. We present DINeMo, a novel neural mesh model that is trained with no 3D annotations by leveraging pseudo-correspondence.
arXiv Detail & Related papers (2025-03-26T04:23:53Z)
Hardness-Aware Scene Synthesis for Semi-Supervised 3D Object Detection [59.33188668341604]
3D object detection serves as the fundamental task of autonomous driving perception. It is costly to obtain high-quality annotations for point cloud data. We propose a hardness-aware scene synthesis (HASS) method to generate adaptive synthetic scenes.
arXiv Detail & Related papers (2024-05-27T17:59:23Z)
MixSup: Mixed-grained Supervision for Label-efficient LiDAR-based 3D Object Detection [59.1417156002086]
MixSup is a more practical paradigm simultaneously utilizing massive cheap coarse labels and a limited number of accurate labels for Mixed-grained Supervision. MixSup achieves up to 97.31% of fully supervised performance, using cheap cluster annotations and only 10% box annotations.
arXiv Detail & Related papers (2024-01-29T17:05:19Z)
SpATr: MoCap 3D Human Action Recognition based on Spiral Auto-encoder and Transformer Network [1.4732811715354455]
We introduce a novel approach for 3D human action recognition, denoted as SpATr (Spiral Auto-encoder and Transformer Network) A lightweight auto-encoder, based on spiral convolutions, is employed to extract spatial geometrical features from each 3D mesh. The proposed method is evaluated on three prominent 3D human action datasets: Babel, MoVi, and BMLrub.
arXiv Detail & Related papers (2023-06-30T11:49:00Z)
3D Human Mesh Estimation from Virtual Markers [34.703241940871635]
We present an intermediate representation, named virtual markers, which learns 64 landmark keypoints on the body surface. Our approach outperforms the state-of-the-art methods on three datasets.
arXiv Detail & Related papers (2023-03-21T10:30:43Z)
Learning 3D Human Pose Estimation from Dozens of Datasets using a Geometry-Aware Autoencoder to Bridge Between Skeleton Formats [80.12253291709673]
We propose a novel affine-combining autoencoder (ACAE) method to perform dimensionality reduction on the number of landmarks. Our approach scales to an extreme multi-dataset regime, where we use 28 3D human pose datasets to supervise one model.
arXiv Detail & Related papers (2022-12-29T22:22:49Z)
An Empirical Study of Pseudo-Labeling for Image-based 3D Object Detection [72.30883544352918]
We investigate whether pseudo-labels can provide effective supervision for the baseline models under varying settings. We achieve 20.23 AP for moderate level on the KITTI-3D testing set without bells and whistles, improving the baseline model by 6.03 AP. We hope this work can provide insights for the image-based 3D detection community under a semi-supervised setting.
arXiv Detail & Related papers (2022-08-15T12:17:46Z)
Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training [56.81809311892475]
Masked Autoencoders (MAE) have shown great potentials in self-supervised pre-training for language and 2D image transformers. We propose Point-M2AE, a strong Multi-scale MAE pre-training framework for hierarchical self-supervised learning of 3D point clouds.
arXiv Detail & Related papers (2022-05-28T11:22:53Z)
Semi-supervised 3D Object Detection via Adaptive Pseudo-Labeling [18.209409027211404]
3D object detection is an important task in computer vision. Most existing methods require a large number of high-quality 3D annotations, which are expensive to collect. We propose a novel semi-supervised framework based on pseudo-labeling for outdoor 3D object detection tasks.
arXiv Detail & Related papers (2021-08-15T02:58:43Z)
labelCloud: A Lightweight Domain-Independent Labeling Tool for 3D Object Detection in Point Clouds [0.0]
We propose a novel tool for 3D object detection in point clouds to address shortcomings of existing tools. We show that the tool can be used to label 3D bounding boxes around target objects the ML model should later automatically identify, e.g., pedestrians for autonomous driving or cancer cells within radiography.
arXiv Detail & Related papers (2021-03-05T09:32:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.