Mocap Everyone Everywhere: Lightweight Motion Capture With Smartwatches and a Head-Mounted Camera
- URL: http://arxiv.org/abs/2401.00847v2
- Date: Mon, 6 May 2024 08:14:01 GMT
- Title: Mocap Everyone Everywhere: Lightweight Motion Capture With Smartwatches and a Head-Mounted Camera
- Authors: Jiye Lee, Hanbyul Joo
- Abstract summary: We present a lightweight and affordable motion capture method based on two smartwatches and a head-mounted camera.
Our method can make wearable motion capture accessible to everyone everywhere, enabling 3D full-body motion capture in diverse environments.
- Score: 10.055317239956423
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a lightweight and affordable motion capture method based on two smartwatches and a head-mounted camera. In contrast to existing approaches that use six or more expert-level IMU devices, our approach is much more cost-effective and convenient. Our method can make wearable motion capture accessible to everyone everywhere, enabling 3D full-body motion capture in diverse environments. As a key idea to overcome the extreme sparsity and ambiguities of sensor inputs with different modalities, we integrate 6D head poses obtained from the head-mounted camera for motion estimation. To enable capture in expansive indoor and outdoor scenes, we propose an algorithm that tracks and updates floor level changes to define head poses, coupled with a multi-stage Transformer-based regression module. We also introduce novel strategies leveraging visual cues from egocentric images to further enhance motion capture quality while reducing ambiguities. We demonstrate the performance of our method in various challenging scenarios, including complex outdoor environments and everyday motions such as object interactions and social interactions among multiple individuals.
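The sensor fusion the abstract describes (two wrist-worn IMU streams plus a 6D head pose from the head-mounted camera, with a floor-level tracker defining the head height) can be illustrated with a minimal sketch. Everything below is an illustrative assumption: the feature layout, the 27-dimensional frame vector, and the floor-update heuristic are not the authors' implementation, only a plausible shape for the inputs to a multi-stage regression module.

```python
import numpy as np

def assemble_frame_features(head_rot6d, head_pos, left_imu, right_imu):
    """Concatenate one frame of sparse sensor input into a single feature vector.

    head_rot6d : (6,) 6D rotation representation of the head pose (illustrative)
    head_pos   : (3,) head position relative to the current floor level
    left_imu   : (9,) accelerometer + gyroscope + orientation from the left watch
    right_imu  : (9,) same for the right watch
    Returns a (27,) vector: 6 + 3 + 9 + 9.
    """
    return np.concatenate([head_rot6d, head_pos, left_imu, right_imu])

def update_floor_level(floor, head_height, lo=1.2, hi=2.0, rate=0.05):
    """Toy floor-level tracker (hypothetical heuristic): when the head height
    relative to the current floor drifts outside a plausible standing range
    (e.g. after climbing stairs), nudge the floor estimate toward the new level."""
    rel = head_height - floor
    if rel > hi:        # head implausibly high -> floor probably rose
        floor += rate * (rel - hi)
    elif rel < lo:      # head implausibly low -> floor probably dropped
        floor -= rate * (lo - rel)
    return floor

# A sliding window of frames would then feed the regression module:
window = np.stack([
    assemble_frame_features(np.zeros(6), np.array([0.0, 1.6, 0.0]),
                            np.zeros(9), np.zeros(9))
    for _ in range(30)
])
print(window.shape)  # (30, 27)
```

In this sketch the floor estimate only moves when the relative head height leaves a plausible range, so ordinary crouching or jumping within that range leaves the floor level untouched.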
Related papers
- Motion Capture from Inertial and Vision Sensors [60.5190090684795]
MINIONS is a large-scale Motion capture dataset collected from INertial and visION Sensors.
We conduct experiments on multi-modal motion capture using a monocular camera and very few IMUs.
arXiv Detail & Related papers (2024-07-23T09:41:10Z)
- EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting [95.44545809256473]
We introduce EgoGaussian, the first method capable of simultaneously reconstructing 3D scenes and tracking 3D object motion from RGB egocentric input alone.
Our approach employs a clip-level online learning pipeline that leverages the dynamic nature of human activities.
arXiv Detail & Related papers (2024-06-28T10:39:36Z)
- MotionMaster: Training-free Camera Motion Transfer For Video Generation [48.706578330771386]
We propose a novel training-free video motion transfer model, which disentangles camera motions and object motions in source videos.
Our model can effectively decouple camera-object motion and apply the decoupled camera motion to a wide range of controllable video generation tasks.
arXiv Detail & Related papers (2024-04-24T10:28:54Z)
- PACE: Human and Camera Motion Estimation from in-the-wild Videos [113.76041632912577]
We present a method to estimate human motion in a global scene from moving cameras.
This is a highly challenging task due to the coupling of human and camera motions in the video.
We propose a joint optimization framework that disentangles human and camera motions using both foreground human motion priors and background scene features.
arXiv Detail & Related papers (2023-10-20T19:04:14Z)
- Proactive Multi-Camera Collaboration For 3D Human Pose Estimation [16.628446718419344]
This paper presents a multi-agent reinforcement learning scheme for proactive Multi-Camera Collaboration in 3D Human Pose Estimation.
Active camera approaches proactively control camera poses to find optimal viewpoints for 3D reconstruction.
We jointly train our model with multiple world dynamics learning tasks to better capture environment dynamics.
arXiv Detail & Related papers (2023-03-07T10:01:00Z)
- HybridCap: Inertia-aid Monocular Capture of Challenging Human Motions [41.56735523771541]
We present a light-weight, hybrid mocap technique called HybridCap.
It augments the camera with only 4 Inertial Measurement Units (IMUs) in a learning-and-optimization framework.
It can robustly handle challenging movements ranging from fitness actions to Latin dance.
arXiv Detail & Related papers (2022-03-17T12:30:17Z)
- Event-based Motion Segmentation by Cascaded Two-Level Multi-Model Fitting [44.97191206895915]
We present a cascaded two-level multi-model fitting method for identifying independently moving objects with a monocular event camera.
Experiments demonstrate the effectiveness and versatility of our method in real-world scenes with different motion patterns and an unknown number of moving objects.
arXiv Detail & Related papers (2021-11-05T12:59:41Z)
- Estimating 3D Motion and Forces of Human-Object Interactions from Internet Videos [49.52070710518688]
We introduce a method to reconstruct the 3D motion of a person interacting with an object from a single RGB video.
Our method estimates the 3D poses of the person together with the object pose, the contact positions and the contact forces on the human body.
arXiv Detail & Related papers (2021-11-02T13:40:18Z)
- Attentive and Contrastive Learning for Joint Depth and Motion Field Estimation [76.58256020932312]
Estimating the motion of the camera together with the 3D structure of the scene from a monocular vision system is a complex task.
We present a self-supervised learning framework for 3D object motion field estimation from monocular videos.
arXiv Detail & Related papers (2021-10-13T16:45:01Z)
- Lightweight Multi-person Total Motion Capture Using Sparse Multi-view Cameras [35.67288909201899]
We propose a lightweight total motion capture system for multi-person interactive scenarios using only sparse multi-view cameras.
Our method achieves efficient localization and accurate association of the hands and faces even under severe occlusion.
Overall, we propose the first lightweight total capture system, achieving fast, robust and accurate multi-person total motion capture performance.
arXiv Detail & Related papers (2021-08-23T19:23:35Z)
- SportsCap: Monocular 3D Human Motion Capture and Fine-grained Understanding in Challenging Sports Videos [40.19723456533343]
We propose SportsCap -- the first approach for simultaneously capturing 3D human motions and understanding fine-grained actions from monocular challenging sports video input.
Our approach utilizes the semantic and temporally structured sub-motion prior in the embedding space for motion capture and understanding.
Based on such hybrid motion information, we introduce a multi-stream spatial-temporal Graph Convolutional Network (ST-GCN) to predict the fine-grained semantic action attributes.
arXiv Detail & Related papers (2021-04-23T07:52:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.