Motion Capture from Inertial and Vision Sensors
- URL: http://arxiv.org/abs/2407.16341v1
- Date: Tue, 23 Jul 2024 09:41:10 GMT
- Title: Motion Capture from Inertial and Vision Sensors
- Authors: Xiaodong Chen, Wu Liu, Qian Bao, Xinchen Liu, Quanwei Yang, Ruoli Dai, Tao Mei
- Abstract summary: MINIONS is a large-scale Motion capture dataset collected from INertial and visION Sensors.
We conduct experiments on multi-modal motion capture using a monocular camera and very few IMUs.
- Score: 60.5190090684795
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human motion capture is the foundation for many computer vision and graphics tasks. While industrial motion capture systems with complex camera arrays or expensive wearable sensors have been widely adopted in movie and game production, consumer-affordable and easy-to-use solutions for personal applications are still far from mature. To utilize a mixture of a monocular camera and very few inertial measurement units (IMUs) for accurate multi-modal human motion capture in daily life, we contribute MINIONS in this paper, a large-scale Motion capture dataset collected from INertial and visION Sensors. MINIONS has several featured properties: 1) a large scale of over five million frames and 400 minutes of recordings; 2) multi-modal data of IMU signals and RGB videos labeled with joint positions, joint rotations, SMPL parameters, etc.; 3) a diverse set of 146 fine-grained single and interactive actions with textual descriptions. With the proposed MINIONS, we conduct experiments on multi-modal motion capture and explore the possibilities of consumer-affordable motion capture using a monocular camera and very few IMUs. The experimental results emphasize the unique advantages of inertial and vision sensors, showcasing the promise of consumer-affordable multi-modal motion capture and providing a valuable resource for further research and development.
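As a concrete illustration of the data described above, the sketch below shows what a single multi-modal sample might look like in Python. Every field name and shape here is an assumption for illustration, not the official MINIONS schema; only the SMPL conventions (24 joints, a 72-D pose vector, a 10-D shape vector) are standard.

```python
# Illustrative sketch only: field names and shapes are assumptions, not the
# official MINIONS schema. SMPL uses 24 joints with 3-D axis-angle rotations
# and a 10-D shape vector.
from dataclasses import dataclass
import numpy as np

@dataclass
class MocapSample:
    rgb_frame: np.ndarray        # (H, W, 3) uint8 image from the monocular camera
    imu_signals: np.ndarray      # (num_imus, 6) accelerometer + gyroscope per IMU
    joint_positions: np.ndarray  # (24, 3) 3-D joint positions
    joint_rotations: np.ndarray  # (24, 3) axis-angle joint rotations
    smpl_pose: np.ndarray        # (72,) SMPL pose parameters (24 joints x 3)
    smpl_shape: np.ndarray       # (10,) SMPL shape coefficients
    action_label: str            # fine-grained action description

# A dummy sample with the assumed shapes ("very few IMUs" here means 4):
sample = MocapSample(
    rgb_frame=np.zeros((720, 1280, 3), dtype=np.uint8),
    imu_signals=np.zeros((4, 6), dtype=np.float32),
    joint_positions=np.zeros((24, 3), dtype=np.float32),
    joint_rotations=np.zeros((24, 3), dtype=np.float32),
    smpl_pose=np.zeros(72, dtype=np.float32),
    smpl_shape=np.zeros(10, dtype=np.float32),
    action_label="waving while walking",
)
```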
Related papers
- MotionBank: A Large-scale Video Motion Benchmark with Disentangled Rule-based Annotations [85.85596165472663]
We build MotionBank, which comprises 13 video action datasets, 1.24M motion sequences, and 132.9M frames of natural and diverse human motions.
Our MotionBank is beneficial for general motion-related tasks of human motion generation, motion in-context generation, and motion understanding.
arXiv Detail & Related papers (2024-10-17T17:31:24Z)
- Large Motion Model for Unified Multi-Modal Motion Generation [50.56268006354396]
Large Motion Model (LMM) is a motion-centric, multi-modal framework that unifies mainstream motion generation tasks into a generalist model.
LMM tackles the challenges of unified multi-modal motion generation from three principled aspects.
arXiv Detail & Related papers (2024-04-01T17:55:11Z)
- RELI11D: A Comprehensive Multimodal Human Motion Dataset and Method [44.670169033884896]
We present RELI11D, a high-quality multimodal human motion dataset involving LiDAR, IMU system, RGB camera, and Event camera.
It records the motions of 10 actors performing 5 sports in 7 scenes, including 3.32 hours of synchronized LiDAR point clouds, IMU measurement data, RGB videos, and Event streams.
To address the challenge of integrating different modalities, we propose LEIR, a multimodal baseline that effectively utilizes LiDAR Point Cloud, Event stream, and RGB.
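LEIR's internals are not described in this summary; as a generic illustration of the late-fusion idea behind combining such modalities, here is a minimal sketch with stand-in encoders. All function names and feature sizes are assumptions, not the LEIR architecture.

```python
# Generic late-fusion sketch, NOT the LEIR architecture: per-modality encoders
# are stand-ins, and feature sizes are arbitrary assumptions.
import numpy as np

def encode_lidar(points: np.ndarray) -> np.ndarray:
    # Stand-in encoder: mean over the (N, 3) point cloud -> 3-D feature.
    return points.mean(axis=0)

def encode_events(events: np.ndarray) -> np.ndarray:
    # Stand-in encoder: per-column mean over (N, 4) events (x, y, t, polarity).
    return events.mean(axis=0)

def encode_rgb(frame: np.ndarray) -> np.ndarray:
    # Stand-in encoder: global mean color of an (H, W, 3) frame.
    return frame.reshape(-1, 3).mean(axis=0)

def fuse(points, events, frame) -> np.ndarray:
    # Late fusion: concatenate per-modality features into one vector that a
    # downstream pose regressor could consume.
    return np.concatenate([encode_lidar(points), encode_events(events), encode_rgb(frame)])

fused = fuse(np.random.rand(1024, 3), np.random.rand(5000, 4), np.random.rand(480, 640, 3))
print(fused.shape)  # (10,)
```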
arXiv Detail & Related papers (2024-03-28T15:31:36Z)
- Mocap Everyone Everywhere: Lightweight Motion Capture With Smartwatches and a Head-Mounted Camera [10.055317239956423]
We present a lightweight and affordable motion capture method based on two smartwatches and a head-mounted camera.
Our method can make wearable motion capture accessible to everyone everywhere, enabling 3D full-body motion capture in diverse environments.
arXiv Detail & Related papers (2024-01-01T18:56:54Z)
- I'M HOI: Inertia-aware Monocular Capture of 3D Human-Object Interactions [42.87514729260336]
I'm-HOI is a monocular scheme to faithfully capture the 3D motions of both the human and object in a novel setting.
It combines general motion inference and category-aware refinement.
Our dataset and code will be released to the community.
arXiv Detail & Related papers (2023-12-10T08:25:41Z)
- QuestSim: Human Motion Tracking from Sparse Sensors with Simulated Avatars [80.05743236282564]
Real-time tracking of human body motion is crucial for immersive experiences in AR/VR.
We present a reinforcement learning framework that takes in sparse signals from an HMD and two controllers.
We show that a single policy can be robust to diverse locomotion styles, different body sizes, and novel environments.
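The summary does not specify the policy's interface, so the sketch below only illustrates what a policy mapping sparse tracker signals to avatar joint torques might look like; the observation layout, layer sizes, and 23-joint humanoid are assumptions, not QuestSim's actual design.

```python
# Hedged sketch of a sparse-sensor tracking policy's interface; the 18-D
# observation (position + orientation for the HMD and two controllers) and
# the torque dimensionality are assumptions, not QuestSim's actual design.
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM = 3 * 6   # 3 tracked devices x (3-D position + 3-D orientation)
ACT_DIM = 23 * 3  # joint torques for an assumed 23-joint humanoid

# A two-layer MLP policy with random weights standing in for a trained one.
W1, b1 = rng.standard_normal((OBS_DIM, 128)) * 0.1, np.zeros(128)
W2, b2 = rng.standard_normal((128, ACT_DIM)) * 0.1, np.zeros(ACT_DIM)

def policy(obs: np.ndarray) -> np.ndarray:
    """Map sparse tracker observations to joint torques for a simulated avatar."""
    hidden = np.tanh(obs @ W1 + b1)
    return hidden @ W2 + b2

torques = policy(rng.standard_normal(OBS_DIM))
print(torques.shape)  # (69,)
```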
arXiv Detail & Related papers (2022-09-20T00:25:54Z)
- HuMMan: Multi-Modal 4D Human Dataset for Versatile Sensing and Modeling [83.57675975092496]
HuMMan is a large-scale multi-modal 4D human dataset with 1000 human subjects, 400k sequences and 60M frames.
HuMMan has several appealing properties: 1) multi-modal data and annotations including color images, point clouds, keypoints, SMPL parameters, and textured meshes.
arXiv Detail & Related papers (2022-04-28T17:54:25Z)
- Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention [0.9668407688201357]
We propose a new task of sensor-augmented egocentric-video captioning.
We use wearable-sensor data as auxiliary information to mitigate the inherent problems in egocentric vision.
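As a generic illustration of dynamic modal attention (the paper's exact formulation may differ), the sketch below recomputes softmax weights over video and sensor features at each step, so the model can dynamically decide how much to trust each modality.

```python
# Illustrative sketch of "dynamic modal attention" as commonly formulated:
# per-step softmax weights decide how much each modality contributes. This is
# a generic version, not necessarily the paper's exact mechanism.
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

def dynamic_modal_attention(video_feat, sensor_feat, w_video, w_sensor):
    # Scalar relevance scores per modality, recomputed at every time step.
    scores = np.array([video_feat @ w_video, sensor_feat @ w_sensor])
    alpha = softmax(scores)
    # Attention-weighted sum of (equal-sized) modality features.
    return alpha[0] * video_feat + alpha[1] * sensor_feat, alpha

d = 64
rng = np.random.default_rng(1)
fused, alpha = dynamic_modal_attention(
    rng.standard_normal(d), rng.standard_normal(d),
    rng.standard_normal(d), rng.standard_normal(d),
)
print(alpha)  # e.g. [0.7, 0.3]: how much each modality is trusted this step
```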
arXiv Detail & Related papers (2021-09-07T09:22:09Z)
- ChallenCap: Monocular 3D Capture of Challenging Human Performances using Multi-Modal References [18.327101908143113]
We propose ChallenCap -- a template-based approach to capture challenging 3D human motions using a single RGB camera.
We adopt a novel learning-and-optimization framework, with the aid of multi-modal references.
Experiments on our new challenging motion dataset demonstrate the effectiveness and robustness of our approach to capture challenging human motions.
arXiv Detail & Related papers (2021-03-11T15:49:22Z)
- Asynchronous Multi-View SLAM [78.49842639404413]
Existing multi-camera SLAM systems assume synchronized shutters for all cameras, which is often not the case in practice.
Our framework integrates a continuous-time motion model to relate information across asynchronous multi-frames during tracking, local mapping, and loop closing.
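As a rough illustration of the continuous-time idea (not the paper's actual model), the sketch below queries a rig pose at an arbitrary timestamp between two keyframes, using linear interpolation for translation and quaternion slerp for rotation; this is what lets measurements from unsynchronized shutters be related to a single motion model.

```python
# Minimal pure-NumPy sketch, an assumption-level illustration rather than the
# paper's actual trajectory model: a continuous-time trajectory can be queried
# at ANY timestamp, so unsynchronized cameras share one motion model.
import numpy as np

t_keys = np.array([0.00, 0.10])                        # keyframe timestamps (s)
p_keys = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])  # keyframe translations (m)
# Unit quaternions (w, x, y, z): identity, then 30 degrees about the z axis.
q_keys = np.array([[1.0, 0.0, 0.0, 0.0],
                   [np.cos(np.pi / 12), 0.0, 0.0, np.sin(np.pi / 12)]])

def slerp(q0, q1, a):
    """Spherical-linear interpolation between unit quaternions q0 and q1."""
    dot = np.dot(q0, q1)
    if dot < 0.0:                      # take the shorter arc
        q1, dot = -q1, -dot
    theta = np.arccos(np.clip(dot, -1.0, 1.0))
    if theta < 1e-8:
        return q0
    return (np.sin((1 - a) * theta) * q0 + np.sin(a * theta) * q1) / np.sin(theta)

def pose_at(t):
    """Query the continuous-time trajectory at an arbitrary timestamp t."""
    a = (t - t_keys[0]) / (t_keys[1] - t_keys[0])
    position = (1 - a) * p_keys[0] + a * p_keys[1]
    return position, slerp(q_keys[0], q_keys[1], a)

# A camera with an unsynchronized shutter fired at t = 0.043 s:
p, q = pose_at(0.043)
print(p, q)
```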
arXiv Detail & Related papers (2021-01-17T00:50:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences.