MTevent: A Multi-Task Event Camera Dataset for 6D Pose Estimation and Moving Object Detection
- URL: http://arxiv.org/abs/2505.11282v2
- Date: Mon, 02 Jun 2025 18:59:15 GMT
- Title: MTevent: A Multi-Task Event Camera Dataset for 6D Pose Estimation and Moving Object Detection
- Authors: Shrutarv Awasthi, Anas Gouda, Sven Franke, Jérôme Rutinowski, Frank Hoffmann, Moritz Roidl,
- Abstract summary: MTevent is a dataset designed for 6D pose estimation and moving object detection in highly dynamic environments. Our setup consists of a stereo event camera and an RGB camera, capturing 75 scenes, each lasting 16 seconds on average. We evaluate the task of 6D pose estimation using NVIDIA's FoundationPose on RGB images, achieving an Average Recall of 0.22 with ground-truth masks.
- Score: 1.1083289076967895
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mobile robots are reaching unprecedented speeds, with platforms such as the Unitree B2 and the Fraunhofer O3dyn achieving maximum speeds between 5 and 10 m/s. However, effectively utilizing such speeds remains a challenge due to the limitations of RGB cameras, which suffer from motion blur and fail to provide real-time responsiveness. Event cameras, with their asynchronous operation and low-latency sensing, offer a promising alternative for high-speed robotic perception. In this work, we introduce MTevent, a dataset designed for 6D pose estimation and moving object detection in highly dynamic environments with large detection distances. Our setup consists of a stereo event camera and an RGB camera, capturing 75 scenes, each lasting 16 seconds on average, and featuring 16 unique objects under challenging conditions such as extreme viewing angles, varying lighting, and occlusions. MTevent is the first dataset to combine high-speed motion, long-range perception, and real-world object interactions, making it a valuable resource for advancing event-based vision in robotics. To establish a baseline, we evaluate 6D pose estimation using NVIDIA's FoundationPose on RGB images, achieving an Average Recall of 0.22 with ground-truth masks, which highlights the limitations of RGB-based approaches in such dynamic settings. With MTevent, we provide a novel resource to improve perception models and foster further research in high-speed robotic vision. The dataset is available for download at https://huggingface.co/datasets/anas-gouda/MTevent
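For readers who want to pull the data locally, the sketch below shows one way to fetch the full dataset from the Hugging Face Hub. It assumes only that the repository id matches the URL above; the dataset's internal file layout is not documented in this abstract, so the downloaded directory should be inspected before writing any loaders.

```python
# Minimal sketch: download the MTevent dataset from the Hugging Face Hub.
# Assumes only that the repo id matches the dataset URL in the abstract;
# the internal file layout is not specified there, so inspect local_dir
# before building loaders on top of it.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="anas-gouda/MTevent",
    repo_type="dataset",  # the repository lives under hf.co/datasets/
)
print(f"MTevent files downloaded to: {local_dir}")
```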
Related papers
- FRAME: Floor-aligned Representation for Avatar Motion from Egocentric Video [52.33896173943054]
Egocentric motion capture with a head-mounted body-facing stereo camera is crucial for VR and AR applications. Existing methods rely on synthetic pretraining and struggle to generate smooth and accurate predictions in real-world settings. We propose FRAME, a simple yet effective architecture that combines device pose and camera feeds for state-of-the-art body pose prediction.
arXiv Detail & Related papers (2025-03-29T14:26:06Z)
- EventEgo3D++: 3D Human Motion Capture from a Head-Mounted Event Camera [64.58147600753382]
EventEgo3D++ is an approach to 3D human motion capture from a monocular event camera with a fisheye lens. Event cameras excel in high-speed scenarios and varying illumination due to their high temporal resolution. Our method supports real-time 3D pose updates at a rate of 140Hz.
arXiv Detail & Related papers (2025-02-11T18:57:05Z)
- TUMTraf Event: Calibration and Fusion Resulting in a Dataset for Roadside Event-Based and RGB Cameras [14.57694345706197]
Event-based cameras are predestined for Intelligent Transportation Systems (ITS).
They provide very high temporal resolution and dynamic range, which can eliminate motion blur and improve detection performance at night.
However, event-based images lack color and texture compared to images from a conventional RGB camera.
arXiv Detail & Related papers (2024-01-16T16:25:37Z)
- EventTransAct: A video transformer-based framework for Event-camera based action recognition [52.537021302246664]
Event cameras offer new opportunities for action recognition compared to standard RGB videos.
In this study, we employ a computationally efficient model, namely the video transformer network (VTN), which initially acquires spatial embeddings per event-frame.
In order to better adopt the VTN for the sparse and fine-grained nature of event data, we design an Event-Contrastive Loss ($\mathcal{L}_{EC}$) and event-specific augmentations.
arXiv Detail & Related papers (2023-08-25T23:51:07Z)
- Event Camera-based Visual Odometry for Dynamic Motion Tracking of a Legged Robot Using Adaptive Time Surface [5.341864681049579]
Event cameras offer high temporal resolution and dynamic range, which can eliminate the issue of blurred RGB images during fast movements.
We introduce an adaptive time surface (ATS) method that addresses the whiteout and blackout issue in conventional time surfaces.
Lastly, we propose a nonlinear pose optimization formula that simultaneously performs 3D-2D alignment on both RGB-based and event-based maps and images.
arXiv Detail & Related papers (2023-05-15T19:03:45Z)
- Recurrent Vision Transformers for Object Detection with Event Cameras [62.27246562304705]
We present Recurrent Vision Transformers (RVTs), a novel backbone for object detection with event cameras.
RVTs can be trained from scratch to reach state-of-the-art performance on event-based object detection.
Our study brings new insights into effective design choices that can be fruitful for research beyond event-based vision.
arXiv Detail & Related papers (2022-12-11T20:28:59Z)
- ROFT: Real-Time Optical Flow-Aided 6D Object Pose and Velocity Tracking [7.617467911329272]
We introduce ROFT, a Kalman filtering approach for 6D object pose and velocity tracking from a stream of RGB-D images.
By leveraging real-time optical flow, ROFT synchronizes delayed outputs of low frame rate Convolutional Neural Networks for instance segmentation and 6D object pose estimation.
Results demonstrate that our approach outperforms state-of-the-art methods for 6D object pose tracking, while also providing 6D object velocity tracking.
arXiv Detail & Related papers (2021-11-06T07:30:00Z)
- TUM-VIE: The TUM Stereo Visual-Inertial Event Dataset [50.8779574716494]
Event cameras are bio-inspired vision sensors which measure per pixel brightness changes.
They offer numerous benefits over traditional, frame-based cameras, including low latency, high dynamic range, high temporal resolution and low power consumption.
To foster the development of 3D perception and navigation algorithms with event cameras, we present the TUM-VIE dataset.
arXiv Detail & Related papers (2021-08-16T19:53:56Z)
- Real-time RGBD-based Extended Body Pose Estimation [57.61868412206493]
We present a system for real-time RGBD-based estimation of 3D human pose.
We use a parametric 3D deformable human mesh model (SMPL-X) as the representation.
We train estimators of body pose and facial expression parameters.
arXiv Detail & Related papers (2021-03-05T13:37:50Z)
- 0-MMS: Zero-Shot Multi-Motion Segmentation With A Monocular Event Camera [13.39518293550118]
We present an approach for monocular multi-motion segmentation, which combines bottom-up feature tracking and top-down motion compensation into a unified pipeline.
Using the events within a time-interval, our method segments the scene into multiple motions by splitting and merging.
The approach was successfully evaluated on both challenging real-world and synthetic scenarios from the EV-IMO, EED, and MOD datasets.
arXiv Detail & Related papers (2020-06-11T02:34:29Z)
- RGB-D-E: Event Camera Calibration for Fast 6-DOF Object Tracking [16.06615504110132]
We propose to use an event-based camera to increase the speed of 3D object tracking in 6 degrees of freedom.
This application requires handling very high object speed to convey compelling AR experiences.
We develop a deep learning approach, which combines an existing RGB-D network along with a novel event-based network in a cascade fashion.
arXiv Detail & Related papers (2020-06-09T01:55:48Z)