Tracking Fast by Learning Slow: An Event-based Speed Adaptive Hand
Tracker Leveraging Knowledge in RGB Domain
- URL: http://arxiv.org/abs/2302.14430v1
- Date: Tue, 28 Feb 2023 09:18:48 GMT
- Title: Tracking Fast by Learning Slow: An Event-based Speed Adaptive Hand
Tracker Leveraging Knowledge in RGB Domain
- Authors: Chuanlin Lan, Ziyuan Yin, Arindam Basu, Rosa H. M. Chan
- Abstract summary: 3D hand tracking methods based on monocular RGB videos are easily affected by motion blur, while an event camera, a sensor with high temporal resolution and dynamic range, is naturally suited to this task, offering sparse output and low power consumption.
We developed an event-based speed adaptive hand tracker (ESAHT) to solve the hand tracking problem with an event camera.
Our solution outperforms RGB-based as well as previous event-based solutions on fast hand tracking tasks; our code and dataset will be publicly available.
- Score: 4.530678016396477
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 3D hand tracking methods based on monocular RGB videos are easily
affected by motion blur, while an event camera, a sensor with high temporal
resolution and dynamic range, is naturally suited to this task, offering
sparse output and low power consumption. However, the difficulty of obtaining
3D annotations of fast-moving hands hinders the construction of event-based
hand-tracking datasets. In this paper, we present an event-based speed
adaptive hand tracker (ESAHT) to solve the hand tracking problem with an
event camera. We first train a CNN model on a slow-motion hand tracking
dataset, which lets the model leverage the knowledge of RGB-based hand
tracking solutions, and then adapt it to fast hand tracking tasks. To realize
our solution, we constructed the first 3D hand tracking dataset captured by
an event camera in a real-world environment, devised two data augmentation
methods to narrow the domain gap between slow- and fast-motion data,
developed a speed adaptive event stream segmentation method to handle hand
movements at different speeds, and introduced a new event-to-frame
representation method adaptive to event streams of different lengths.
Experiments show that our solution outperforms RGB-based as well as previous
event-based solutions on fast hand tracking tasks; our code and dataset will
be publicly available.
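The abstract names two ingredients, speed adaptive event stream segmentation
and a length-adaptive event-to-frame representation, without giving formulas.
Below is a minimal sketch of one plausible realization, assuming events
arrive as a structured NumPy array with fields x, y, t, p: slicing by a fixed
event count makes the window duration shrink automatically for fast motion,
and each slice is accumulated into a normalized two-channel polarity frame.
The function names, the fixed-count heuristic, and the DVS346-style
resolution are illustrative assumptions, not the paper's exact method.

import numpy as np

def slice_by_count(events, events_per_slice=5000):
    # Fixed event count per slice: because event rate grows with hand
    # speed, slices cover short time windows for fast motion and long
    # windows for slow motion -- a simple speed-adaptive segmentation.
    return [events[i:i + events_per_slice]
            for i in range(0, len(events), events_per_slice)]

def events_to_frame(slice_, height=260, width=346):
    # Accumulate one slice into a 2-channel (negative/positive polarity)
    # count image; dividing by the per-slice maximum keeps frames built
    # from slices of different lengths on a comparable scale.
    frame = np.zeros((2, height, width), dtype=np.float32)
    pol = (slice_["p"] > 0).astype(np.int64)
    np.add.at(frame, (pol, slice_["y"], slice_["x"]), 1.0)
    frame /= max(float(frame.max()), 1.0)
    return frame

A tracker trained on frames built this way sees roughly the same amount of
motion per frame regardless of hand speed, which is one way to reuse
knowledge learned from slow-motion data.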
Related papers
- CRSOT: Cross-Resolution Object Tracking using Unaligned Frame and Event
Cameras [43.699819213559515]
Existing datasets for RGB-DVS tracking are collected with the DVS346 camera, whose resolution ($346 \times 260$) is too low for practical applications.
We build the first unaligned frame-event dataset CRSOT collected with a specially built data acquisition system.
We propose a novel unaligned object tracking framework that can realize robust tracking even with loosely aligned RGB-Event data.
arXiv Detail & Related papers (2024-01-05T14:20:22Z) - 3D Pose Estimation of Two Interacting Hands from a Monocular Event
Camera [59.846927201816776]
This paper introduces the first framework for 3D tracking of two fast-moving and interacting hands from a single monocular event camera.
Our approach tackles the left-right hand ambiguity with a novel semi-supervised feature-wise attention mechanism and integrates an intersection loss to fix hand collisions.
arXiv Detail & Related papers (2023-12-21T18:59:57Z) - Implicit Event-RGBD Neural SLAM [54.74363487009845]
Implicit neural SLAM has achieved remarkable progress recently.
Existing methods face significant challenges in non-ideal scenarios.
We propose EN-SLAM, the first event-RGBD implicit neural SLAM framework.
arXiv Detail & Related papers (2023-11-18T08:48:58Z) - Event Camera-based Visual Odometry for Dynamic Motion Tracking of a
Legged Robot Using Adaptive Time Surface [5.341864681049579]
Event cameras offer high temporal resolution and dynamic range, which can eliminate the issue of blurred RGB images during fast movements.
We introduce an adaptive time surface (ATS) method that addresses the whiteout and blackout issue in conventional time surfaces; a minimal time-surface sketch follows this list.
Lastly, we propose a nonlinear pose optimization formula that simultaneously performs 3D-2D alignment on both RGB-based and event-based maps and images.
arXiv Detail & Related papers (2023-05-15T19:03:45Z) - Event-based tracking of human hands [0.6875312133832077]
An event camera detects changes in brightness, measuring motion with low latency, no motion blur, low power consumption, and high dynamic range.
Captured frames are analysed using lightweight algorithms reporting 3D hand position data.
arXiv Detail & Related papers (2023-04-13T13:43:45Z) - VisEvent: Reliable Object Tracking via Collaboration of Frame and Event
Flows [93.54888104118822]
We propose a large-scale Visible-Event benchmark (termed VisEvent) due to the lack of a realistic, large-scale dataset for this task.
Our dataset consists of 820 video pairs captured under low illumination, high speed, and background clutter scenarios.
Based on VisEvent, we transform the event flows into event images and construct more than 30 baseline methods.
arXiv Detail & Related papers (2021-08-11T03:55:12Z) - RGB2Hands: Real-Time Tracking of 3D Hand Interactions from Monocular RGB
Video [76.86512780916827]
We present the first real-time method for motion capture of skeletal pose and 3D surface geometry of hands from a single RGB camera.
In order to address the inherent depth ambiguities in RGB data, we propose a novel multi-task CNN.
We experimentally verify the individual components of our RGB two-hand tracking and 3D reconstruction pipeline.
arXiv Detail & Related papers (2021-06-22T12:53:56Z) - Differentiable Event Stream Simulator for Non-Rigid 3D Tracking [82.56690776283428]
Our differentiable simulator enables non-rigid 3D tracking of deformable objects from event streams.
We show the effectiveness of our approach for various types of non-rigid objects and compare to existing methods for non-rigid 3D tracking.
arXiv Detail & Related papers (2021-04-30T17:58:07Z) - EventHands: Real-Time Neural 3D Hand Reconstruction from an Event Stream [80.15360180192175]
3D hand pose estimation from monocular videos is a long-standing and challenging problem.
We address it for the first time using a single event camera, i.e., an asynchronous vision sensor reacting to brightness changes.
Our approach has characteristics previously not demonstrated with a single RGB or depth camera.
arXiv Detail & Related papers (2020-12-11T16:45:34Z)
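As referenced from the adaptive time surface (ATS) entry above: a
conventional time surface exponentially decays the latest event timestamp at
each pixel with a fixed constant, which saturates ("whiteout") in
event-dense scenes and goes blank ("blackout") in quiet ones. The sketch
below shows a conventional time surface plus one plausible rate-based
adaptation of the decay constant; the names and the adaptation rule are
illustrative assumptions, not the formula from the paper.

import numpy as np

def time_surface(events, t_ref, height=260, width=346, tau=0.05):
    # Latest event timestamp per pixel (assumes events are sorted by t,
    # so later writes win), decayed so recent activity is near 1 and
    # stale pixels fall toward 0; pixels with no events map exactly to 0.
    last_t = np.full((height, width), -np.inf)
    last_t[events["y"], events["x"]] = events["t"]
    return np.exp(-(t_ref - last_t) / tau)

def adaptive_tau(events, t_ref, k=100.0):
    # One plausible adaptation: shrink tau when the event rate is high
    # (avoiding whiteout) and grow it when the rate is low (avoiding
    # blackout).
    duration = max(float(t_ref - events["t"].min()), 1e-6)
    rate = len(events) / duration  # events per second
    return k / rate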