Tracking Any Point with Frame-Event Fusion Network at High Frame Rate
- URL: http://arxiv.org/abs/2409.11953v1
- Date: Wed, 18 Sep 2024 13:07:19 GMT
- Title: Tracking Any Point with Frame-Event Fusion Network at High Frame Rate
- Authors: Jiaxiong Liu, Bo Wang, Zhen Tan, Jinpu Zhang, Hui Shen, Dewen Hu,
- Abstract summary: We propose an image-event fusion point tracker, FE-TAP.
It combines the contextual information from image frames with the high temporal resolution of events.
FE-TAP achieves high frame rate and robust point tracking under various challenging conditions.
- Score: 16.749590397918574
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tracking any point based on image frames is constrained by frame rates, leading to instability in high-speed scenarios and limited generalization in real-world applications. To overcome these limitations, we propose an image-event fusion point tracker, FE-TAP, which combines the contextual information from image frames with the high temporal resolution of events, achieving high frame rate and robust point tracking under various challenging conditions. Specifically, we designed an Evolution Fusion module (EvoFusion) to model the image generation process guided by events. This module can effectively integrate valuable information from both modalities operating at different frequencies. To achieve smoother point trajectories, we employed a transformer-based refinement strategy that updates the point's trajectories and features iteratively. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches, particularly improving expected feature age by 24$\%$ on EDS datasets. Finally, we qualitatively validated the robustness of our algorithm in real driving scenarios using our custom-designed high-resolution image-event synchronization device. Our source code will be released at https://github.com/ljx1002/FE-TAP.
Related papers
- Event-Based Video Frame Interpolation With Cross-Modal Asymmetric Bidirectional Motion Fields [39.214857326425204]
Video Frame Interpolation (VFI) aims to generate intermediate video frames between consecutive input frames.
We propose a novel event-based VFI framework with cross-modal asymmetric bidirectional motion field estimation.
Our method shows significant performance improvement over the state-of-the-art VFI methods on various datasets.
arXiv Detail & Related papers (2025-02-19T13:40:43Z) - Spatially-guided Temporal Aggregation for Robust Event-RGB Optical Flow Estimation [47.75348821902489]
Current optical flow methods exploit the stable appearance of frame (or RGB) data to establish robust correspondences across time.
Event cameras, on the other hand, provide high-temporal-resolution motion cues and excel in challenging scenarios.
This study introduces a novel approach that uses a spatially dense modality to guide the aggregation of the temporally dense event modality.
arXiv Detail & Related papers (2025-01-01T13:40:09Z) - BlinkTrack: Feature Tracking over 100 FPS via Events and Images [50.98675227695814]
We propose a novel framework, BlinkTrack, which integrates event data with RGB images for high-frequency feature tracking.
Our method extends the traditional Kalman filter into a learning-based framework, utilizing differentiable Kalman filters in both event and image branches.
Experimental results indicate that BlinkTrack significantly outperforms existing event-based methods.
arXiv Detail & Related papers (2024-09-26T15:54:18Z) - Alignment-free HDR Deghosting with Semantics Consistent Transformer [76.91669741684173]
High dynamic range imaging aims to retrieve information from multiple low-dynamic range inputs to generate realistic output.
Existing methods often focus on the spatial misalignment across input frames caused by the foreground and/or camera motion.
We propose a novel alignment-free network with a Semantics Consistent Transformer (SCTNet) with both spatial and channel attention modules.
arXiv Detail & Related papers (2023-05-29T15:03:23Z) - Frame-Event Alignment and Fusion Network for High Frame Rate Tracking [37.35823883499189]
Most existing RGB-based trackers target low frame rate benchmarks of around 30 frames per second.
We propose an end-to-end network consisting of multi-modality alignment and fusion modules.
With the FE240hz dataset, our approach achieves high frame rate tracking up to 240Hz.
arXiv Detail & Related papers (2023-05-25T03:34:24Z) - A Unified Framework for Event-based Frame Interpolation with Ad-hoc Deblurring in the Wild [72.0226493284814]
We propose a unified framework for event-based frame that performs deblurring ad-hoc.
Our network consistently outperforms previous state-of-the-art methods on frame, single image deblurring, and the joint task of both.
arXiv Detail & Related papers (2023-01-12T18:19:00Z) - Towards Frame Rate Agnostic Multi-Object Tracking [76.82407173177138]
We propose a Frame Rate Agnostic MOT framework with a Periodic training Scheme (FAPS) to tackle the FraMOT problem for the first time.
Specifically, we propose a Frame Rate Agnostic Association Module (FAAM) that infers and encodes the frame rate information.
FAPS reflects all post-processing steps in training via tracking pattern matching and fusion.
arXiv Detail & Related papers (2022-09-23T04:25:19Z) - Asynchronous Optimisation for Event-based Visual Odometry [53.59879499700895]
Event cameras open up new possibilities for robotic perception due to their low latency and high dynamic range.
We focus on event-based visual odometry (VO)
We propose an asynchronous structure-from-motion optimisation back-end.
arXiv Detail & Related papers (2022-03-02T11:28:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.