Object Tracking by Jointly Exploiting Frame and Event Domain
- URL: http://arxiv.org/abs/2109.09052v1
- Date: Sun, 19 Sep 2021 03:13:25 GMT
- Title: Object Tracking by Jointly Exploiting Frame and Event Domain
- Authors: Jiqing Zhang and Xin Yang and Yingkai Fu and Xiaopeng Wei and Baocai Yin and Bo Dong
- Abstract summary: We propose a multi-modal based approach to fuse visual cues from the frame- and event-domain to enhance single object tracking performance.
The proposed approach can effectively and adaptively combine meaningful information from both domains.
We show that the proposed approach outperforms state-of-the-art frame-based tracking methods by at least 10.4% and 11.9% in terms of representative success rate and precision rate.
- Score: 31.534731963279274
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Inspired by the complementarity between conventional frame-based and
bio-inspired event-based cameras, we propose a multi-modal based approach to
fuse visual cues from the frame and event domains to enhance single object
tracking performance, especially in degraded conditions (e.g., scenes with high
dynamic range, low light, and fast-moving objects). The proposed approach can
effectively and adaptively combine meaningful information from both domains.
Its effectiveness is ensured by a novel cross-domain attention scheme, which
enhances features through both self- and cross-domain attention; its
adaptiveness is guaranteed by a specially designed weighting scheme, which
adaptively balances the contributions of the two domains. To exploit
event-based visual cues in single-object tracking, we
construct a large-scale frame-event-based dataset, which we subsequently employ
to train a novel frame-event fusion based model. Extensive experiments show
that the proposed approach outperforms state-of-the-art frame-based tracking
methods by at least 10.4% and 11.9% in terms of representative success rate and
precision rate, respectively. Besides, the effectiveness of each key component
of our approach is evidenced by our thorough ablation study.
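The abstract describes the fusion design only at a high level. As a rough, non-authoritative sketch of how self- and cross-domain attention followed by an adaptive per-domain weighting could be wired up (this is not the authors' code; all class, layer, and parameter names below are hypothetical), one might write in PyTorch:

```python
import torch
import torch.nn as nn

class CrossDomainFusion(nn.Module):
    """Illustrative sketch only: fuses frame- and event-domain features with
    self-/cross-attention and an adaptive per-domain weighting, loosely
    following the abstract's description (not the paper's implementation)."""

    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.self_attn_frame = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_attn_event = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn_frame = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn_event = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Hypothetical weighting head: predicts one scalar weight per domain.
        self.weight_head = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 2)
        )

    def forward(self, frame_feat, event_feat):
        # frame_feat, event_feat: (B, N, dim) token sequences from each domain.
        f, _ = self.self_attn_frame(frame_feat, frame_feat, frame_feat)  # self-attention
        e, _ = self.self_attn_event(event_feat, event_feat, event_feat)
        f2, _ = self.cross_attn_frame(f, e, e)  # frame queries attend to event keys/values
        e2, _ = self.cross_attn_event(e, f, f)  # event queries attend to frame keys/values
        # Adaptive balancing: softmax weights from pooled cross-enhanced features.
        pooled = torch.cat([f2.mean(dim=1), e2.mean(dim=1)], dim=-1)  # (B, 2*dim)
        w = torch.softmax(self.weight_head(pooled), dim=-1)           # (B, 2)
        fused = w[:, 0].view(-1, 1, 1) * f2 + w[:, 1].view(-1, 1, 1) * e2
        return fused

# Example usage with random features standing in for backbone outputs.
fusion = CrossDomainFusion(dim=256)
frame_tokens = torch.randn(2, 196, 256)
event_tokens = torch.randn(2, 196, 256)
out = fusion(frame_tokens, event_tokens)  # (2, 196, 256)
```

The weighting head here is just one plausible way to realize the "adaptively balance the contributions" idea; the paper's actual scheme may differ.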
Related papers
- Relating Events and Frames Based on Self-Supervised Learning and Uncorrelated Conditioning for Unsupervised Domain Adaptation [23.871860648919593]
Event-based cameras provide accurate and high temporal resolution measurements for performing computer vision tasks.
Despite their advantages, utilizing deep learning for event-based vision encounters a significant obstacle due to the scarcity of annotated data.
We propose a new algorithm tailored for adapting a deep neural network trained on annotated frame-based data to generalize well on event-based unannotated data.
arXiv Detail & Related papers (2024-01-02T05:10:08Z)
- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
- Generalizing Event-Based Motion Deblurring in Real-World Scenarios [62.995994797897424]
Event-based motion deblurring has shown promising results by exploiting low-latency events.
We propose a scale-aware network that allows flexible input spatial scales and enables learning from different temporal scales of motion blur.
A two-stage self-supervised learning scheme is then developed to fit real-world data distribution.
arXiv Detail & Related papers (2023-08-11T04:27:29Z)
- Modeling Continuous Motion for 3D Point Cloud Object Tracking [54.48716096286417]
This paper presents a novel approach that views each tracklet as a continuous stream.
At each timestamp, only the current frame is fed into the network to interact with multi-frame historical features stored in a memory bank.
To enhance the utilization of multi-frame features for robust tracking, a contrastive sequence enhancement strategy is proposed.
arXiv Detail & Related papers (2023-03-14T02:58:27Z)
- Towards Discriminative Representation: Multi-view Trajectory Contrastive Learning for Online Multi-object Tracking [1.0474108328884806]
We propose a strategy, namely multi-view trajectory contrastive learning, in which each trajectory is represented as a center vector.
In the inference stage, a similarity-guided feature fusion strategy is developed for further boosting the quality of the trajectory representation.
Our method has surpassed preceding trackers and established new state-of-the-art performance.
arXiv Detail & Related papers (2022-03-27T04:53:31Z)
- TimeLens: Event-based Video Frame Interpolation [54.28139783383213]
We introduce Time Lens, a novel method that leverages the advantages of both synthesis-based and flow-based approaches.
We show an up to 5.21 dB improvement in terms of PSNR over state-of-the-art frame-based and event-based methods.
arXiv Detail & Related papers (2021-06-14T10:33:47Z)
- Modeling long-term interactions to enhance action recognition [81.09859029964323]
We propose a new approach to understand actions in egocentric videos that exploits the semantics of object interactions at both frame and temporal levels.
We use a region-based approach that takes as input a primary region roughly corresponding to the user hands and a set of secondary regions potentially corresponding to the interacting objects.
The proposed approach outperforms the state-of-the-art in terms of action recognition on standard benchmarks.
arXiv Detail & Related papers (2021-04-23T10:08:15Z)
- Weakly supervised cross-domain alignment with optimal transport [102.8572398001639]
Cross-domain alignment between image objects and text sequences is key to many visual-language tasks.
This paper investigates a novel approach for the identification and optimization of fine-grained semantic similarities between image and text entities.
arXiv Detail & Related papers (2020-08-14T22:48:36Z)
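The last entry above frames image-text alignment as an optimal transport problem, but its summary gives no algorithmic detail. As a generic illustration of that idea (not the paper's method; the function and variable names are assumptions), an entropic Sinkhorn iteration over a region-token cost matrix yields a soft alignment plan:

```python
import torch

def sinkhorn_alignment(cost, num_iters=50, eps=0.05):
    """Generic entropic-OT sketch (illustrative only): returns a soft
    transport plan aligning rows (e.g., image regions) with columns
    (e.g., text tokens) under uniform marginals."""
    n, m = cost.shape
    K = torch.exp(-cost / eps)        # Gibbs kernel
    u = torch.full((n,), 1.0 / n)     # row scaling vector
    v = torch.full((m,), 1.0 / m)     # column scaling vector
    r = torch.full((n,), 1.0 / n)     # uniform row marginal
    c = torch.full((m,), 1.0 / m)     # uniform column marginal
    for _ in range(num_iters):
        u = r / (K @ v)
        v = c / (K.t() @ u)
    return torch.diag(u) @ K @ torch.diag(v)  # transport plan of shape (n, m)

# Example: cost = 1 - cosine similarity between region and token embeddings.
regions = torch.randn(7, 128)
tokens = torch.randn(12, 128)
sim = torch.nn.functional.cosine_similarity(
    regions[:, None, :], tokens[None, :, :], dim=-1)
plan = sinkhorn_alignment(1.0 - sim)
print(plan.shape, plan.sum().item())  # (7, 12); entries sum to roughly 1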
This list is automatically generated from the titles and abstracts of the papers in this site.