Erasure-based Interaction Network for RGBT Video Object Detection and A Unified Benchmark
- URL: http://arxiv.org/abs/2308.01630v1
- Date: Thu, 3 Aug 2023 09:04:48 GMT
- Authors: Zhengzheng Tu, Qishun Wang, Hongshun Wang, Kunpeng Wang, Chenglong Li
- Abstract summary: This work introduces a new computer vision task, RGB-thermal (RGBT) Video Object Detection (VOD).
Traditional VOD methods often leverage temporal information from many auxiliary frames, which incurs a large computational burden.
We develop a negative activation function that erases the noise in RGB features with the help of thermal image features.
- Score: 9.979933455242774
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, many breakthroughs have been made in the field of Video Object
Detection (VOD), but performance is still limited by the imaging limitations of
RGB sensors under adverse illumination conditions. To alleviate this issue, this
work introduces a new computer vision task, RGB-thermal (RGBT) VOD, which
incorporates the thermal modality because it is insensitive to adverse
illumination conditions. To promote research on RGBT VOD, we design a
novel Erasure-based Interaction Network (EINet) and establish a comprehensive
benchmark dataset (VT-VOD50) for this task. Traditional VOD methods often
leverage temporal information by using many auxiliary frames, and thus incur a
large computational burden. Considering that thermal images exhibit less noise
than RGB ones, we develop a negative activation function that erases
the noise in RGB features with the help of thermal image features. Furthermore,
benefiting from thermal images, we rely on only a small temporal window
to model spatio-temporal information, which greatly improves efficiency while
maintaining detection accuracy.
The VT-VOD50 dataset consists of 50 pairs of challenging RGBT video sequences
with complex backgrounds, various objects, and different illuminations, all
collected in real traffic scenarios. Extensive experiments on the VT-VOD50
dataset demonstrate the effectiveness and efficiency of our proposed method
compared with existing mainstream VOD methods. The code of EINet and the dataset
will be released to the public for free academic use.
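The abstract does not give EINet's actual formulas, so the following is only a rough illustration of the two ideas it names, under loudly stated assumptions. The "negative activation" is assumed here to be min(x, 0), applied to the thermal-minus-RGB difference so that any RGB response exceeding the thermal response is erased (the result equals an element-wise min(rgb, thermal)). The "small temporal window" is assumed to be a plain mean over the erased features of the last `window` frames. The names `negative_activation`, `erase_rgb_noise`, and `SmallWindowAggregator`, the mean aggregation, and the window size are all hypothetical, not the authors' design.

```python
from collections import deque
from typing import Deque, List

Vector = List[float]


def negative_activation(x: float) -> float:
    """Hypothetical negative activation: keep only the negative part, min(x, 0)."""
    return min(x, 0.0)


def erase_rgb_noise(rgb: Vector, thermal: Vector) -> Vector:
    """Hypothetical erasure step (not EINet's formula).

    rgb + neg(thermal - rgb) equals min(rgb, thermal) element-wise, so RGB
    activation that is not supported by the thermal feature is erased.
    """
    if len(rgb) != len(thermal):
        raise ValueError("feature vectors must have the same length")
    return [r + negative_activation(t - r) for r, t in zip(rgb, thermal)]


class SmallWindowAggregator:
    """Hypothetical temporal module: mean of the last `window` erased frames.

    Each step touches at most `window` frames, so per-frame cost stays
    bounded no matter how long the video is.
    """

    def __init__(self, window: int = 3) -> None:
        if window < 1:
            raise ValueError("window must be >= 1")
        # deque(maxlen=...) discards the oldest frame automatically when full.
        self.frames: Deque[Vector] = deque(maxlen=window)

    def step(self, rgb: Vector, thermal: Vector) -> Vector:
        # Erase RGB noise for the current frame, then push it into the window.
        self.frames.append(erase_rgb_noise(rgb, thermal))
        n = len(self.frames)
        # Average each feature channel over the frames currently in the window.
        return [sum(values) / n for values in zip(*self.frames)]


if __name__ == "__main__":
    agg = SmallWindowAggregator(window=2)
    print(agg.step([3.0, 1.0], [1.0, 2.0]))  # [1.0, 1.0]
    print(agg.step([1.0, 3.0], [2.0, 3.0]))  # [1.0, 2.0]
    print(agg.step([5.0, 5.0], [5.0, 1.0]))  # [3.0, 2.0]
```

With `window=2`, each output averages at most two erased frames; this is the sense in which a small window bounds the per-frame cost, whereas methods that use many auxiliary frames pay for every extra frame. EINet's real erasure and temporal modules may differ substantially from this sketch.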
Related papers
- SPACT18: Spiking Human Action Recognition Benchmark Dataset with Complementary RGB and Thermal Modalities [14.157338282165037]
Spike cameras, bio-inspired vision sensors, asynchronously fire spikes by accumulating light intensity at each pixel, offering exceptional temporal resolution.
This work contributes a dataset that will drive research on energy-efficient, ultra-low-power video understanding, specifically action recognition using spike-based data.
Date: 2025-07-22T01:59:14Z
- Multimodal Spatio-temporal Graph Learning for Alignment-free RGBT Video Object Detection [13.682115079677466]
RGB-Thermal Video Object Detection (RGBT VOD) can address the limitation of traditional RGB-based VOD in challenging lighting conditions.
We propose a novel Multimodal Spatio-temporal Graph learning Network (MSGNet) for the alignment-free RGBT VOD problem.
Date: 2025-04-16T05:32:59Z
- Human Activity Recognition using RGB-Event based Sensors: A Multi-modal Heat Conduction Model and A Benchmark Dataset [65.76480665062363]
Human activity recognition has primarily relied on traditional RGB cameras to achieve high performance.
Challenges in real-world scenarios, such as insufficient lighting and rapid movements, inevitably degrade the performance of RGB cameras.
In this work, we rethink human activity recognition by combining the RGB and event cameras.
Date: 2025-04-08T09:14:24Z
- Rethinking RGB-D Fusion for Semantic Segmentation in Surgical Datasets [5.069884983892437]
We propose a simple yet effective multi-modal (RGB and depth) training framework called SurgDepth.
We show state-of-the-art (SOTA) results on all publicly available datasets applicable to this task.
We conduct extensive experiments on benchmark datasets including EndoVis2022, AutoLapro, LapI2I and EndoVis 2017.
Date: 2024-07-29T05:35:51Z
- BVI-RLV: A Fully Registered Dataset and Benchmarks for Low-Light Video Enhancement [56.97766265018334]
This paper introduces a low-light video dataset, consisting of 40 scenes with various motion scenarios under two distinct low-lighting conditions.
We provide fully registered ground truth data captured in normal light using a programmable motorized dolly and refine it via an image-based approach for pixel-wise frame alignment across different light levels.
Our experimental results demonstrate the significance of fully registered video pairs for low-light video enhancement (LLVE) and the comprehensive evaluation shows that the models trained with our dataset outperform those trained with the existing datasets.
Date: 2024-07-03T22:41:49Z
- Chasing Day and Night: Towards Robust and Efficient All-Day Object Detection Guided by an Event Camera [8.673063170884591]
EOLO is a novel object detection framework that achieves robust and efficient all-day detection by fusing both RGB and event modalities.
Our EOLO framework is built on a lightweight spiking neural network (SNN) to efficiently exploit the asynchronous nature of events.
Date: 2023-09-17T15:14:01Z
- Attentive Multimodal Fusion for Optical and Scene Flow [24.08052492109655]
Existing methods typically rely solely on RGB images or fuse the modalities at later stages.
We propose a novel deep neural network approach named FusionRAFT, which enables early-stage information fusion between sensor modalities.
Our approach exhibits improved robustness in the presence of noise and low-lighting conditions that affect the RGB images.
Date: 2023-07-28T04:36:07Z
- Mirror Complementary Transformer Network for RGB-thermal Salient Object Detection [16.64781797503128]
RGB-thermal salient object detection (RGB-T SOD) aims to locate the common prominent objects of an aligned visible and thermal infrared image pair.
In this paper, we propose a novel mirror complementary Transformer network (MCNet) for RGB-T SOD.
Experiments on benchmark and VT723 datasets show that the proposed method outperforms state-of-the-art approaches.
Date: 2022-07-07T20:26:09Z
- Spatial-Temporal Frequency Forgery Clue for Video Forgery Detection in VIS and NIR Scenario [87.72258480670627]
Existing frequency-domain face forgery detection methods find that GAN-forged images show obvious grid-like visual artifacts in the frequency spectrum compared with real images.
This paper proposes a Cosine Transform-based Forgery Clue Augmentation Network (FCAN-DCT) to achieve a more comprehensive spatial-temporal feature representation.
Date: 2022-07-05T09:27:53Z
- PVDD: A Practical Video Denoising Dataset with Real-World Dynamic Scenes [56.4361151691284]
The Practical Video Denoising Dataset (PVDD) contains 200 noisy-clean dynamic video pairs in both sRGB and RAW format.
Compared with existing datasets that contain limited motion information, PVDD covers dynamic scenes with varying natural motion.
Date: 2022-07-04T12:30:22Z
- Glass Segmentation with RGB-Thermal Image Pairs [16.925196782387857]
We propose a new glass segmentation method utilizing paired RGB and thermal images.
Glass regions of a scene are made more distinguishable with a pair of RGB and thermal images than solely with an RGB image.
Date: 2022-04-12T00:20:22Z
- Meta-UDA: Unsupervised Domain Adaptive Thermal Object Detection using Meta-Learning [64.92447072894055]
Infrared (IR) cameras are robust under adverse illumination and lighting conditions.
We propose an algorithm-agnostic meta-learning framework to improve existing UDA methods.
We produce a state-of-the-art thermal detector for the KAIST and DSIAC datasets.
Date: 2021-10-07T02:28:18Z
- Energy-Efficient Model Compression and Splitting for Collaborative Inference Over Time-Varying Channels [52.60092598312894]
We propose a technique to reduce the total energy bill at the edge device by utilizing model compression and time-varying model split between the edge and remote nodes.
Our proposed solution results in minimal energy consumption and CO$_2$ emissions compared with the considered baselines.
Date: 2021-06-02T07:36:27Z
- MobileSal: Extremely Efficient RGB-D Salient Object Detection [62.04876251927581]
This paper introduces a novel network, MobileSal, which focuses on efficient RGB-D salient object detection (SOD).
We propose an implicit depth restoration (IDR) technique to strengthen the feature representation capability of mobile networks for RGB-D SOD.
With IDR and CPR incorporated, MobileSal performs favorably against state-of-the-art methods on seven challenging RGB-D SOD datasets.
Date: 2020-12-24T04:36:42Z