EvPlug: Learn a Plug-and-Play Module for Event and Image Fusion
- URL: http://arxiv.org/abs/2312.16933v1
- Date: Thu, 28 Dec 2023 10:05:13 GMT
- Title: EvPlug: Learn a Plug-and-Play Module for Event and Image Fusion
- Authors: Jianping Jiang, Xinyu Zhou, Peiqi Duan, Boxin Shi
- Abstract summary: EvPlug learns a plug-and-play event and image fusion module from the supervision of the existing RGB-based model.
We demonstrate the superiority of EvPlug in several vision tasks such as object detection, semantic segmentation, and 3D hand pose estimation.
- Score: 55.367269556557645
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Event cameras and RGB cameras exhibit complementary characteristics in
imaging: the former possesses high dynamic range (HDR) and high temporal
resolution, while the latter provides rich texture and color information. This
makes the integration of event cameras into middle- and high-level RGB-based
vision tasks highly promising. However, challenges arise in multi-modal fusion,
data annotation, and model architecture design. In this paper, we propose
EvPlug, which learns a plug-and-play event and image fusion module from the
supervision of the existing RGB-based model. The learned fusion module
integrates event streams with image features in the form of a plug-in, making
the RGB-based model robust to HDR and fast-motion scenes while enabling
high-temporal-resolution inference. Our method only requires unlabeled
event-image pairs (no pixel-wise alignment required) and does not alter the
structure or weights of the RGB-based model. We demonstrate the superiority of
EvPlug in several vision tasks such as object detection, semantic segmentation,
and 3D hand pose estimation.
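The abstract describes the fusion module only at a high level. As a rough illustration of the general idea, the sketch below converts raw events into a voxel-grid feature and plugs it into a frozen RGB feature map via a gated residual fusion. All function names, shapes, and the gating design are illustrative assumptions, not EvPlug's actual architecture.

```python
import numpy as np

def voxelize_events(events, num_bins, h, w):
    """Accumulate (t, x, y, polarity) events into a voxel grid,
    a common event-stream representation (illustrative, not EvPlug's)."""
    grid = np.zeros((num_bins, h, w))
    t = events[:, 0]
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9)
    bins = np.minimum((t_norm * num_bins).astype(int), num_bins - 1)
    x = events[:, 1].astype(int)
    y = events[:, 2].astype(int)
    # Unbuffered scatter-add of polarities into the (bin, y, x) cells.
    np.add.at(grid, (bins, y, x), events[:, 3])
    return grid

def plug_in_fusion(rgb_feat, event_feat, w_gate):
    """Gated residual fusion: the frozen RGB feature map is refined by
    event features without altering the RGB backbone's structure or weights."""
    gate = 1.0 / (1.0 + np.exp(-(event_feat * w_gate)))  # sigmoid gate
    return rgb_feat + gate * event_feat                  # residual plug-in

rng = np.random.default_rng(0)
events = np.column_stack([
    rng.random(500),                  # timestamps in [0, 1)
    rng.integers(0, 32, 500),         # x coordinates
    rng.integers(0, 32, 500),         # y coordinates
    rng.choice([-1.0, 1.0], 500),     # polarities
])
voxels = voxelize_events(events, num_bins=4, h=32, w=32)
rgb_feat = rng.standard_normal((4, 32, 32))
fused = plug_in_fusion(rgb_feat, voxels, w_gate=0.1)
print(fused.shape)  # (4, 32, 32)
```

Because the fusion is a residual term gated by the event features, a zero event response leaves the RGB features roughly unchanged, which matches the paper's claim that the plug-in does not alter the base model.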
Related papers
- Diffusion-based RGB-D Semantic Segmentation with Deformable Attention Transformer [10.982521876026281]
We introduce a diffusion-based framework to address the RGB-D semantic segmentation problem.
We demonstrate that utilizing a Deformable Attention Transformer as the encoder to extract features from depth images effectively captures the characteristics of invalid regions in depth measurements.
arXiv Detail & Related papers (2024-09-23T15:23:01Z)
- ViDSOD-100: A New Dataset and a Baseline Model for RGB-D Video Salient Object Detection [51.16181295385818]
We first collect an annotated RGB-D video salient object detection dataset (ViDSOD-100), which contains 100 videos with a total of 9,362 frames.
All frames in each video are manually annotated with high-quality saliency labels.
We propose a new baseline model, named attentive triple-fusion network (ATF-Net) for RGB-D salient object detection.
arXiv Detail & Related papers (2024-06-18T12:09:43Z)
- Complementing Event Streams and RGB Frames for Hand Mesh Reconstruction [51.87279764576998]
We propose EvRGBHand -- the first approach for 3D hand mesh reconstruction with an event camera and an RGB camera compensating for each other.
EvRGBHand can tackle overexposure and motion blur issues in RGB-based HMR and foreground scarcity and background overflow issues in event-based HMR.
arXiv Detail & Related papers (2024-03-12T06:04:50Z)
- FusionVision: A comprehensive approach of 3D object reconstruction and segmentation from RGB-D cameras using YOLO and fast segment anything [1.5728609542259502]
This paper introduces FusionVision, an exhaustive pipeline adapted for the robust 3D segmentation of objects in RGB-D imagery.
The proposed FusionVision pipeline employs YOLO for identifying objects within the RGB image domain.
The synergy between these components and their integration into 3D scene understanding ensures a cohesive fusion of object detection and segmentation.
arXiv Detail & Related papers (2024-02-29T22:59:27Z)
- CRSOT: Cross-Resolution Object Tracking using Unaligned Frame and Event Cameras [43.699819213559515]
Existing datasets for RGB-DVS tracking are collected with the DVS346 camera, whose resolution ($346 \times 260$) is too low for practical applications.
We build the first unaligned frame-event dataset CRSOT collected with a specially built data acquisition system.
We propose a novel unaligned object tracking framework that can realize robust tracking even using the loosely aligned RGB-Event data.
arXiv Detail & Related papers (2024-01-05T14:20:22Z)
- Mirror Complementary Transformer Network for RGB-thermal Salient Object Detection [16.64781797503128]
RGB-thermal salient object detection (RGB-T SOD) aims to locate the common prominent objects in an aligned visible and thermal infrared image pair.
In this paper, we propose a novel mirror complementary Transformer network (MCNet) for RGB-T SOD.
Experiments on benchmark and VT723 datasets show that the proposed method outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2022-07-07T20:26:09Z)
- End-to-end Multi-modal Video Temporal Grounding [105.36814858748285]
We propose a multi-modal framework to extract complementary information from videos.
We adopt RGB images for appearance, optical flow for motion, and depth maps for image structure.
We conduct experiments on the Charades-STA and ActivityNet Captions datasets, and show that the proposed method performs favorably against state-of-the-art approaches.
arXiv Detail & Related papers (2021-07-12T17:58:10Z)
- Self-Supervised Representation Learning for RGB-D Salient Object Detection [93.17479956795862]
We use Self-Supervised Representation Learning to design two pretext tasks: the cross-modal auto-encoder and the depth-contour estimation.
Our pretext tasks require only a few unlabeled RGB-D datasets to perform pre-training, which makes the network capture rich semantic contexts.
For the inherent problem of cross-modal fusion in RGB-D SOD, we propose a multi-path fusion module.
arXiv Detail & Related papers (2021-01-29T09:16:06Z)
- Matching Neuromorphic Events and Color Images via Adversarial Learning [49.447580124957966]
We propose the Event-Based Image Retrieval (EBIR) problem to exploit the cross-modal matching task.
We address the EBIR problem by proposing neuromorphic Events-Color image Feature Learning (ECFL).
We also contribute the N-UKbench and EC180 datasets to the community to promote development of the EBIR problem.
arXiv Detail & Related papers (2020-03-02T02:48:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.