EvPlug: Learn a Plug-and-Play Module for Event and Image Fusion
- URL: http://arxiv.org/abs/2312.16933v1
- Date: Thu, 28 Dec 2023 10:05:13 GMT
- Title: EvPlug: Learn a Plug-and-Play Module for Event and Image Fusion
- Authors: Jianping Jiang, Xinyu Zhou, Peiqi Duan, Boxin Shi
- Abstract summary: EvPlug learns a plug-and-play event and image fusion module from the supervision of the existing RGB-based model.
We demonstrate the superiority of EvPlug in several vision tasks such as object detection, semantic segmentation, and 3D hand pose estimation.
- Score: 55.367269556557645
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Event cameras and RGB cameras exhibit complementary characteristics in
imaging: the former possesses high dynamic range (HDR) and high temporal
resolution, while the latter provides rich texture and color information. This
makes the integration of event cameras into middle- and high-level RGB-based
vision tasks highly promising. However, challenges arise in multi-modal fusion,
data annotation, and model architecture design. In this paper, we propose
EvPlug, which learns a plug-and-play event and image fusion module from the
supervision of the existing RGB-based model. The learned fusion module
integrates event streams with image features in the form of a plug-in, making
the RGB-based model robust to HDR and fast-motion scenes while enabling
high-temporal-resolution inference. Our method only requires unlabeled
event-image pairs (no pixel-wise alignment required) and does not alter the
structure or weights of the RGB-based model. We demonstrate the superiority of
EvPlug in several vision tasks such as object detection, semantic segmentation,
and 3D hand pose estimation.
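The abstract describes the fusion module only at a high level. As a rough illustration of the general idea, the sketch below converts raw events into a voxel-grid feature and plugs it into a frozen RGB feature map via a gated residual fusion. All function names, shapes, and the gating design are illustrative assumptions, not EvPlug's actual architecture.

```python
import numpy as np

def voxelize_events(events, num_bins, h, w):
    """Accumulate (t, x, y, polarity) events into a voxel grid,
    a common event-stream representation (illustrative, not EvPlug's)."""
    grid = np.zeros((num_bins, h, w))
    t = events[:, 0]
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9)
    bins = np.minimum((t_norm * num_bins).astype(int), num_bins - 1)
    x = events[:, 1].astype(int)
    y = events[:, 2].astype(int)
    # Unbuffered scatter-add of polarities into the (bin, y, x) cells.
    np.add.at(grid, (bins, y, x), events[:, 3])
    return grid

def plug_in_fusion(rgb_feat, event_feat, w_gate):
    """Gated residual fusion: the frozen RGB feature map is refined by
    event features without altering the RGB backbone's structure or weights."""
    gate = 1.0 / (1.0 + np.exp(-(event_feat * w_gate)))  # sigmoid gate
    return rgb_feat + gate * event_feat                  # residual plug-in

rng = np.random.default_rng(0)
events = np.column_stack([
    rng.random(500),                  # timestamps in [0, 1)
    rng.integers(0, 32, 500),         # x coordinates
    rng.integers(0, 32, 500),         # y coordinates
    rng.choice([-1.0, 1.0], 500),     # polarities
])
voxels = voxelize_events(events, num_bins=4, h=32, w=32)
rgb_feat = rng.standard_normal((4, 32, 32))
fused = plug_in_fusion(rgb_feat, voxels, w_gate=0.1)
print(fused.shape)  # (4, 32, 32)
```

Because the fusion is a residual term gated by the event features, a zero event response leaves the RGB features roughly unchanged, which matches the paper's claim that the plug-in does not alter the base model.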
Related papers
- Diffusion-based RGB-D Semantic Segmentation with Deformable Attention Transformer [10.982521876026281]
We introduce a diffusion-based framework to address the RGB-D semantic segmentation problem.
We demonstrate that utilizing a Deformable Attention Transformer as the encoder to extract features from depth images effectively captures the characteristics of invalid regions in depth measurements.
arXiv Detail & Related papers (2024-09-23T15:23:01Z)
- ViDSOD-100: A New Dataset and a Baseline Model for RGB-D Video Salient Object Detection [51.16181295385818]
We first collect an annotated RGB-D video salient object detection dataset (ViDSOD-100), which contains 100 videos with a total of 9,362 frames.
All frames in each video are manually annotated with high-quality saliency labels.
We propose a new baseline model, named attentive triple-fusion network (ATF-Net) for RGB-D salient object detection.
arXiv Detail & Related papers (2024-06-18T12:09:43Z)
- Complementing Event Streams and RGB Frames for Hand Mesh Reconstruction [51.87279764576998]
We propose EvRGBHand -- the first approach for 3D hand mesh reconstruction with an event camera and an RGB camera compensating for each other.
EvRGBHand can tackle overexposure and motion blur issues in RGB-based HMR and foreground scarcity and background overflow issues in event-based HMR.
arXiv Detail & Related papers (2024-03-12T06:04:50Z)
- FusionVision: A comprehensive approach of 3D object reconstruction and segmentation from RGB-D cameras using YOLO and fast segment anything [1.5728609542259502]
This paper introduces FusionVision, an exhaustive pipeline adapted for the robust 3D segmentation of objects in RGB-D imagery.
The proposed FusionVision pipeline employs YOLO for identifying objects within the RGB image domain.
The synergy between these components and their integration into 3D scene understanding ensures a cohesive fusion of object detection and segmentation.
arXiv Detail & Related papers (2024-02-29T22:59:27Z)
- CRSOT: Cross-Resolution Object Tracking using Unaligned Frame and Event Cameras [43.699819213559515]
Existing datasets for RGB-DVS tracking are collected with the DVS346 camera, whose resolution ($346 \times 260$) is too low for practical applications.
We build the first unaligned frame-event dataset CRSOT collected with a specially built data acquisition system.
We propose a novel unaligned object tracking framework that can realize robust tracking even using the loosely aligned RGB-Event data.
arXiv Detail & Related papers (2024-01-05T14:20:22Z)
- Mirror Complementary Transformer Network for RGB-thermal Salient Object Detection [16.64781797503128]
RGB-thermal salient object detection (RGB-T SOD) aims to locate the common prominent objects in an aligned visible and thermal infrared image pair.
In this paper, we propose a novel mirror complementary Transformer network (MCNet) for RGB-T SOD.
Experiments on benchmark and VT723 datasets show that the proposed method outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2022-07-07T20:26:09Z)
- End-to-end Multi-modal Video Temporal Grounding [105.36814858748285]
We propose a multi-modal framework to extract complementary information from videos.
We adopt RGB images for appearance, optical flow for motion, and depth maps for image structure.
We conduct experiments on the Charades-STA and ActivityNet Captions datasets, and show that the proposed method performs favorably against state-of-the-art approaches.
arXiv Detail & Related papers (2021-07-12T17:58:10Z)
- Self-Supervised Representation Learning for RGB-D Salient Object Detection [93.17479956795862]
We use Self-Supervised Representation Learning to design two pretext tasks: the cross-modal auto-encoder and the depth-contour estimation.
Our pretext tasks require only a few unlabeled RGB-D datasets to perform pre-training, which makes the network capture rich semantic contexts.
For the inherent problem of cross-modal fusion in RGB-D SOD, we propose a multi-path fusion module.
arXiv Detail & Related papers (2021-01-29T09:16:06Z)
- Matching Neuromorphic Events and Color Images via Adversarial Learning [49.447580124957966]
We propose the Event-Based Image Retrieval (EBIR) problem to exploit the cross-modal matching task.
We address the EBIR problem by proposing neuromorphic Events-Color image Feature Learning (ECFL).
We also contribute the N-UKbench and EC180 datasets to the community to promote development of the EBIR problem.
arXiv Detail & Related papers (2020-03-02T02:48:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.