WCCNet: Wavelet-integrated CNN with Crossmodal Rearranging Fusion for
Fast Multispectral Pedestrian Detection
- URL: http://arxiv.org/abs/2308.01042v1
- Date: Wed, 2 Aug 2023 09:35:21 GMT
- Title: WCCNet: Wavelet-integrated CNN with Crossmodal Rearranging Fusion for
Fast Multispectral Pedestrian Detection
- Authors: Xingjian Wang, Li Chai, Jiming Chen, Zhiguo Shi
- Abstract summary: We propose a novel framework named WCCNet that is able to differentially extract rich features of different spectra with lower computational complexity.
Based on the well-extracted features, we carefully design the crossmodal rearranging fusion module (CMRF).
We conduct comprehensive evaluations on the KAIST and FLIR benchmarks, where WCCNet outperforms state-of-the-art methods with considerable computational efficiency and competitive accuracy.
- Score: 16.43119521684829
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multispectral pedestrian detection achieves better visibility in challenging
conditions and thus has a broad application in various tasks, for which both
the accuracy and computational cost are of paramount importance. Most existing
approaches treat RGB and infrared modalities equally, typically adopting two
symmetrical CNN backbones for multimodal feature extraction, which ignores the
substantial differences between the modalities, making it difficult both to reduce the computational cost and to fuse the modalities effectively. In
this work, we propose a novel and efficient framework named WCCNet that differentially extracts rich features of the different spectra at lower computational complexity and semantically rearranges these features for effective crossmodal fusion. Specifically, the discrete wavelet transform (DWT), which allows fast inference and training, is embedded to construct a
dual-stream backbone for efficient feature extraction. The DWT layers of WCCNet
extract frequency components for the infrared modality, while the CNN layers extract spatial-domain features for the RGB modality. This methodology not only
significantly reduces the computational complexity, but also improves the
extraction of infrared features to facilitate the subsequent crossmodal fusion.
Based on the well-extracted features, we carefully design the crossmodal
rearranging fusion module (CMRF), which can mitigate spatial misalignment and
merge semantically complementary features of spatially-related local regions to
amplify the crossmodal complementary information. We conduct comprehensive
evaluations on the KAIST and FLIR benchmarks, where WCCNet outperforms
state-of-the-art methods with considerable computational efficiency and
competitive accuracy. We also perform an ablation study and thoroughly analyze the impact of each component on the performance of WCCNet.
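To make the asymmetric backbone concrete, here is a minimal PyTorch sketch of the idea described in the abstract. It is not the authors' implementation: a single-level Haar DWT layer stands in for the infrared frequency stream, a plain convolutional block stands in for the RGB spatial stream, and all module names and channel sizes are illustrative assumptions.

```python
# Minimal sketch of WCCNet's asymmetric dual-stream idea (not the authors' code).
# Assumptions: one-level Haar DWT for the infrared stream, a plain conv block
# for the RGB stream; names and channel sizes are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HaarDWT(nn.Module):
    """One-level 2D Haar DWT: splits a feature map into LL, LH, HL, HH subbands.
    Implemented as a fixed (non-learned) grouped convolution with stride 2,
    so it is cheap compared with a learned conv at the same resolution."""
    def __init__(self, channels: int):
        super().__init__()
        ll = torch.tensor([[0.5, 0.5], [0.5, 0.5]])
        lh = torch.tensor([[0.5, 0.5], [-0.5, -0.5]])
        hl = torch.tensor([[0.5, -0.5], [0.5, -0.5]])
        hh = torch.tensor([[0.5, -0.5], [-0.5, 0.5]])
        bank = torch.stack([ll, lh, hl, hh])                   # (4, 2, 2)
        # With groups=channels, each input channel gets its own copy of the
        # 4-filter Haar bank, yielding 4 subbands per channel.
        weight = bank.unsqueeze(1).repeat(channels, 1, 1, 1)   # (4*C, 1, 2, 2)
        self.register_buffer("weight", weight)
        self.channels = channels

    def forward(self, x):
        return F.conv2d(x, self.weight, stride=2, groups=self.channels)

class DualStreamBackbone(nn.Module):
    """Asymmetric backbone: DWT layers for infrared, CNN layers for RGB."""
    def __init__(self):
        super().__init__()
        self.ir_dwt = HaarDWT(channels=1)               # infrared: frequency stream
        self.rgb_conv = nn.Sequential(                  # RGB: spatial stream
            nn.Conv2d(3, 4, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, rgb, ir):
        return self.rgb_conv(rgb), self.ir_dwt(ir)

rgb = torch.randn(1, 3, 64, 64)
ir = torch.randn(1, 1, 64, 64)
f_rgb, f_ir = DualStreamBackbone()(rgb, ir)
print(f_rgb.shape, f_ir.shape)  # both (1, 4, 32, 32)
```

Because the Haar filters are fixed buffers rather than learned weights, the infrared stream adds no parameters and little compute, which is the source of the efficiency gain the abstract claims for the DWT branch.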
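The CMRF module itself is described only at a high level in the abstract; the following toy sketch shows one plausible reading of "rearranging" fusion, in which each RGB location is softly matched to the most similar infrared feature in a small local window before the two modalities are merged. ToyRearrangingFusion, the window size, and the cosine-similarity weighting are hypothetical choices for illustration, not the published module.

```python
# Toy sketch of a crossmodal rearranging fusion step (a hypothetical reading
# of CMRF, not the published module): within a small local window, each RGB
# location is paired with its most similar infrared features, which tolerates
# small spatial misalignment before the two modalities are merged.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyRearrangingFusion(nn.Module):
    def __init__(self, channels: int, window: int = 3):
        super().__init__()
        self.window = window
        self.merge = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, f_rgb, f_ir):
        b, c, h, w = f_rgb.shape
        k = self.window
        # Gather the k*k infrared neighborhood around every spatial position.
        ir_patches = F.unfold(f_ir, kernel_size=k, padding=k // 2)  # (B, C*k*k, H*W)
        ir_patches = ir_patches.view(b, c, k * k, h * w)
        rgb_flat = f_rgb.view(b, c, 1, h * w)
        # Cosine similarity between each RGB feature and its IR neighbors.
        sim = F.cosine_similarity(rgb_flat, ir_patches, dim=1)      # (B, k*k, H*W)
        weights = sim.softmax(dim=1).unsqueeze(1)                   # (B, 1, k*k, H*W)
        # Soft "rearrangement": similarity-weighted pick of nearby IR features.
        ir_aligned = (weights * ir_patches).sum(dim=2).view(b, c, h, w)
        return self.merge(torch.cat([f_rgb, ir_aligned], dim=1))

fusion = ToyRearrangingFusion(channels=4)
out = fusion(torch.randn(1, 4, 32, 32), torch.randn(1, 4, 32, 32))
print(out.shape)  # (1, 4, 32, 32)
```

Because the matching is restricted to a small window, the soft rearrangement tolerates the few-pixel misalignment typical of RGB-infrared pairs at modest extra cost.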
Related papers
- CDXFormer: Boosting Remote Sensing Change Detection with Extended Long Short-Term Memory [3.119836924407993]
We propose CDXFormer, whose core component is a powerful XLSTM-based spatial enhancement layer.
We introduce a scale-specific Feature Enhancer layer, incorporating a Cross-Temporal Global Perceptron customized for semantic-accurate deep features.
We also propose a Cross-Scale Interactive Fusion module to progressively interact global change representations with responses.
arXiv Detail & Related papers (2024-11-12T15:22:14Z)
- CSFNet: A Cosine Similarity Fusion Network for Real-Time RGB-X Semantic Segmentation of Driving Scenes [0.0]
Multimodal semantic segmentation methods suffer from high computational complexity and low inference speed.
We propose the Cosine Similarity Fusion Network (CSFNet) as a real-time RGB-X semantic segmentation model.
CSFNet has competitive accuracy with state-of-the-art methods while being state-of-the-art in terms of speed.
arXiv Detail & Related papers (2024-07-01T14:34:32Z)
- TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
arXiv Detail & Related papers (2024-04-15T06:01:48Z)
- Point-aware Interaction and CNN-induced Refinement Network for RGB-D Salient Object Detection [95.84616822805664]
We introduce a CNN-assisted Transformer architecture and propose a novel RGB-D SOD network with Point-aware Interaction and CNN-induced Refinement.
To alleviate the block-effect and detail-destruction problems naturally brought by the Transformer, we design a CNN-induced refinement (CNNR) unit for content refinement and supplementation.
arXiv Detail & Related papers (2023-08-17T11:57:49Z)
- CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion [138.40422469153145]
We propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network.
We show that CDDFuse achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2022-11-26T02:40:28Z)
- Spatio-channel Attention Blocks for Cross-modal Crowd Counting [3.441021278275805]
Cross-modal Spatio-Channel Attention (CSCA) blocks can be easily integrated into any modality-specific architecture.
In our experiments, the proposed block consistently shows significant performance improvement across various backbone networks.
arXiv Detail & Related papers (2022-10-19T09:05:00Z)
- CIR-Net: Cross-modality Interaction and Refinement for RGB-D Salient Object Detection [144.66411561224507]
We present a convolutional neural network (CNN) model, named CIR-Net, based on the novel cross-modality interaction and refinement.
Our network outperforms the state-of-the-art saliency detectors both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-10-06T11:59:19Z)
- MSO: Multi-Feature Space Joint Optimization Network for RGB-Infrared Person Re-Identification [35.97494894205023]
The RGB-infrared cross-modality person re-identification (ReID) task aims to recognize images of the same identity across the visible and infrared modalities.
Existing methods mainly use a two-stream architecture to eliminate the discrepancy between the two modalities in the final common feature space.
We present a novel multi-feature space joint optimization (MSO) network, which can learn modality-sharable features in both the single-modality space and the common space.
arXiv Detail & Related papers (2021-10-21T16:45:23Z)
- Learning Frequency-aware Dynamic Network for Efficient Super-Resolution [56.98668484450857]
This paper explores a novel frequency-aware dynamic network for dividing the input into multiple parts according to its coefficients in the discrete cosine transform (DCT) domain.
In practice, the high-frequency part is processed with expensive operations, while the lower-frequency part is assigned cheap operations to relieve the computational burden.
Experiments conducted on benchmark SISR models and datasets show that the frequency-aware dynamic network can be employed for various SISR neural architectures.
arXiv Detail & Related papers (2021-03-15T12:54:26Z)
- Multi-level Cross-modal Interaction Network for RGB-D Salient Object Detection [3.581367375462018]
We propose a novel Multi-level Cross-modal Interaction Network (MCINet) for RGB-D based salient object detection (SOD).
Our MCINet includes two key components: 1) a cross-modal feature learning network, which is used to learn the high-level features for the RGB images and depth cues, effectively enabling the correlations between the two sources to be exploited; and 2) a multi-level interactive integration network, which integrates multi-level cross-modal features to boost the SOD performance.
arXiv Detail & Related papers (2020-07-10T02:21:02Z)
- Spatial-Spectral Residual Network for Hyperspectral Image Super-Resolution [82.1739023587565]
We propose a novel spatial-spectral residual network for hyperspectral image super-resolution (SSRNet).
Our method can effectively explore spatial-spectral information by using 3D convolution instead of 2D convolution, which enables the network to better extract potential information.
In each unit, we employ spatial and spectral separable 3D convolution to extract spatial and spectral information, which not only avoids unaffordable memory usage and high computational cost, but also makes the network easier to train.
arXiv Detail & Related papers (2020-01-14T03:34:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.