TAFNet: A Three-Stream Adaptive Fusion Network for RGB-T Crowd Counting
- URL: http://arxiv.org/abs/2202.08517v1
- Date: Thu, 17 Feb 2022 08:43:10 GMT
- Title: TAFNet: A Three-Stream Adaptive Fusion Network for RGB-T Crowd Counting
- Authors: Haihan Tang, Yi Wang, Lap-Pui Chau
- Abstract summary: We propose a three-stream adaptive fusion network named TAFNet, which uses paired RGB and thermal images for crowd counting.
Experimental results on the RGBT-CC dataset show that our method achieves more than a 20% improvement in mean absolute error.
- Score: 16.336401175470197
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a three-stream adaptive fusion network named
TAFNet, which uses paired RGB and thermal images for crowd counting.
Specifically, TAFNet is divided into one main stream and two auxiliary streams.
We combine a pair of RGB and thermal images to constitute the input of the main
stream. The two auxiliary streams exploit the RGB image and the thermal image,
respectively, to extract modality-specific features. In addition, we propose an
Information Improvement Module (IIM) to adaptively fuse the modality-specific
features into the main stream. Experimental results on the RGBT-CC dataset show
that our method achieves more than a 20% improvement in mean absolute error and
root mean squared error compared with the state-of-the-art method. The source
code will be publicly
available at https://github.com/TANGHAIHAN/TAFNet.
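The abstract describes the stream layout but not the implementation. Below is a minimal PyTorch sketch of the three-stream idea, with a simple channel gate standing in for the IIM; the backbone stages, channel widths, single-channel thermal input, and the gate design are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of the three-stream layout described in the abstract.
# Backbone choice, feature widths, and the IIM internals are assumptions.
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    """A small convolutional stage standing in for one backbone stage."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True), nn.MaxPool2d(2))


class IIM(nn.Module):
    """Hypothetical adaptive fusion: weight the auxiliary features with a
    channel gate before adding them to the main-stream features."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())

    def forward(self, main_feat, aux_feat):
        w = self.gate(torch.cat([main_feat, aux_feat], dim=1))
        return main_feat + w * aux_feat


class TAFNetSketch(nn.Module):
    def __init__(self, width=64):
        super().__init__()
        self.main_stream = conv_block(4, width)      # combined RGB (3) + thermal (1) input
        self.rgb_stream = conv_block(3, width)       # auxiliary RGB stream
        self.thermal_stream = conv_block(1, width)   # auxiliary thermal stream
        self.fuse_rgb = IIM(width)
        self.fuse_thermal = IIM(width)
        self.density_head = nn.Conv2d(width, 1, 1)   # predicts a density map

    def forward(self, rgb, thermal):
        main = self.main_stream(torch.cat([rgb, thermal], dim=1))
        main = self.fuse_rgb(main, self.rgb_stream(rgb))
        main = self.fuse_thermal(main, self.thermal_stream(thermal))
        density = torch.relu(self.density_head(main))
        return density, density.sum(dim=(1, 2, 3))   # per-image count

if __name__ == "__main__":
    rgb = torch.randn(2, 3, 256, 256)
    thermal = torch.randn(2, 1, 256, 256)
    _, counts = TAFNetSketch()(rgb, thermal)
    print(counts.shape)  # torch.Size([2])
```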
Related papers
- Complementary Random Masking for RGB-Thermal Semantic Segmentation [63.93784265195356]
RGB-thermal semantic segmentation is a potential solution for achieving reliable semantic scene understanding under adverse weather and lighting conditions.
This paper proposes 1) a complementary random masking strategy for RGB-T images and 2) a self-distillation loss between the clean and masked input modalities.
We achieve state-of-the-art performance on three RGB-T semantic segmentation benchmarks.
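The summary names the masking strategy without giving its form; the sketch below assumes "complementary" means each spatial patch is kept in exactly one modality and zeroed in the other, with the patch size and zero-filling as illustrative choices.

```python
# Sketch of a complementary random masking strategy for an RGB-T pair: each
# patch is masked in exactly one modality, so the two masked inputs remain
# complementary. Assumes H and W are divisible by the patch size.
import torch


def complementary_random_mask(rgb, thermal, patch=16):
    """rgb: (B, 3, H, W), thermal: (B, C, H, W). Returns masked copies of both."""
    b, _, h, w = rgb.shape
    # One Bernoulli draw per patch decides which modality keeps that patch.
    keep_rgb = torch.randint(0, 2, (b, 1, h // patch, w // patch), device=rgb.device)
    keep_rgb = keep_rgb.repeat_interleave(patch, dim=2).repeat_interleave(patch, dim=3).float()
    keep_thermal = 1.0 - keep_rgb          # complementary mask
    return rgb * keep_rgb, thermal * keep_thermal


if __name__ == "__main__":
    rgb = torch.randn(2, 3, 64, 64)
    thermal = torch.randn(2, 3, 64, 64)
    m_rgb, m_t = complementary_random_mask(rgb, thermal)
    print(m_rgb.shape, m_t.shape)
```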
arXiv Detail & Related papers (2023-03-30T13:57:21Z)
- Explicit Attention-Enhanced Fusion for RGB-Thermal Perception Tasks [13.742299383836256]
We propose a novel fusion method named Explicit Attention-Enhanced Fusion (EAEF) that fully takes advantage of each type of data.
The proposed fusion method outperforms the state of the art by 1.6% in mIoU on semantic segmentation, 3.1% in MAE on salient object detection, 2.3% in mAP on object detection, and 8.1% in MAE on crowd counting.
arXiv Detail & Related papers (2023-03-28T03:37:27Z)
- Fine-Grained Action Detection with RGB and Pose Information using Two Stream Convolutional Networks [1.4502611532302039]
We propose a two-stream network approach for the classification and detection of table tennis strokes.
Our method utilizes raw RGB data and pose information computed with the MMPose toolbox.
We report an improvement in stroke classification, reaching 87.3% accuracy, while the detection does not outperform the baseline but still reaches an IoU of 0.349 and an mAP of 0.110.
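As a rough illustration of a two-stream design over RGB and pose inputs (not the authors' architecture), the sketch below encodes an RGB clip and a keypoint sequence separately and averages the class logits; the input shapes, encoder depths, class count, and averaging fusion are all assumptions.

```python
# Hedged sketch of a two-stream classifier: one branch for RGB frames, one for
# 2D pose sequences (e.g., keypoints from MMPose), fused by averaging logits.
import torch
import torch.nn as nn


class TwoStreamStrokeClassifier(nn.Module):
    def __init__(self, num_classes=20, num_joints=17):
        super().__init__()
        # RGB branch: 3D convolutions over (channels, time, height, width).
        self.rgb_branch = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(32, num_classes))
        # Pose branch: a GRU over per-frame keypoint coordinates (x, y).
        self.pose_rnn = nn.GRU(num_joints * 2, 64, batch_first=True)
        self.pose_head = nn.Linear(64, num_classes)

    def forward(self, rgb_clip, pose_seq):
        # rgb_clip: (B, 3, T, H, W); pose_seq: (B, T, num_joints * 2)
        rgb_logits = self.rgb_branch(rgb_clip)
        _, h = self.pose_rnn(pose_seq)
        pose_logits = self.pose_head(h[-1])
        return (rgb_logits + pose_logits) / 2  # late fusion by averaging


if __name__ == "__main__":
    clip = torch.randn(2, 3, 16, 112, 112)
    pose = torch.randn(2, 16, 34)
    print(TwoStreamStrokeClassifier()(clip, pose).shape)  # torch.Size([2, 20])
```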
arXiv Detail & Related papers (2023-02-06T13:05:55Z)
- MAFNet: A Multi-Attention Fusion Network for RGB-T Crowd Counting [40.4816930622052]
We propose a two-stream RGB-T crowd counting network called the Multi-Attention Fusion Network (MAFNet).
In the encoder part, a Multi-Attention Fusion (MAF) module is embedded into different stages of the two modality-specific branches for cross-modal fusion.
Extensive experiments on two popular datasets show that the proposed MAFNet is effective for RGB-T crowd counting.
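The summary states that a fusion module is embedded at several encoder stages of the two modality-specific branches; the sketch below shows that wiring with a generic bidirectional spatial-attention exchange standing in for the MAF module. The attention form and stage widths are assumptions, not the MAFNet design.

```python
# Sketch of stage-wise cross-modal fusion between RGB and thermal encoder
# branches. The exchange module is a generic spatial-attention swap used only
# to illustrate where such a module sits; it is not the published MAF module.
import torch
import torch.nn as nn


class CrossModalExchange(nn.Module):
    """Each branch is modulated by a spatial attention map computed from the
    other branch (an assumption standing in for Multi-Attention Fusion)."""
    def __init__(self, channels):
        super().__init__()
        self.attn_rgb = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())
        self.attn_thermal = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())

    def forward(self, f_rgb, f_thermal):
        out_rgb = f_rgb * self.attn_thermal(f_thermal) + f_rgb
        out_thermal = f_thermal * self.attn_rgb(f_rgb) + f_thermal
        return out_rgb, out_thermal


class TwoBranchEncoder(nn.Module):
    def __init__(self, widths=(32, 64)):
        super().__init__()
        def stage(cin, cout):
            return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                                 nn.ReLU(inplace=True), nn.MaxPool2d(2))
        self.rgb_stages = nn.ModuleList([stage(3, widths[0]), stage(widths[0], widths[1])])
        self.thermal_stages = nn.ModuleList([stage(3, widths[0]), stage(widths[0], widths[1])])
        self.fusions = nn.ModuleList([CrossModalExchange(w) for w in widths])

    def forward(self, rgb, thermal):
        # Fuse after every encoder stage, then merge for a counting head.
        for s_rgb, s_t, fuse in zip(self.rgb_stages, self.thermal_stages, self.fusions):
            rgb, thermal = fuse(s_rgb(rgb), s_t(thermal))
        return rgb + thermal


if __name__ == "__main__":
    x = TwoBranchEncoder()(torch.randn(1, 3, 128, 128), torch.randn(1, 3, 128, 128))
    print(x.shape)  # torch.Size([1, 64, 32, 32])
```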
arXiv Detail & Related papers (2022-08-14T02:42:09Z)
- Edge-aware Guidance Fusion Network for RGB Thermal Scene Parsing [4.913013713982677]
We propose an edge-aware guidance fusion network (EGFNet) for RGB thermal scene parsing.
To effectively fuse the RGB and thermal information, we propose a multimodal fusion module.
Considering the importance of high-level semantic information, we propose a global information module and a semantic information module.
arXiv Detail & Related papers (2021-12-09T01:12:47Z)
- MTFNet: Mutual-Transformer Fusion Network for RGB-D Salient Object Detection [15.371153771528093]
We propose a novel Mutual-Transformer Fusion Network (MTFNet) for RGB-D SOD.
MTFNet contains two main modules, i.e., the Focal Feature Extractor (FFE) and Mutual-Transformer Fusion (MTF).
Comprehensive experimental results on six public benchmarks demonstrate the superiority of our proposed MTFNet.
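The abstract names the FFE and MTF modules without detail; the sketch below gives one plausible reading of mutual transformer fusion, in which each modality queries the other through multi-head cross-attention. The token layout, dimensions, and residual wiring are assumptions, not the MTFNet implementation.

```python
# Sketch of mutual cross-attention between RGB and depth feature maps: each
# modality queries the other with multi-head attention and keeps a residual.
import torch
import torch.nn as nn


class MutualCrossAttention(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.rgb_from_depth = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.depth_from_rgb = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, f_rgb, f_depth):
        # f_rgb, f_depth: (B, C, H, W) feature maps -> (B, H*W, C) token sequences
        b, c, h, w = f_rgb.shape
        t_rgb = f_rgb.flatten(2).transpose(1, 2)
        t_depth = f_depth.flatten(2).transpose(1, 2)
        # Each stream attends to the other and adds the result back.
        rgb_out, _ = self.rgb_from_depth(t_rgb, t_depth, t_depth)
        depth_out, _ = self.depth_from_rgb(t_depth, t_rgb, t_rgb)
        t_rgb, t_depth = t_rgb + rgb_out, t_depth + depth_out

        def back(t):  # restore (B, C, H, W)
            return t.transpose(1, 2).reshape(b, c, h, w)

        return back(t_rgb), back(t_depth)


if __name__ == "__main__":
    rgb = torch.randn(2, 64, 16, 16)
    depth = torch.randn(2, 64, 16, 16)
    out_rgb, out_depth = MutualCrossAttention()(rgb, depth)
    print(out_rgb.shape, out_depth.shape)  # (2, 64, 16, 16) each
```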
arXiv Detail & Related papers (2021-12-02T12:48:37Z)
- Transformer-based Network for RGB-D Saliency Detection [82.6665619584628]
The key to RGB-D saliency detection is to fully mine and fuse information at multiple scales across the two modalities.
We show that the transformer is a uniform operation with great efficacy in both feature fusion and feature enhancement.
Our proposed network performs favorably against state-of-the-art RGB-D saliency detection methods.
arXiv Detail & Related papers (2021-12-01T15:53:58Z)
- EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation, which is important for applications such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
- Self-Supervised Representation Learning for RGB-D Salient Object Detection [93.17479956795862]
We use self-supervised representation learning to design two pretext tasks: cross-modal auto-encoding and depth-contour estimation.
Our pretext tasks require only a few unlabeled RGB-D datasets for pre-training, which makes the network capture rich semantic contexts.
For the inherent problem of cross-modal fusion in RGB-D SOD, we propose a multi-path fusion module.
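As an assumption-laden illustration of a cross-modal auto-encoding pretext task (here, predicting the depth map from the RGB image on unlabeled pairs, so no annotations are needed), not the paper's exact setup:

```python
# Sketch of a cross-modal auto-encoding pretext task: an encoder-decoder is
# pre-trained to reconstruct depth from RGB on unlabeled RGB-D pairs.
# Architecture and loss are illustrative assumptions only.
import torch
import torch.nn as nn


class RGBToDepthAutoEncoder(nn.Module):
    def __init__(self, width=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, width, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width * 2, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(width * 2, width, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(width, 1, 4, stride=2, padding=1))

    def forward(self, rgb):
        return self.decoder(self.encoder(rgb))


def pretext_step(model, rgb, depth, optimizer):
    """One self-supervised step: reconstruct depth from RGB (no labels needed)."""
    optimizer.zero_grad()
    loss = nn.functional.l1_loss(model(rgb), depth)
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    model = RGBToDepthAutoEncoder()
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    rgb, depth = torch.randn(2, 3, 64, 64), torch.randn(2, 1, 64, 64)
    print(pretext_step(model, rgb, depth, opt))
```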
arXiv Detail & Related papers (2021-01-29T09:16:06Z)
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well aligned with the RGB pixels, and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient cross-modality guided encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information via multiple stages and aggregates the two recalibrated representations alternately.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
- A Single Stream Network for Robust and Real-time RGB-D Salient Object Detection [89.88222217065858]
We design a single stream network to use the depth map to guide early fusion and middle fusion between RGB and depth.
This model is 55.5% lighter than the current lightest model and runs at a real-time speed of 32 FPS when processing a $384 \times 384$ image.
arXiv Detail & Related papers (2020-07-14T04:40:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.