Multimodal Crowd Counting with Pix2Pix GANs
- URL: http://arxiv.org/abs/2401.07591v1
- Date: Mon, 15 Jan 2024 10:54:35 GMT
- Title: Multimodal Crowd Counting with Pix2Pix GANs
- Authors: Muhammad Asif Khan, Hamid Menouar, Ridha Hamila
- Abstract summary: We propose the use of generative adversarial networks (GANs) to automatically generate thermal infrared (TIR) images from color (RGB) images.
Our experiments on several state-of-the-art crowd counting models and benchmark crowd datasets report significant improvement in accuracy.
- Score: 2.462045767312954
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most state-of-the-art crowd counting methods use color (RGB) images to learn
the density map of the crowd. However, these methods often struggle to achieve
higher accuracy in densely crowded scenes with poor illumination. Recently,
some studies have reported improvement in the accuracy of crowd counting models
using a combination of RGB and thermal images. Although multimodal data can
lead to better predictions, it may not always be available beforehand. In this
paper, we propose the use of generative adversarial networks (GANs) to
automatically generate thermal infrared (TIR) images from color (RGB) images
and to use both modalities to train crowd counting models for higher accuracy.
We first use a Pix2Pix GAN to translate RGB images to TIR images. Our
experiments on several state-of-the-art crowd counting models and benchmark
crowd datasets show a significant improvement in accuracy.
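The two-stage pipeline described in the abstract can be sketched as follows. This is a minimal data-flow illustration, not the paper's implementation: `pix2pix_generator` and `counting_model` are hypothetical stand-ins (a fixed luminance projection and a trivial pooling, respectively) for the trained Pix2Pix generator and the crowd counting CNN.

```python
import numpy as np

def pix2pix_generator(rgb):
    """Stand-in for a trained Pix2Pix generator (hypothetical): maps an
    RGB image (H, W, 3) to a single-channel TIR image (H, W, 1).
    A real generator would be a U-Net trained with L1 + adversarial losses."""
    weights = np.array([0.30, 0.59, 0.11])  # placeholder luminance projection
    tir = rgb @ weights
    return tir[..., np.newaxis]

def counting_model(rgbt):
    """Stand-in for a crowd counting CNN (hypothetical): maps a 4-channel
    RGB-T tensor (H, W, 4) to a density map (H, W); summing the density
    map yields the estimated count."""
    return rgbt.mean(axis=-1)  # placeholder density map

def estimate_count(rgb):
    # Stage 1: translate RGB -> TIR with the (pretrained) generator.
    tir = pix2pix_generator(rgb)
    # Stage 2: stack both modalities and run the multimodal counting model.
    rgbt = np.concatenate([rgb, tir], axis=-1)
    density = counting_model(rgbt)
    return density.sum()

rgb = np.random.rand(240, 320, 3)
count = estimate_count(rgb)
```

The key point the sketch captures is that the TIR channel is synthesized on the fly, so the multimodal counting model can be used even when no thermal camera is available at inference time.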
Related papers
- Simple Image Signal Processing using Global Context Guidance [56.41827271721955]
Deep learning-based ISPs aim to transform RAW images into DSLR-like RGB images using deep neural networks.
We propose a novel module that can be integrated into any neural ISP to capture the global context information from the full RAW images.
Our model achieves state-of-the-art results on different benchmarks using diverse and real smartphone images.
arXiv Detail & Related papers (2024-04-17T17:11:47Z) - Training Neural Networks on RAW and HDR Images for Restoration Tasks [59.41340420564656]
In this work, we test approaches on three popular image restoration applications: denoising, deblurring, and single-image super-resolution.
Our results indicate that neural networks train significantly better on HDR and RAW images represented in display color spaces.
This small change to the training strategy can bring a very substantial gain in performance, up to 10-15 dB.
arXiv Detail & Related papers (2023-12-06T17:47:16Z) - Visible to Thermal image Translation for improving visual task in low
light conditions [0.0]
We have collected images from two different locations using the Parrot Anafi Thermal drone.
We created a two-stream network, preprocessed and augmented the image data, and trained the generator and discriminator models from scratch.
The findings demonstrate that it is feasible to translate RGB training data to thermal data using GAN.
arXiv Detail & Related papers (2023-10-31T05:18:53Z) - Crowd Counting in Harsh Weather using Image Denoising with Pix2Pix GANs [2.462045767312954]
Visual crowd counting estimates the density of the crowd using deep learning models such as convolutional neural networks (CNNs).
In this paper, we propose the use of Pix2Pix generative adversarial network (GAN) to first denoise the crowd images prior to passing them to the counting model.
A Pix2Pix network is trained using synthetic noisy images generated from original crowd images, and the pretrained generator is then used in the inference engine to estimate the crowd density in unseen, noisy crowd images.
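The train/infer split described above can be sketched as follows. This is only an illustration of the (noisy, clean) pairing and the denoise-before-count step: `add_synthetic_noise` uses Gaussian noise as one plausible corruption (the paper's weather-style degradations may differ), and `denoiser` is a simple box blur standing in for the pretrained Pix2Pix generator.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_synthetic_noise(image, sigma=0.1):
    """Build the synthetic noisy copy of a clean crowd image used to form
    (noisy, clean) training pairs for the Pix2Pix denoiser."""
    noisy = image + rng.normal(0.0, sigma, image.shape)
    return np.clip(noisy, 0.0, 1.0)

def denoiser(noisy):
    """Stand-in for the pretrained Pix2Pix generator (hypothetical):
    a 3x3 box blur instead of a learned image-to-image network."""
    padded = np.pad(noisy, 1, mode="edge")
    out = np.zeros_like(noisy)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + noisy.shape[0], dx:dx + noisy.shape[1]]
    return out / 9.0

clean = np.tile(np.linspace(0.0, 1.0, 64), (64, 1))  # smooth test image
noisy = add_synthetic_noise(clean)   # training-time corruption
restored = denoiser(noisy)           # inference-time denoising step,
                                     # applied before the counting model
```

The design point is that the counting model never has to be retrained: only its input is cleaned up by the generator before density estimation.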
arXiv Detail & Related papers (2023-10-11T07:22:37Z) - Edge-guided Multi-domain RGB-to-TIR image Translation for Training
Vision Tasks with Challenging Labels [12.701191873813583]
The insufficient number of annotated thermal infrared (TIR) image datasets hinders TIR image-based deep learning networks from achieving performance comparable to that of RGB-based networks.
We propose a modified multi-domain RGB-to-TIR image translation model focused on edge preservation to employ annotated RGB images with challenging labels.
This enabled the supervised learning of deep TIR image-based optical flow estimation and object detection, improving end-point error by 56.5% on average and achieving a best object detection mAP of 23.9%, respectively.
arXiv Detail & Related papers (2023-01-30T06:44:38Z) - MAFNet: A Multi-Attention Fusion Network for RGB-T Crowd Counting [40.4816930622052]
We propose a two-stream RGB-T crowd counting network called Multi-Attention Fusion Network (MAFNet)
In the encoder part, a Multi-Attention Fusion (MAF) module is embedded into different stages of the two modality-specific branches for cross-modal fusion.
Extensive experiments on two popular datasets show that the proposed MAFNet is effective for RGB-T crowd counting.
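Cross-modal fusion of the kind MAFNet's encoder performs can be illustrated with a generic channel-attention scheme. This is a simplified stand-in, not the paper's MAF module (whose exact multi-attention design is not reproduced here): each modality's feature map is weighted per channel by a softmax over globally pooled descriptors, then summed.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fusion(feat_rgb, feat_tir):
    """Generic cross-modal attention fusion (hypothetical simplification of
    MAF). Inputs are modality-specific feature maps of shape (C, H, W)."""
    # Global average pooling -> per-channel descriptors of shape (C,).
    desc_rgb = feat_rgb.mean(axis=(1, 2))
    desc_tir = feat_tir.mean(axis=(1, 2))
    # Softmax over the two modalities yields per-channel weights (2, C).
    weights = softmax(np.stack([desc_rgb, desc_tir]), axis=0)
    w_rgb = weights[0][:, None, None]
    w_tir = weights[1][:, None, None]
    # Convex combination of the two modality streams, channel by channel.
    return w_rgb * feat_rgb + w_tir * feat_tir

feat_rgb = np.random.rand(16, 32, 32)
feat_tir = np.random.rand(16, 32, 32)
fused = attention_fusion(feat_rgb, feat_tir)
```

Because the weights sum to one per channel, the fused features stay within the range spanned by the two input streams; in a real two-stream encoder such a module would be embedded at several stages, as the summary above describes.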
arXiv Detail & Related papers (2022-08-14T02:42:09Z) - RGB-Multispectral Matching: Dataset, Learning Methodology, Evaluation [49.28588927121722]
We address the problem of registering synchronized color (RGB) and multi-spectral (MS) images featuring very different resolutions by solving stereo matching correspondences.
We introduce a novel RGB-MS dataset framing 13 different scenes in indoor environments and providing a total of 34 image pairs annotated with semi-dense, high-resolution ground-truth labels.
To tackle the task, we propose a deep learning architecture trained in a self-supervised manner by exploiting a further RGB camera.
arXiv Detail & Related papers (2022-06-14T17:59:59Z) - Transform your Smartphone into a DSLR Camera: Learning the ISP in the
Wild [159.71025525493354]
We propose a trainable Image Signal Processing framework that produces DSLR quality images given RAW images captured by a smartphone.
To address the color misalignments between training image pairs, we employ a color-conditional ISP network and optimize a novel parametric color mapping between each input RAW and reference DSLR image.
arXiv Detail & Related papers (2022-03-20T20:13:59Z) - Self-Supervised Modality-Aware Multiple Granularity Pre-Training for
RGB-Infrared Person Re-Identification [9.624510941236837]
Modality-Aware Multiple Granularity Learning (MMGL) is a self-supervised pre-training alternative to ImageNet pre-training.
MMGL learns better representations (+6.47% Rank-1) with faster training speed (converging in a few hours) and stronger data efficiency (5% of the data size) than ImageNet pre-training.
Results suggest that it generalizes well to various existing models and losses and has promising transferability across datasets.
arXiv Detail & Related papers (2021-12-12T04:40:33Z) - Learning RAW-to-sRGB Mappings with Inaccurately Aligned Supervision [76.41657124981549]
This paper presents a joint learning model for image alignment and RAW-to-sRGB mapping.
Experiments show that our method performs favorably against the state of the art on the ZRR and SR-RAW datasets.
arXiv Detail & Related papers (2021-08-18T12:41:36Z) - Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT
Benchmark for Crowd Counting [109.32927895352685]
We introduce a large-scale RGBT Crowd Counting (RGBT-CC) benchmark, which contains 2,030 pairs of RGB-thermal images with 138,389 annotated people.
To facilitate the multimodal crowd counting, we propose a cross-modal collaborative representation learning framework.
Experiments conducted on the RGBT-CC benchmark demonstrate the effectiveness of our framework for RGBT crowd counting.
arXiv Detail & Related papers (2020-12-08T16:18:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.