Diffusion-based Data Augmentation for Object Counting Problems
- URL: http://arxiv.org/abs/2401.13992v1
- Date: Thu, 25 Jan 2024 07:28:22 GMT
- Title: Diffusion-based Data Augmentation for Object Counting Problems
- Authors: Zhen Wang, Yuelei Li, Jia Wan, Nuno Vasconcelos
- Abstract summary: We develop a pipeline that utilizes a diffusion model to generate extensive training data.
We are the first to generate images conditioned on a location dot map with a diffusion model.
Our proposed counting loss for the diffusion model effectively minimizes the discrepancies between the location dot map and the crowd images generated.
- Score: 62.63346162144445
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Crowd counting is an important problem in computer vision due to its wide
range of applications in image understanding. Currently, this problem is
typically addressed using deep learning approaches, such as Convolutional
Neural Networks (CNNs) and Transformers. However, deep networks are data-driven
and are prone to overfitting, especially when the available labeled crowd
dataset is limited. To overcome this limitation, we have designed a pipeline
that utilizes a diffusion model to generate extensive training data. We are the
first to generate images conditioned on a location dot map (a binary dot map
that specifies the location of human heads) with a diffusion model. We are also
the first to use these diverse synthetic data to augment crowd counting
models. Our proposed smoothed density map input for ControlNet significantly
improves ControlNet's performance in generating crowds in the correct
locations. Also, our proposed counting loss for the diffusion model effectively
minimizes the discrepancies between the location dot map and the crowd images
generated. Additionally, our innovative guidance sampling further directs the
diffusion process toward regions where the generated crowd images align most
accurately with the location dot map. Collectively, we have enhanced
ControlNet's ability to generate specified objects from a location dot map,
which can be used for data augmentation in various counting problems, and the
framework adapts readily to other counting tasks. Extensive experiments
demonstrate that our framework improves the
counting performance on the ShanghaiTech, NWPU-Crowd, UCF-QNRF, and TRANCOS
datasets, showcasing its effectiveness.
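To make the conditioning step concrete, the sketch below shows how a binary location dot map can be turned into the kind of smoothed density map used as the ControlNet input, together with a simple counting discrepancy between the dot count and a count estimated on a generated image. This is a minimal illustration in Python: the Gaussian width, the function names, and the reliance on an external pre-trained counter are assumptions made for exposition, not the paper's exact formulation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def dot_map_to_density(dot_map: np.ndarray, sigma: float = 4.0) -> np.ndarray:
    """Spread each head annotation in a binary dot map (H, W) with a Gaussian
    kernel to obtain a smooth density map whose sum still equals the head count.
    The kernel width sigma is a placeholder; the paper's choice is not given here.
    """
    return gaussian_filter(dot_map.astype(np.float32), sigma=sigma)


def counting_discrepancy(generated_count: float, dot_map: np.ndarray) -> float:
    """Absolute gap between the count estimated on a generated image and the
    number of dots it was conditioned on. `generated_count` would come from a
    pre-trained counter applied to the diffusion sample; the counter and loss
    weighting used in the paper are not specified in this abstract.
    """
    return abs(generated_count - float(dot_map.sum()))


# Example: three heads placed on a 256x256 canvas.
dots = np.zeros((256, 256), dtype=np.float32)
dots[40, 60] = dots[128, 128] = dots[200, 30] = 1.0
density = dot_map_to_density(dots, sigma=4.0)
print(density.sum())  # ~3.0: Gaussian smoothing preserves the total count
```

Because the smoothing preserves the total mass of the dot map, the density map keeps the ground-truth count while giving ControlNet a spatially smooth conditioning signal.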
Related papers
- Redesigning Multi-Scale Neural Network for Crowd Counting [68.674652984003]
We introduce a hierarchical mixture of density experts, which hierarchically merges multi-scale density maps for crowd counting.
Within the hierarchical structure, an expert competition and collaboration scheme is presented to encourage contributions from all scales.
Experiments show that our method achieves state-of-the-art performance on five public datasets.
arXiv Detail & Related papers (2022-08-04T21:49:29Z)
- PANet: Perspective-Aware Network with Dynamic Receptive Fields and Self-Distilling Supervision for Crowd Counting [63.84828478688975]
We propose a novel perspective-aware approach called PANet to address the perspective problem.
Based on the observation that the size of the objects varies greatly in one image due to the perspective effect, we propose the dynamic receptive fields (DRF) framework.
The framework adjusts the receptive field through the dilation parameters of its convolutions according to the input image, which helps the model extract more discriminative features for each local region.
arXiv Detail & Related papers (2021-10-31T04:43:05Z)
- PSCNet: Pyramidal Scale and Global Context Guided Network for Crowd Counting [44.306790250158954]
This paper proposes a novel crowd counting approach based on a pyramidal scale module (PSM) and a global context module (GCM).
PSM adaptively captures multi-scale information, which helps identify fine crowd boundaries across different image scales.
GCM is designed in a low-complexity, lightweight manner to make the exchange of information across feature-map channels more efficient.
arXiv Detail & Related papers (2020-12-07T11:35:56Z)
- Bayesian Multi Scale Neural Network for Crowd Counting [0.0]
We propose a new network that uses a ResNet-based feature extractor, a downsampling block built on dilated convolutions, and an upsampling block built on transposed convolutions (a rough sketch of this encoder-decoder pattern appears after this list).
We present a novel aggregation module which makes our network robust to the perspective view problem.
arXiv Detail & Related papers (2020-07-11T21:43:20Z)
- Local Grid Rendering Networks for 3D Object Detection in Point Clouds [98.02655863113154]
CNNs are powerful, but directly applying convolutions to point data after voxelizing an entire point cloud into a dense regular 3D grid would be computationally costly.
We propose a novel and principled Local Grid Rendering (LGR) operation to render the small neighborhood of a subset of input points into a low-resolution 3D grid independently.
We validate LGR-Net for 3D object detection on the challenging ScanNet and SUN RGB-D datasets.
arXiv Detail & Related papers (2020-07-04T13:57:43Z)
- JHU-CROWD++: Large-Scale Crowd Counting Dataset and A Benchmark Method [92.15895515035795]
We introduce a new large-scale unconstrained crowd counting dataset (JHU-CROWD++) that contains 4,372 images with 1.51 million annotations.
We propose a novel crowd counting network that progressively generates crowd density maps via residual error estimation.
arXiv Detail & Related papers (2020-04-07T14:59:35Z)
- Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)
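A recurring design in the counting networks listed above, referenced in the Bayesian Multi Scale Neural Network entry, is to downsample features, enlarge the receptive field with dilated convolutions, and upsample back to a full-resolution density map with transposed convolutions. The PyTorch sketch below illustrates only that general pattern: the class name, layer widths, and the plain-convolution stand-in for a ResNet backbone are assumptions, not any of the cited authors' implementations.

```python
import torch
import torch.nn as nn


class DilatedDownUpCounter(nn.Module):
    """Toy encoder-decoder for density-map regression: strided convolutions
    downsample, dilated convolutions widen the receptive field, and transposed
    convolutions restore the input resolution."""

    def __init__(self, in_ch: int = 3, width: int = 32):
        super().__init__()
        # Stand-in for a ResNet feature extractor (assumption, not the cited backbone).
        self.backbone = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Dilated convolutions enlarge the receptive field without further downsampling.
        self.dilated = nn.Sequential(
            nn.Conv2d(width, width, 3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=4, dilation=4), nn.ReLU(),
        )
        # Transposed convolutions recover full resolution for the density map.
        self.upsample = nn.Sequential(
            nn.ConvTranspose2d(width, width, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(width, 1, 4, stride=2, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The sum of the predicted density map is the crowd count estimate.
        return self.upsample(self.dilated(self.backbone(x)))


model = DilatedDownUpCounter()
img = torch.randn(1, 3, 256, 256)
print(model(img).shape)  # torch.Size([1, 1, 256, 256])
```

Summing the predicted density map gives the count estimate, which is why sum-preserving density targets (as in the sketch after the abstract) pair naturally with this kind of decoder.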