Encoder-Decoder Based Convolutional Neural Networks with
Multi-Scale-Aware Modules for Crowd Counting
- URL: http://arxiv.org/abs/2003.05586v5
- Date: Wed, 25 Nov 2020 12:35:21 GMT
- Title: Encoder-Decoder Based Convolutional Neural Networks with
Multi-Scale-Aware Modules for Crowd Counting
- Authors: Pongpisit Thanasutives, Ken-ichi Fukui, Masayuki Numao, Boonserm
Kijsirikul
- Abstract summary: We propose two modified neural networks for accurate and efficient crowd counting.
The first model, named M-SFANet, is augmented with atrous spatial pyramid pooling (ASPP) and a context-aware module (CAN).
The second model, called M-SegNet, replaces the bilinear upsampling in SFANet with the max unpooling used in SegNet.
- Score: 6.893512627479196
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose two modified neural networks based on dual path
multi-scale fusion networks (SFANet) and SegNet for accurate and efficient
crowd counting. Inspired by SFANet, the first model, named M-SFANet, is
augmented with atrous spatial pyramid pooling (ASPP) and a context-aware module
(CAN). The encoder of M-SFANet is enhanced with ASPP, which contains parallel
atrous convolutional layers with different sampling rates and is hence able to
extract multi-scale features of the target object and incorporate larger context. To
further deal with scale variation throughout an input image, we leverage the
CAN module which adaptively encodes the scales of the contextual information.
The combination yields an effective model for counting in both dense and sparse
crowd scenes. Following the SFANet decoder structure, M-SFANet's decoder has
dual paths for density map and attention map generation. The second model,
called M-SegNet, replaces the bilinear upsampling in SFANet with the max
unpooling used in SegNet. This change yields a faster model while maintaining
competitive counting performance. Designed for high-speed surveillance
applications, M-SegNet omits additional multi-scale-aware modules so as not to
increase complexity. Both models are encoder-decoder based
architectures and are end-to-end trainable. We conduct extensive experiments on
five crowd counting datasets and one vehicle counting dataset, showing that
these modifications yield algorithms that can improve on state-of-the-art crowd
counting methods. Code is available at
https://github.com/Pongpisit-Thanasutives/Variations-of-SFANet-for-Crowd-Counting.
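To make the multi-scale mechanism concrete, below is a minimal PyTorch sketch of an ASPP block of the kind described above: parallel atrous (dilated) convolutions with different sampling rates plus an image-level pooling branch, fused into one multi-scale feature map. The dilation rates and channel widths are illustrative assumptions, not the exact M-SFANet configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Atrous spatial pyramid pooling: parallel dilated convolutions with
    different sampling rates capture multi-scale context in one pass.
    Rates and widths here are illustrative, not the paper's exact values."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch,
                          kernel_size=3 if r > 1 else 1,
                          padding=r if r > 1 else 0,
                          dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        # Image-level pooling branch incorporates global context.
        self.image_pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
        )
        self.project = nn.Conv2d(out_ch * (len(rates) + 1), out_ch, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[2:]
        feats = [branch(x) for branch in self.branches]
        pooled = F.interpolate(self.image_pool(x), size=(h, w),
                               mode='bilinear', align_corners=False)
        return self.project(torch.cat(feats + [pooled], dim=1))

# Example: fuse a VGG-style 512-channel encoder feature map.
# y = ASPP(512, 256)(torch.randn(1, 512, 32, 32))  # -> (1, 256, 32, 32)
```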
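The dual-path decoder produces a density map and an attention map. A common arrangement in SFANet-style models, sketched here under that assumption rather than as M-SFANet's exact wiring, lets a sigmoid attention map gate the density features so that background activations are suppressed before the final regression layer.

```python
import torch

# Hedged sketch of the dual-path idea (not necessarily M-SFANet's exact wiring):
# the attention path emits a single-channel map in [0, 1] that gates
# the density path's features element-wise.
density_feat = torch.randn(1, 32, 64, 64)             # from the density path
attention = torch.sigmoid(torch.randn(1, 1, 64, 64))  # from the attention path
gated = density_feat * attention                      # background suppressed
```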
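M-SegNet's change swaps bilinear upsampling for SegNet-style max unpooling, which places values back at the locations recorded during encoder max pooling instead of interpolating. A minimal PyTorch sketch of the two operations:

```python
import torch
import torch.nn as nn

# Encoder pooling must return indices so the decoder can unpool with them.
pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

x = torch.randn(1, 64, 32, 32)   # an encoder feature map
down, indices = pool(x)          # (1, 64, 16, 16) plus argmax locations
up = unpool(down, indices)       # (1, 64, 32, 32): maxima restored in place,
                                 # zeros elsewhere

# The bilinear alternative used in SFANet, for comparison:
up_bilinear = nn.functional.interpolate(down, scale_factor=2,
                                        mode='bilinear', align_corners=False)
```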
Related papers
- CSFNet: A Cosine Similarity Fusion Network for Real-Time RGB-X Semantic Segmentation of Driving Scenes [0.0]
Multimodal semantic segmentation methods suffer from high computational complexity and low inference speed.
We propose the Cosine Similarity Fusion Network (CSFNet) as a real-time RGB-X semantic segmentation model.
CSFNet offers accuracy competitive with state-of-the-art methods while achieving state-of-the-art speed.
arXiv Detail & Related papers (2024-07-01T14:34:32Z)
- P-MSDiff: Parallel Multi-Scale Diffusion for Remote Sensing Image Segmentation [8.46409964236009]
Diffusion models and multi-scale features are essential components in semantic segmentation tasks.
We propose a new semantic segmentation model: a diffusion model with parallel multi-scale branches.
Our model demonstrates superior performance based on the J1 metric on both the UAVid and Vaihingen Building datasets.
arXiv Detail & Related papers (2024-05-30T19:40:08Z)
- Efficient Transformer Encoders for Mask2Former-style models [57.54752243522298]
ECO-M2F is a strategy to self-select the number of hidden layers in the encoder conditioned on the input image.
The proposed approach reduces expected encoder computational cost while maintaining performance.
It is flexible in architecture configurations, and can be extended beyond the segmentation task to object detection.
arXiv Detail & Related papers (2024-04-23T17:26:34Z)
- TimeMAE: Self-Supervised Representations of Time Series with Decoupled Masked Autoencoders [55.00904795497786]
We propose TimeMAE, a novel self-supervised paradigm for learning transferable time series representations based on transformer networks.
The TimeMAE learns enriched contextual representations of time series with a bidirectional encoding scheme.
To solve the discrepancy issue incurred by newly injected masked embeddings, we design a decoupled autoencoder architecture.
arXiv Detail & Related papers (2023-03-01T08:33:16Z)
- LENet: Lightweight And Efficient LiDAR Semantic Segmentation Using Multi-Scale Convolution Attention [0.0]
We propose a projection-based semantic segmentation network called LENet with an encoder-decoder structure for LiDAR-based semantic segmentation.
The encoder is composed of a novel multi-scale convolutional attention (MSCA) module with varying receptive field sizes to capture features.
We show that our proposed method is lighter, more efficient, and robust compared to state-of-the-art semantic segmentation methods.
arXiv Detail & Related papers (2023-01-11T02:51:38Z)
- ClusTR: Exploring Efficient Self-attention via Clustering for Vision Transformers [70.76313507550684]
We propose a content-based sparse attention method, as an alternative to dense self-attention.
Specifically, we cluster and then aggregate key and value tokens, as a content-based method of reducing the total token count.
The resulting clustered-token sequence retains the semantic diversity of the original signal, but can be processed at a lower computational cost.
arXiv Detail & Related papers (2022-08-28T04:18:27Z)
- LegoNN: Building Modular Encoder-Decoder Models [117.47858131603112]
State-of-the-art encoder-decoder models are constructed and trained end-to-end as an atomic unit.
No component of the model can be (re-)used without the others, making it impossible to share parts.
We describe LegoNN, a procedure for building encoder-decoder architectures so that their parts can be applied to other tasks without the need for fine-tuning.
arXiv Detail & Related papers (2022-06-07T14:08:07Z)
- MACU-Net for Semantic Segmentation of Fine-Resolution Remotely Sensed Images [11.047174552053626]
MACU-Net is a multi-scale skip connected and asymmetric-convolution-based U-Net for fine-resolution remotely sensed images.
Our design has the following advantages: (1) The multi-scale skip connections combine and realign semantic features contained in both low-level and high-level feature maps; (2) the asymmetric convolution block strengthens the feature representation and feature extraction capability of a standard convolution layer.
Experiments conducted on two remotely sensed datasets demonstrate that the proposed MACU-Net outperforms U-Net, U-NetPPL, and U-Net 3+, among other benchmark approaches.
arXiv Detail & Related papers (2020-07-26T08:56:47Z)
- Suppress and Balance: A Simple Gated Network for Salient Object Detection [89.88222217065858]
We propose a simple gated network (GateNet) to solve both issues at once.
With the help of multilevel gate units, the valuable context information from the encoder can be optimally transmitted to the decoder.
In addition, we adopt the atrous spatial pyramid pooling based on the proposed "Fold" operation (Fold-ASPP) to accurately localize salient objects of various scales.
arXiv Detail & Related papers (2020-07-16T02:00:53Z)
- A New Multiple Max-pooling Integration Module and Cross Multiscale Deconvolution Network Based on Image Semantic Segmentation [7.427799203626843]
We propose a new deep convolutional network model for medical image pixel segmentation, called MC-Net.
In the network structure of the encoder, we use multiscale convolution instead of the traditional single-channel convolution.
arXiv Detail & Related papers (2020-03-25T04:27:01Z)
- NAS-Count: Counting-by-Density with Neural Architecture Search [74.92941571724525]
We automate the design of counting models with Neural Architecture Search (NAS).
We introduce an end-to-end searched encoder-decoder architecture, the Automatic Multi-Scale Network (AMSNet).
arXiv Detail & Related papers (2020-02-29T09:18:17Z)