Global Aggregation then Local Distribution for Scene Parsing
- URL: http://arxiv.org/abs/2107.13154v1
- Date: Wed, 28 Jul 2021 03:46:57 GMT
- Title: Global Aggregation then Local Distribution for Scene Parsing
- Authors: Xiangtai Li, Li Zhang, Guangliang Cheng, Kuiyuan Yang, Yunhai Tong,
Xiatian Zhu, Tao Xiang
- Abstract summary: We show that our approach can be modularized as an end-to-end trainable block and easily plugged into existing semantic segmentation networks.
Our approach sets a new state of the art on major semantic segmentation benchmarks, including Cityscapes, ADE20K, Pascal Context, CamVid and COCO-Stuff.
- Score: 99.1095068574454
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modelling long-range contextual relationships is critical for pixel-wise
prediction tasks such as semantic segmentation. However, convolutional neural
networks (CNNs) are inherently limited in modelling such dependencies due to the
naive structure of their building modules (e.g., local convolution kernels). While
recent global aggregation methods are beneficial for modelling long-range
structural information, they tend to oversmooth and introduce noise in regions
containing fine details (e.g., boundaries and small objects), which matter a
great deal for semantic segmentation. To alleviate this problem, we
propose to exploit the local context so that the aggregated long-range
relationships are distributed more accurately in local regions. In particular,
we design a novel local distribution module which adaptively models the affinity
map between global and local relationships for each pixel. Integrating
existing global aggregation modules, we show that our approach can be
modularized as an end-to-end trainable block and easily plugged into existing
semantic segmentation networks, giving rise to the GALD networks.
Despite its simplicity and versatility, our approach sets a new
state of the art on major semantic segmentation benchmarks including
Cityscapes, ADE20K, Pascal Context, CamVid and COCO-Stuff. Code and trained
models are released at https://github.com/lxtGH/GALD-DGCNet to foster
further research.
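The "aggregate globally, then distribute locally" idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (see the repository above for that). It assumes a single-channel feature map stored as nested lists, uses the global mean as a stand-in for a global aggregation module, and uses a fixed sigmoid-of-local-average gate where the paper describes a learned, per-pixel affinity map. The function names `global_aggregate`, `local_gate` and `gald_block` are hypothetical.

```python
import math


def global_aggregate(feat):
    """Stand-in global aggregation: add the global mean to every pixel."""
    h, w = len(feat), len(feat[0])
    mean = sum(sum(row) for row in feat) / (h * w)
    return [[v + mean for v in row] for row in feat]


def local_gate(feat, k=1):
    """Per-pixel gate in (0, 1): sigmoid of the mean over a (2k+1)x(2k+1) window.

    Assumption: a fixed local average replaces the learned layers that would
    predict this map in a real network.
    """
    h, w = len(feat), len(feat[0])
    gate = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [
                feat[y][x]
                for y in range(max(0, i - k), min(h, i + k + 1))
                for x in range(max(0, j - k), min(w, j + k + 1))
            ]
            local_mean = sum(window) / len(window)
            gate[i][j] = 1.0 / (1.0 + math.exp(-local_mean))
    return gate


def gald_block(feat):
    """Toy block: original features plus locally gated global context."""
    aggregated = global_aggregate(feat)
    gate = local_gate(aggregated)
    h, w = len(feat), len(feat[0])
    return [
        [feat[i][j] + gate[i][j] * aggregated[i][j] for j in range(w)]
        for i in range(h)
    ]


if __name__ == "__main__":
    features = [[0.0, 1.0], [2.0, 3.0]]
    for row in gald_block(features):
        print(row)
```

The gate lets each pixel decide how much of the globally aggregated signal to accept, which is the role the abstract assigns to the local distribution module. In a trainable block, both the aggregation and the gate would be learned layers operating on multi-channel tensors.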
Related papers
- Learning Enriched Features via Selective State Spaces Model for Efficient Image Deblurring [0.0]
Image deblurring aims to restore a high-quality image from its blurred counterpart.
We propose an efficient image deblurring network that leverages a selective state space model to aggregate enriched and accurate features.
Experimental results demonstrate that the proposed method outperforms state-of-the-art approaches on widely used benchmarks.
arXiv Detail & Related papers (2024-03-29T10:40:41Z)
- A Global-Local Approximation Framework for Large-Scale Gaussian Process Modeling [0.0]
We propose a novel framework for large-scale Gaussian process (GP) modeling.
We employ a combined global-local approach in building the approximation.
The performance of our framework, which we refer to as TwinGP, is on par with or better than state-of-the-art GP modeling methods.
arXiv Detail & Related papers (2023-05-17T12:19:59Z)
- Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation [53.04781510348416]
Video-based 3D human pose and shape estimations are evaluated by intra-frame accuracy and inter-frame smoothness.
We propose to structurally decouple the modeling of long-term and short-term correlations in an end-to-end framework, Global-to-Local Transformer (GLoT).
Our GLoT surpasses previous state-of-the-art methods with the fewest model parameters on popular benchmarks, i.e., 3DPW, MPI-INF-3DHP, and Human3.6M.
arXiv Detail & Related papers (2023-03-26T14:57:49Z)
- LCPFormer: Towards Effective 3D Point Cloud Analysis via Local Context Propagation in Transformers [60.51925353387151]
We propose a novel module named Local Context Propagation (LCP) to exploit the message passing between neighboring local regions.
We use the overlap points of adjacent local regions as intermediaries, then re-weight the features of these shared points from different local regions before passing them to the next layers.
The proposed method is applicable to different tasks and outperforms various transformer-based methods in benchmarks including 3D shape classification and dense prediction tasks.
arXiv Detail & Related papers (2022-10-23T15:43:01Z)
- Global and Local Features through Gaussian Mixture Models on Image Semantic Segmentation [0.38073142980732994]
We propose an internal structure for the feature representations while extracting a global representation that supports the former.
During training, we predict a Gaussian Mixture Model from the data, which, merged with the skip connections and the decoding stage, helps avoid wrong inductive biases.
Our results show that we can improve semantic segmentation by providing both learning representations (global and local) with a clustering behavior and combining them.
arXiv Detail & Related papers (2022-07-19T10:10:49Z)
- Contextual Attention Network: Transformer Meets U-Net [0.0]
Convolutional neural networks (CNNs) have become the de facto standard and attained immense success in medical image segmentation.
However, CNN-based methods fail to build long-range dependencies and global context connections.
Recent articles have exploited Transformer variants for medical image segmentation tasks.
arXiv Detail & Related papers (2022-03-02T21:10:24Z)
- PRA-Net: Point Relation-Aware Network for 3D Point Cloud Analysis [56.91758845045371]
We propose a novel framework named Point Relation-Aware Network (PRA-Net).
It is composed of an Intra-region Structure Learning (ISL) module and an Inter-region Relation Learning (IRL) module.
Experiments on several 3D benchmarks covering shape classification, keypoint estimation, and part segmentation have verified the effectiveness and the ability of PRA-Net.
arXiv Detail & Related papers (2021-12-09T13:24:43Z)
- An Entropy-guided Reinforced Partial Convolutional Network for Zero-Shot Learning [77.72330187258498]
We propose a novel Entropy-guided Reinforced Partial Convolutional Network (ERPCNet).
ERPCNet extracts and aggregates localities based on semantic relevance and visual correlations without human-annotated regions.
It not only discovers global-cooperative localities dynamically but also converges faster for policy gradient optimization.
arXiv Detail & Related papers (2021-11-03T11:13:13Z)
- Clustered Federated Learning via Generalized Total Variation Minimization [83.26141667853057]
We study optimization methods to train local (or personalized) models for local datasets with a decentralized network structure.
Our main conceptual contribution is to formulate federated learning as generalized total variation (GTV) minimization.
Our main algorithmic contribution is a fully decentralized federated learning algorithm.
arXiv Detail & Related papers (2021-05-26T18:07:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.