HCNet: Hierarchical Context Network for Semantic Segmentation
- URL: http://arxiv.org/abs/2010.04962v2
- Date: Tue, 20 Oct 2020 03:06:36 GMT
- Title: HCNet: Hierarchical Context Network for Semantic Segmentation
- Authors: Yanwen Chong, Congchong Nie, Yulong Tao, Xiaoshu Chen, Shaoming Pan
- Abstract summary: We propose a hierarchical context network to model homogeneous pixels with strong correlations and heterogeneous pixels with weak correlations.
Our approach achieves a mean IoU of 82.8% on Cityscapes and an overall accuracy of 91.4% on the ISPRS Vaihingen dataset.
- Score: 6.4047628200011815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Global context information is vital in visual understanding problems,
especially in pixel-level semantic segmentation. The mainstream methods adopt
the self-attention mechanism to model global context information. However,
pixels belonging to different classes usually have weak feature correlation.
Modeling the global pixel-level correlation matrix indiscriminately is
extremely redundant in the self-attention mechanism. In order to solve the
above problem, we propose a hierarchical context network to differentially
model homogeneous pixels with strong correlations and heterogeneous pixels with
weak correlations. Specifically, we first propose a multi-scale guided
pre-segmentation module to divide the entire feature map into different
class-based homogeneous regions. Within each homogeneous region, we design
the pixel context module to capture pixel-level correlations. Subsequently,
different from the self-attention mechanism that still models weak
heterogeneous correlations in a dense pixel-level manner, the region context
module is proposed to model sparse region-level dependencies using a unified
representation of each region. Through aggregating fine-grained pixel context
features and coarse-grained region context features, our proposed network can
not only hierarchically model global context information but also harvest
multi-granularity representations to more robustly identify multi-scale
objects. We evaluate our approach on Cityscapes and the ISPRS Vaihingen
dataset. Without bells and whistles, our approach realizes a mean IoU of 82.8%
on the Cityscapes test set and an overall accuracy of 91.4% on the ISPRS Vaihingen
test set, achieving state-of-the-art results.
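The abstract describes three pieces: a pre-segmentation module that softly assigns pixels to class-based homogeneous regions, a pixel context module that models correlations only among pixels of the same region, and a region context module that models sparse dependencies between unified region representations before the two granularities are aggregated. Below is a minimal PyTorch sketch of that idea, not the authors' implementation: the module name `HierarchicalContextSketch`, the 1x1-conv soft region assignment, the channel sizes, and the masked dense affinity used for the pixel context step are assumptions made for illustration.

```python
# Minimal sketch of the hierarchical-context idea from the abstract
# (illustrative only; not the authors' reference code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class HierarchicalContextSketch(nn.Module):
    def __init__(self, channels: int = 512, num_regions: int = 19):
        super().__init__()
        # Pre-segmentation head: soft, class-based assignment of every pixel
        # to one of `num_regions` homogeneous regions (assumed 1x1 conv).
        self.pre_seg = nn.Conv2d(channels, num_regions, kernel_size=1)
        # Projections for pixel-level attention (queries / keys / values).
        self.q = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.k = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.v = nn.Conv2d(channels, channels, kernel_size=1)
        # Fuse input, fine-grained pixel context, and coarse region context.
        self.fuse = nn.Conv2d(channels * 3, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        # Soft region assignment: (b, k, n), softmax over regions per pixel.
        assign = F.softmax(self.pre_seg(x), dim=1).flatten(2)

        # --- Pixel context: correlations restricted to same-region pixels ---
        q = self.q(x).flatten(2).transpose(1, 2)            # (b, n, c')
        k = self.k(x).flatten(2)                            # (b, c', n)
        v = self.v(x).flatten(2).transpose(1, 2)            # (b, n, c)
        # Same-region affinity mask: large only for pixels sharing a region.
        same_region = torch.bmm(assign.transpose(1, 2), assign)   # (b, n, n)
        attn = F.softmax(torch.bmm(q, k), dim=-1) * same_region
        attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-6)
        pixel_ctx = torch.bmm(attn, v).transpose(1, 2).reshape(b, c, h, w)

        # --- Region context: sparse dependencies between region tokens ---
        # Unified region representation via soft pooling: (b, k, c).
        region_feat = torch.bmm(
            assign / assign.sum(dim=2, keepdim=True).clamp_min(1e-6),
            x.flatten(2).transpose(1, 2),
        )
        region_attn = F.softmax(
            torch.bmm(region_feat, region_feat.transpose(1, 2)), dim=-1
        )                                                   # (b, k, k)
        region_tokens = torch.bmm(region_attn, region_feat)  # (b, k, c)
        # Redistribute region context to pixels through the assignment map.
        region_ctx = torch.bmm(
            assign.transpose(1, 2), region_tokens
        ).transpose(1, 2).reshape(b, c, h, w)

        # Aggregate multi-granularity representations.
        return self.fuse(torch.cat([x, pixel_ctx, region_ctx], dim=1))


if __name__ == "__main__":
    feats = torch.randn(2, 512, 64, 64)
    print(HierarchicalContextSketch()(feats).shape)  # torch.Size([2, 512, 64, 64])
```

In this sketch the region context step attends over only K region tokens (cost O(K^2)) rather than all N pixels (cost O(N^2)), which mirrors the redundancy argument in the abstract; the dense masked affinity used for the pixel context step is a shortcut for brevity, whereas the paper computes pixel-level attention only within each pre-segmented region.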
Related papers
- Learning Invariant Inter-pixel Correlations for Superpixel Generation [12.605604620139497]
Learnable features exhibit constrained discriminative capability, resulting in unsatisfactory pixel grouping performance.
We propose the Content Disentangle Superpixel algorithm to selectively separate the invariant inter-pixel correlations and statistical properties.
The experimental results on four benchmark datasets demonstrate the superiority of our approach to existing state-of-the-art methods.
arXiv Detail & Related papers (2024-02-28T09:46:56Z) - Keypoint-Augmented Self-Supervised Learning for Medical Image Segmentation with Limited Annotation [21.203307064937142]
We present a keypoint-augmented fusion layer that extracts representations preserving both short- and long-range self-attention.
In particular, we augment the CNN feature map at multiple scales by incorporating an additional input that learns long-range spatial self-attention.
Our method further outperforms existing SSL methods by producing more robust self-attention.
arXiv Detail & Related papers (2023-10-02T22:31:30Z) - Pixel-Inconsistency Modeling for Image Manipulation Localization [63.54342601757723]
Digital image forensics plays a crucial role in image authentication and manipulation localization.
This paper presents a generalized and robust manipulation localization model through the analysis of pixel inconsistency artifacts.
Experiments show that our method successfully extracts inherent pixel-inconsistency forgery fingerprints.
arXiv Detail & Related papers (2023-09-30T02:54:51Z) - A Unified Architecture of Semantic Segmentation and Hierarchical Generative Adversarial Networks for Expression Manipulation [52.911307452212256]
We develop a unified architecture of semantic segmentation and hierarchical GANs.
A unique advantage of our framework is that, on the forward pass, the semantic segmentation network conditions the generative model.
We evaluate our method on two challenging facial expression translation benchmarks, AffectNet and RaFD, and a semantic segmentation benchmark, CelebAMask-HQ.
arXiv Detail & Related papers (2021-12-08T22:06:31Z) - Learning to Aggregate Multi-Scale Context for Instance Segmentation in Remote Sensing Images [28.560068780733342]
A novel context aggregation network (CATNet) is proposed to improve the feature extraction process.
The proposed model exploits three lightweight plug-and-play modules, namely the dense feature pyramid network (DenseFPN), the spatial context pyramid (SCP), and the hierarchical region of interest extractor (HRoIE).
arXiv Detail & Related papers (2021-11-22T08:55:25Z) - An attention-driven hierarchical multi-scale representation for visual recognition [3.3302293148249125]
Convolutional Neural Networks (CNNs) have revolutionized the understanding of visual content.
We propose a method to capture high-level long-range dependencies by exploring Graph Convolutional Networks (GCNs).
Our approach is simple yet extremely effective in solving both the fine-grained and generic visual classification problems.
arXiv Detail & Related papers (2021-10-23T09:22:22Z) - Global Aggregation then Local Distribution for Scene Parsing [99.1095068574454]
We show that our approach can be modularized as an end-to-end trainable block and easily plugged into existing semantic segmentation networks.
Our approach allows us to build a new state of the art on major semantic segmentation benchmarks including Cityscapes, ADE20K, Pascal Context, CamVid, and COCO-Stuff.
arXiv Detail & Related papers (2021-07-28T03:46:57Z) - Low Light Image Enhancement via Global and Local Context Modeling [164.85287246243956]
We introduce a context-aware deep network for low-light image enhancement.
First, it features a global context module that models spatial correlations to find complementary cues over full spatial domain.
Second, it introduces a dense residual block that captures local context with a relatively large receptive field.
arXiv Detail & Related papers (2021-01-04T09:40:54Z) - Superpixel Segmentation Based on Spatially Constrained Subspace
Clustering [57.76302397774641]
We consider each representative region with independent semantic information as a subspace, and formulate superpixel segmentation as a subspace clustering problem.
We show that a simple integration of superpixel segmentation with the conventional subspace clustering does not effectively work due to the spatial correlation of the pixels.
We propose a novel convex locality-constrained subspace clustering model that is able to constrain the spatial adjacent pixels with similar attributes to be clustered into a superpixel.
arXiv Detail & Related papers (2020-12-11T06:18:36Z) - A U-Net Based Discriminator for Generative Adversarial Networks [86.67102929147592]
We propose an alternative U-Net based discriminator architecture for generative adversarial networks (GANs).
The proposed architecture provides detailed per-pixel feedback to the generator while maintaining the global coherence of synthesized images.
The novel discriminator improves over the state of the art in terms of the standard distribution and image quality metrics.
arXiv Detail & Related papers (2020-02-28T11:16:54Z)