Deep Hierarchical Semantic Segmentation
- URL: http://arxiv.org/abs/2203.14335v2
- Date: Tue, 29 Mar 2022 04:36:05 GMT
- Title: Deep Hierarchical Semantic Segmentation
- Authors: Liulei Li, Tianfei Zhou, Wenguan Wang, Jianwu Li, Yi Yang
- Abstract summary: hierarchical semantic segmentation (HSS) aims at structured, pixel-wise description of visual observation in terms of a class hierarchy.
HSSN casts HSS as a pixel-wise multi-label classification task, only bringing minimal architecture change to current segmentation models.
With hierarchy-induced margin constraints, HSSN reshapes the pixel embedding space, so as to generate well-structured pixel representations.
- Score: 76.40565872257709
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Humans are able to recognize structured relations in observation, allowing us
to decompose complex scenes into simpler parts and abstract the visual world in
multiple levels. However, this hierarchical reasoning ability of human
perception remains largely unexplored in the current semantic segmentation
literature. Existing work typically treats labels as flat and predicts a single
target class exclusively for each pixel. In this paper, we instead address
hierarchical semantic segmentation (HSS), which aims at structured, pixel-wise
description of visual observation in terms of a class hierarchy. We devise
HSSN, a general HSS framework that tackles two critical issues in this task: i)
how to efficiently adapt existing hierarchy-agnostic segmentation networks to
the HSS setting, and ii) how to leverage the hierarchy information to
regularize HSS network learning. To address i), HSSN directly casts HSS as a
pixel-wise multi-label classification task, only bringing minimal architecture
change to current segmentation models. To solve ii), HSSN first explores
inherent properties of the hierarchy as a training objective, which enforces
segmentation predictions to obey the hierarchy structure. Further, with
hierarchy-induced margin constraints, HSSN reshapes the pixel embedding space,
so as to generate well-structured pixel representations and improve
segmentation eventually. We conduct experiments on four semantic segmentation
datasets (i.e., Mapillary Vistas 2.0, Cityscapes, LIP, and PASCAL-Person-Part),
with different class hierarchies, segmentation network architectures and
backbones, showing the generalization and superiority of HSSN.
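The abstract's core idea, casting segmentation as pixel-wise multi-label classification and forcing predictions to obey the class hierarchy, can be illustrated with a minimal sketch. This is not the authors' code: the toy two-level hierarchy, the `PARENT` map, and the min-over-path rule are illustrative assumptions standing in for HSSN's hierarchy-aware scoring.

```python
import numpy as np

# Toy two-level class hierarchy (illustrative, not the paper's taxonomy):
# index 0 = "human" (parent); 1 = "head" and 2 = "torso" are its children.
PARENT = {1: 0, 2: 0}  # child class index -> parent class index

def hierarchy_consistent_scores(logits):
    """Treat segmentation as pixel-wise multi-label classification:
    apply an independent sigmoid per class, then enforce the hierarchy
    by capping each child's score at its parent's score, so a pixel can
    never be more confidently "head" than it is "human"."""
    scores = 1.0 / (1.0 + np.exp(-logits))  # per-class sigmoid, shape (C, H, W)
    out = scores.copy()
    # For deeper hierarchies, children must be visited in top-down order.
    for child, parent in PARENT.items():
        out[child] = np.minimum(scores[child], out[parent])
    return out

# One pixel whose raw logits favor "head" more strongly than "human":
logits = np.array([[[0.0]], [[3.0]], [[-2.0]]])  # shape (3, 1, 1)
scores = hierarchy_consistent_scores(logits)
assert scores[1, 0, 0] <= scores[0, 0, 0]  # child score bounded by parent
```

Because each class gets its own sigmoid rather than competing in one softmax, a pixel can be positive for every class on a root-to-leaf path, which is what makes the structured, multi-level description possible with only minimal architectural change.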
Related papers
- SPIN: Hierarchical Segmentation with Subpart Granularity in Natural Images [17.98848062686217]
We introduce the first hierarchical semantic segmentation dataset with subpart annotations for natural images.
We also introduce two novel evaluation metrics to evaluate how well algorithms capture spatial and semantic relationships across hierarchical levels.
arXiv Detail & Related papers (2024-07-12T21:08:00Z)
- Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised Semantic Segmentation [79.05949524349005]
We propose AuxSegNet+, a weakly supervised auxiliary learning framework to explore the rich information from saliency maps.
We also propose a cross-task affinity learning mechanism to learn pixel-level affinities from the saliency and segmentation feature maps.
arXiv Detail & Related papers (2024-03-02T10:03:21Z)
- Hierarchical Open-vocabulary Universal Image Segmentation [48.008887320870244]
Open-vocabulary image segmentation aims to partition an image into semantic regions according to arbitrary text descriptions.
We propose a decoupled text-image fusion mechanism and representation learning modules for both "things" and "stuff".
Our resulting model, named HIPIE, tackles HIerarchical, oPen-vocabulary, and unIvErsal segmentation tasks within a unified framework.
arXiv Detail & Related papers (2023-07-03T06:02:15Z)
- Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding [95.78002228538841]
We propose a new open-world semantic segmentation pipeline that makes the first attempt to learn to segment semantic objects of various open-world categories without any dense-annotation effort.
Our method can directly segment objects of arbitrary categories, outperforming zero-shot segmentation methods that require data labeling on three benchmark datasets.
arXiv Detail & Related papers (2022-07-18T09:20:04Z)
- StructToken: Rethinking Semantic Segmentation with Structural Prior [14.056789487558731]
We present a new paradigm for semantic segmentation, named structure-aware extraction.
It generates the segmentation results via the interactions between a set of learned structure tokens and the image feature, which aims to progressively extract the structural information of each category from the feature.
Our StructToken outperforms the state-of-the-art on three widely-used benchmarks, including ADE20K, Cityscapes, and COCO-Stuff-10K.
arXiv Detail & Related papers (2022-03-23T17:58:31Z)
- TransFGU: A Top-down Approach to Fine-Grained Unsupervised Semantic Segmentation [44.75300205362518]
Unsupervised semantic segmentation aims to obtain high-level semantic representation on low-level visual features without manual annotations.
We propose the first top-down unsupervised semantic segmentation framework for fine-grained segmentation in extremely complicated scenarios.
Our results show that our top-down unsupervised segmentation is robust to both object-centric and scene-centric datasets.
arXiv Detail & Related papers (2021-12-02T18:59:03Z)
- HS3: Learning with Proper Task Complexity in Hierarchically Supervised Semantic Segmentation [81.87943324048756]
We propose Hierarchically Supervised Semantic Segmentation (HS3), a training scheme that supervises intermediate layers in a segmentation network to learn meaningful representations by varying task complexity.
Our proposed HS3-Fuse framework further improves segmentation predictions and achieves state-of-the-art results on two large segmentation benchmarks: NYUD-v2 and Cityscapes.
arXiv Detail & Related papers (2021-11-03T16:33:29Z)
- Exploring the Hierarchy in Relation Labels for Scene Graph Generation [75.88758055269948]
Experiments show that the proposed simple yet effective method can improve several state-of-the-art baselines by a large margin (up to 33% relative gain) in terms of Recall@50.
arXiv Detail & Related papers (2020-09-12T17:36:53Z)
- HOSE-Net: Higher Order Structure Embedded Network for Scene Graph Generation [20.148175528691905]
This paper presents a novel structure-aware embedding-to-classifier (SEC) module to incorporate both local and global structural information of relationships into the output space.
We also propose a hierarchical semantic aggregation (HSA) module to reduce the number of subspaces by introducing higher order structural information.
The proposed HOSE-Net achieves the state-of-the-art performance on two popular benchmarks of Visual Genome and VRD.
arXiv Detail & Related papers (2020-08-12T07:58:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.