Towards holistic scene understanding: Semantic segmentation and beyond
- URL: http://arxiv.org/abs/2201.07734v1
- Date: Sun, 16 Jan 2022 19:18:11 GMT
- Title: Towards holistic scene understanding: Semantic segmentation and beyond
- Authors: Panagiotis Meletis
- Abstract summary: This dissertation addresses visual scene understanding and enhances segmentation performance and generalization, the training efficiency of networks, and holistic understanding.
First, we investigate semantic segmentation in the context of street scenes and train semantic segmentation networks on combinations of various datasets.
In Chapter 2 we design a framework of hierarchical classifiers over a single convolutional backbone, and train it end-to-end on a combination of pixel-labeled datasets.
In Chapter 3 we propose a weakly-supervised algorithm for training with bounding box-level and image-level supervision instead of only with per-pixel supervision.
- Score: 2.7920304852537536
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This dissertation addresses visual scene understanding and enhances
segmentation performance and generalization, the training efficiency of networks,
and holistic understanding. First, we investigate semantic segmentation in the
context of street scenes and train semantic segmentation networks on
combinations of various datasets. In Chapter 2 we design a framework of
hierarchical classifiers over a single convolutional backbone, and train it
end-to-end on a combination of pixel-labeled datasets, improving
generalizability and the number of recognizable semantic concepts. Chapter 3
focuses on enriching semantic segmentation with weak supervision and proposes a
weakly-supervised algorithm for training with bounding box-level and
image-level supervision instead of only with per-pixel supervision. The memory
and computational load challenges that arise from simultaneous training on
multiple datasets are addressed in Chapter 4. We propose two methodologies for
selecting informative and diverse samples from datasets with weak supervision
to reduce our networks' ecological footprint without sacrificing performance.
Motivated by memory and computation efficiency requirements, in Chapter 5, we
rethink simultaneous training on heterogeneous datasets and propose a universal
semantic segmentation framework. This framework achieves consistent increases
in performance metrics and semantic knowledgeability by exploiting various
scene understanding datasets. Chapter 6 introduces the novel task of part-aware
panoptic segmentation, which extends our reasoning towards holistic scene
understanding. This task combines scene and parts-level semantics with
instance-level object detection. In conclusion, our contributions span
convolutional network architectures, weakly-supervised learning, and part and
panoptic segmentation, paving the way towards holistic, rich, and sustainable
visual scene understanding.
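To make the Chapter 2 idea concrete, here is a minimal sketch of hierarchical classifiers sharing a single convolutional backbone. The two-level hierarchy, class names, and the parent-gated scoring are illustrative assumptions, not the dissertation's exact architecture.

```python
import torch
import torch.nn as nn

class HierarchicalSegHead(nn.Module):
    """Hierarchical per-pixel classifiers over shared backbone features."""

    def __init__(self, in_ch, hierarchy):
        super().__init__()
        # Root classifier predicts the high-level (parent) classes.
        self.root = nn.Conv2d(in_ch, len(hierarchy), kernel_size=1)
        # One subclassifier per parent refines it into its children.
        self.child_heads = nn.ModuleDict({
            parent: nn.Conv2d(in_ch, len(children), kernel_size=1)
            for parent, children in hierarchy.items()
        })

    def forward(self, feats):
        # A child's per-pixel score is p(parent) * p(child | parent).
        parent_logits = self.root(feats)
        parent_prob = parent_logits.softmax(dim=1)
        out = {"parent": parent_logits}
        for i, (parent, head) in enumerate(self.child_heads.items()):
            child_cond = head(feats).softmax(dim=1)
            out[parent] = parent_prob[:, i:i + 1] * child_cond
        return out

# Hypothetical two-level street-scene hierarchy:
hierarchy = {"vehicle": ["car", "truck", "bus"], "flat": ["road", "sidewalk"]}
head = HierarchicalSegHead(in_ch=256, hierarchy=hierarchy)
scores = head(torch.randn(1, 256, 64, 64))  # features from a shared backbone
```

Because every head is a 1x1 convolution over the same feature map, adding datasets with new label spaces only adds cheap classifier branches, not backbones.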
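For the box-level supervision of Chapter 3, one common realization is to convert bounding boxes into pseudo per-pixel labels and train with an ignore class elsewhere. The fill-and-ignore heuristic below is an assumption for illustration; the dissertation's actual weakly-supervised algorithm is more sophisticated.

```python
import numpy as np

IGNORE = 255  # pixels excluded from the segmentation loss

def boxes_to_pseudo_mask(h, w, boxes, labels):
    """boxes: list of (x1, y1, x2, y2); labels: one class id per box."""
    mask = np.full((h, w), IGNORE, dtype=np.uint8)
    # Paint larger boxes first so smaller (likely foreground) boxes win.
    areas = [(x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in boxes]
    for i in np.argsort([-a for a in areas]):
        x1, y1, x2, y2 = boxes[i]
        mask[y1:y2, x1:x2] = labels[i]
    return mask

pseudo = boxes_to_pseudo_mask(
    480, 640,
    boxes=[(0, 200, 640, 480), (100, 250, 220, 380)],
    labels=[0, 1],  # e.g. road region, then a car box inside it
)
```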
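Chapter 4's goal of selecting informative and diverse samples can be illustrated with greedy k-center selection over feature embeddings, a standard diversity-based strategy assumed here as a sketch rather than the dissertation's specific methodology.

```python
import numpy as np

def k_center_greedy(embeddings, budget):
    """Pick `budget` indices whose embeddings cover the whole set well."""
    chosen = [0]  # seed with an arbitrary sample
    dist = np.linalg.norm(embeddings - embeddings[0], axis=1)
    for _ in range(budget - 1):
        nxt = int(dist.argmax())  # the sample farthest from all chosen centers
        chosen.append(nxt)
        # Keep, per sample, the distance to its nearest chosen center.
        dist = np.minimum(dist, np.linalg.norm(embeddings - embeddings[nxt], axis=1))
    return chosen

picks = k_center_greedy(np.random.rand(1000, 128), budget=32)
```

Training on the selected subset instead of the full weakly-labeled pool is what reduces the memory, computation, and ecological footprint the abstract mentions.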
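Finally, the part-aware panoptic segmentation task of Chapter 6 combines three per-pixel quantities. The field names and encoding below are illustrative assumptions about such an output, not the format defined by the task.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class PartAwarePanoptic:
    semantic: np.ndarray  # (H, W) scene-level class per pixel
    instance: np.ndarray  # (H, W) instance id; 0 marks "stuff" regions
    part: np.ndarray      # (H, W) part label within each object

    def segments(self):
        """Yield (class_id, instance_id, part_id, pixel_count) tuples."""
        flat = np.stack([self.semantic, self.instance, self.part]).reshape(3, -1)
        uniq, counts = np.unique(flat, axis=1, return_counts=True)
        for (c, i, p), n in zip(uniq.T, counts):
            yield int(c), int(i), int(p), int(n)

h = w = 4
out = PartAwarePanoptic(
    semantic=np.zeros((h, w), dtype=np.int32),
    instance=np.ones((h, w), dtype=np.int32),
    part=np.zeros((h, w), dtype=np.int32),
)
print(list(out.segments()))  # [(0, 1, 0, 16)]
```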
Related papers
- A Lightweight Clustering Framework for Unsupervised Semantic Segmentation [28.907274978550493]
Unsupervised semantic segmentation aims to categorize each pixel in an image into a corresponding class without the use of annotated data.
We propose a lightweight clustering framework for unsupervised semantic segmentation.
Our framework achieves state-of-the-art results on PASCAL VOC and MS COCO datasets.
arXiv Detail & Related papers (2023-11-30T15:33:42Z)
- ContextSeg: Sketch Semantic Segmentation by Querying the Context with Attention [7.783971241874388]
This paper presents ContextSeg, a simple yet highly effective two-stage approach to sketch semantic segmentation.
In the first stage, to better encode the shape and positional information of strokes, we propose to predict an extra dense distance field in an autoencoder network.
In the second stage, we treat an entire stroke as a single entity and label a group of strokes within the same semantic part using an auto-regressive Transformer with the default attention mechanism.
arXiv Detail & Related papers (2023-11-28T10:53:55Z)
- AIMS: All-Inclusive Multi-Level Segmentation [93.5041381700744]
We propose a new task, All-Inclusive Multi-Level Segmentation (AIMS), which segments visual regions into three levels: part, entity, and relation.
We also build a unified AIMS model through multi-dataset multi-task training to address the two major challenges of annotation inconsistency and task correlation.
arXiv Detail & Related papers (2023-05-28T16:28:49Z)
- Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding [95.78002228538841]
We propose a new open-world semantic segmentation pipeline that makes the first attempt to learn to segment semantic objects of various open-world categories without any dense annotation effort.
On three benchmark datasets, our method directly segments objects of arbitrary categories and outperforms zero-shot segmentation methods that require data labeling.
arXiv Detail & Related papers (2022-07-18T09:20:04Z)
- Self-Supervised Visual Representation Learning with Semantic Grouping [50.14703605659837]
We tackle the problem of learning visual representations from unlabeled scene-centric data.
We propose contrastive learning from data-driven semantic slots, namely SlotCon, for joint semantic grouping and representation learning.
arXiv Detail & Related papers (2022-05-30T17:50:59Z)
- HS3: Learning with Proper Task Complexity in Hierarchically Supervised Semantic Segmentation [81.87943324048756]
We propose Hierarchically Supervised Semantic Segmentation (HS3), a training scheme that supervises intermediate layers in a segmentation network to learn meaningful representations by varying task complexity.
Our proposed HS3-Fuse framework further improves segmentation predictions and achieves state-of-the-art results on two large segmentation benchmarks: NYUD-v2 and Cityscapes.
arXiv Detail & Related papers (2021-11-03T16:33:29Z)
- Unsupervised Image Segmentation by Mutual Information Maximization and Adversarial Regularization [7.165364364478119]
We propose a novel fully unsupervised semantic segmentation method, Information Maximization and Adversarial Regularization (InMARS).
Inspired by human perception, which parses a scene into perceptual groups, our approach first partitions an input image into meaningful regions (also known as superpixels).
Next, it utilizes Mutual-Information-Maximization followed by an adversarial training strategy to cluster these regions into semantically meaningful classes.
Our experiments demonstrate that our method achieves state-of-the-art performance on two commonly used unsupervised semantic segmentation datasets.
arXiv Detail & Related papers (2021-07-01T18:36:27Z)
- Three Ways to Improve Semantic Segmentation with Self-Supervised Depth Estimation [90.87105131054419]
We present a framework for semi-supervised semantic segmentation, which is enhanced by self-supervised monocular depth estimation from unlabeled image sequences.
We validate the proposed model on the Cityscapes dataset, where all three modules demonstrate significant performance gains.
arXiv Detail & Related papers (2020-12-19T21:18:03Z)
- Bidirectional Graph Reasoning Network for Panoptic Segmentation [126.06251745669107]
We introduce a Bidirectional Graph Reasoning Network (BGRNet) to mine the intra-modular and inter-modular relations within and between foreground things and background stuff classes.
BGRNet first constructs image-specific graphs in both instance and semantic segmentation branches that enable flexible reasoning at the proposal level and class level.
arXiv Detail & Related papers (2020-04-14T02:32:10Z)