EfficientPS: Efficient Panoptic Segmentation
- URL: http://arxiv.org/abs/2004.02307v3
- Date: Mon, 1 Feb 2021 09:33:18 GMT
- Title: EfficientPS: Efficient Panoptic Segmentation
- Authors: Rohit Mohan, Abhinav Valada
- Abstract summary: We introduce the Efficient Panoptic Segmentation (EfficientPS) architecture that efficiently encodes and fuses semantically rich multi-scale features.
We incorporate a semantic head that aggregates fine and contextual features coherently and a new variant of Mask R-CNN as the instance head.
We also introduce the KITTI panoptic segmentation dataset that contains panoptic annotations for the popular yet challenging KITTI benchmark.
- Score: 13.23676270963484
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding the scene in which an autonomous robot operates is critical for
its competent functioning. Such scene comprehension necessitates recognizing
instances of traffic participants along with general scene semantics which can
be effectively addressed by the panoptic segmentation task. In this paper, we
introduce the Efficient Panoptic Segmentation (EfficientPS) architecture that
consists of a shared backbone which efficiently encodes and fuses semantically
rich multi-scale features. We incorporate a new semantic head that aggregates
fine and contextual features coherently and a new variant of Mask R-CNN as the
instance head. We also propose a novel panoptic fusion module that congruously
integrates the output logits from both the heads of our EfficientPS
architecture to yield the final panoptic segmentation output. Additionally, we
introduce the KITTI panoptic segmentation dataset that contains panoptic
annotations for the popular yet challenging KITTI benchmark. Extensive
evaluations on Cityscapes, KITTI, Mapillary Vistas and Indian Driving Dataset
demonstrate that our proposed architecture consistently sets the new
state-of-the-art on all these four benchmarks while being the fastest and most
efficient panoptic segmentation architecture to date.
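The abstract describes a panoptic fusion module that merges the output logits of the semantic and instance heads into a single panoptic map. The snippet below is a deliberately simplified sketch of that general idea, not the paper's exact adaptive logit-fusion formulation: confident instance masks are painted over the semantic argmax map in descending score order, and heavily occluded instances are dropped. All function and parameter names here are illustrative.

```python
import numpy as np

def fuse_panoptic(sem_logits, inst_masks, inst_scores, inst_classes,
                  score_thresh=0.5, overlap_thresh=0.5):
    """Paint confident instance masks over the semantic prediction.

    sem_logits:   (C, H, W) semantic head logits
    inst_masks:   (N, H, W) boolean instance masks
    inst_scores:  (N,) confidence per instance
    inst_classes: (N,) class id per instance
    Returns (panoptic_class, panoptic_id), both (H, W), where
    panoptic_id is 0 for stuff pixels and k for the k-th kept instance.
    """
    panoptic_class = sem_logits.argmax(axis=0)          # start from semantics
    panoptic_id = np.zeros(panoptic_class.shape, dtype=np.int64)
    occupied = np.zeros(panoptic_class.shape, dtype=bool)

    next_id = 1
    for k in np.argsort(-inst_scores):                  # most confident first
        if inst_scores[k] < score_thresh:
            continue
        mask = inst_masks[k] & ~occupied                # pixels not yet claimed
        # skip instances mostly hidden behind higher-scoring ones
        if mask.sum() / max(int(inst_masks[k].sum()), 1) < overlap_thresh:
            continue
        panoptic_class[mask] = inst_classes[k]
        panoptic_id[mask] = next_id
        occupied |= mask
        next_id += 1
    return panoptic_class, panoptic_id
```

This greedy score-ordered painting is a common baseline for merging the two heads; the paper's module instead fuses the mask logits themselves before resolving overlaps.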
Related papers
- PEM: Prototype-based Efficient MaskFormer for Image Segmentation [10.795762739721294]
Recent transformer-based architectures have shown impressive results in the field of image segmentation.
We propose Prototype-based Efficient MaskFormer (PEM), an efficient transformer-based architecture that can operate in multiple segmentation tasks.
arXiv Detail & Related papers (2024-02-29T18:21:54Z)
- Generalizable Entity Grounding via Assistance of Large Language Model [77.07759442298666]
We propose a novel approach to densely ground visual entities from a long caption.
We leverage a large multimodal model to extract semantic nouns, a class-agnostic segmentation model to generate entity-level segmentation, and a multimodal feature fusion module to associate each semantic noun with its corresponding segmentation mask.
arXiv Detail & Related papers (2024-02-04T16:06:05Z)
- EDAPS: Enhanced Domain-Adaptive Panoptic Segmentation [93.25977558780896]
We study the panoptic network design and propose a novel architecture (EDAPS) designed explicitly for domain-adaptive panoptic segmentation.
EDAPS significantly improves the state-of-the-art performance for panoptic segmentation UDA by a large margin of 20% on SYNTHIA-to-Cityscapes and even 72% on the more challenging SYNTHIA-to-Mapillary Vistas.
arXiv Detail & Related papers (2023-04-27T15:51:19Z)
- Segment Everything Everywhere All at Once [124.90835636901096]
We present SEEM, a promptable and interactive model for segmenting everything everywhere all at once in an image.
We propose a novel decoding mechanism that enables diverse prompting for all types of segmentation tasks.
We conduct a comprehensive empirical study to validate the effectiveness of SEEM across diverse segmentation tasks.
arXiv Detail & Related papers (2023-04-13T17:59:40Z)
- Towards Universal Vision-language Omni-supervised Segmentation [72.31277932442988]
We present Vision-Language Omni-Supervised (VLOSS) to treat open-world segmentation tasks as proposal classification.
We leverage omni-supervised data (i.e., panoptic segmentation data, object detection data, and image-text pairs data) into training, thus enriching the open-world segmentation ability.
With fewer parameters, our VLOSS with Swin-Tiny surpasses MaskCLIP by 2% in terms of mask AP on LVIS v1 dataset.
arXiv Detail & Related papers (2023-03-12T02:57:53Z)
- Amodal Panoptic Segmentation [13.23676270963484]
We formulate and propose a novel task that we name amodal panoptic segmentation.
The goal of this task is to simultaneously predict the pixel-wise semantic segmentation labels of the visible regions of stuff classes and the instance segmentation labels of both the visible and occluded regions of thing classes.
We propose the novel amodal panoptic segmentation network (APSNet) as a first step towards addressing this task.
arXiv Detail & Related papers (2022-02-23T14:41:59Z)
- Towards holistic scene understanding: Semantic segmentation and beyond [2.7920304852537536]
This dissertation addresses visual scene understanding, enhancing segmentation performance and generalization, network training efficiency, and holistic understanding.
First, we investigate semantic segmentation in the context of street scenes and train semantic segmentation networks on combinations of various datasets.
In Chapter 2 we design a framework of hierarchical classifiers over a single convolutional backbone, and train it end-to-end on a combination of pixel-labeled datasets.
In Chapter 3 we propose a weakly-supervised algorithm for training with bounding box-level and image-level supervision instead of only with per-pixel supervision.
arXiv Detail & Related papers (2022-01-16T19:18:11Z)
- Exemplar-Based Open-Set Panoptic Segmentation Network [79.99748041746592]
We extend panoptic segmentation to the open-world and introduce an open-set panoptic segmentation (OPS) task.
We investigate the practical challenges of the task and construct a benchmark on top of an existing dataset, COCO.
We propose a novel exemplar-based open-set panoptic segmentation network (EOPSN) inspired by exemplar theory.
arXiv Detail & Related papers (2021-05-18T07:59:21Z)
- Auto-Panoptic: Cooperative Multi-Component Architecture Search for Panoptic Segmentation [144.50154657257605]
We propose an efficient framework to simultaneously search for all main components including backbone, segmentation branches, and feature fusion module.
Our searched architecture, namely Auto-Panoptic, achieves the new state-of-the-art on the challenging COCO and ADE20K benchmarks.
arXiv Detail & Related papers (2020-10-30T08:34:35Z)
- Robust Vision Challenge 2020 -- 1st Place Report for Panoptic Segmentation [13.23676270963484]
Our network is a lightweight version of our state-of-the-art EfficientPS architecture.
It consists of our proposed shared backbone with a modified EfficientNet-B5 model as the encoder, followed by the 2-way FPN to learn semantically rich multi-scale features.
Our proposed panoptic fusion module adaptively fuses logits from each of the heads to yield the panoptic segmentation output.
arXiv Detail & Related papers (2020-08-23T21:41:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information (including all listed content) and is not responsible for any consequences of its use.