Robust Vision Challenge 2020 -- 1st Place Report for Panoptic
Segmentation
- URL: http://arxiv.org/abs/2008.10112v1
- Date: Sun, 23 Aug 2020 21:41:43 GMT
- Title: Robust Vision Challenge 2020 -- 1st Place Report for Panoptic
Segmentation
- Authors: Rohit Mohan and Abhinav Valada
- Abstract summary: Our network is a lightweight version of our state-of-the-art EfficientPS architecture.
It consists of our proposed shared backbone with a modified EfficientNet-B5 model as the encoder, followed by the 2-way FPN to learn semantically rich multi-scale features.
Our proposed panoptic fusion module adaptively fuses logits from each of the heads to yield the panoptic segmentation output.
- Score: 13.23676270963484
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this technical report, we present key details of our winning panoptic
segmentation architecture EffPS_b1bs4_RVC. Our network is a lightweight version
of our state-of-the-art EfficientPS architecture that consists of our proposed
shared backbone with a modified EfficientNet-B5 model as the encoder, followed
by the 2-way FPN to learn semantically rich multi-scale features. The network
further comprises two task-specific heads: a modified Mask R-CNN instance head and our novel
semantic segmentation head that processes features of different scales with
specialized modules for coherent feature refinement. Finally, our proposed
panoptic fusion module adaptively fuses logits from each of the heads to yield
the panoptic segmentation output. The Robust Vision Challenge 2020 benchmarking
results show that our model is ranked #1 on Microsoft COCO, VIPER and WildDash,
and is ranked #2 on Cityscapes and Mapillary Vistas, thereby achieving the
overall rank #1 for the panoptic segmentation task.
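As a rough illustration of how the panoptic fusion module can adaptively combine per-pixel logits from the two heads, the sketch below uses the fusion form from the EfficientPS paper, where the sigmoid confidences of the instance-head and semantic-head mask logits modulate their sum. Function names and the toy values are illustrative only; the full module also resolves overlaps between candidate masks.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_fuse(instance_logits, semantic_logits):
    """Adaptively fuse per-pixel mask logits from the instance and
    semantic heads: each head's sigmoid confidence weights the combined
    logit, so a pixel is retained only when both heads tend to agree."""
    return (sigmoid(instance_logits) + sigmoid(semantic_logits)) * \
           (instance_logits + semantic_logits)

# Toy 2x2 logit maps for a single candidate mask.
inst = np.array([[4.0, -3.0], [2.0, -1.0]])  # instance-head logits
sem = np.array([[3.0, -4.0], [1.0, -2.0]])   # semantic-head logits
fused = adaptive_fuse(inst, sem)
mask = fused > 0  # pixels where the heads jointly vote for the mask
```

In this toy example, pixels where both heads emit positive logits receive a strongly positive fused logit, while pixels where both are negative are suppressed; disagreement pulls the fused value toward zero.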
Related papers
- Generalizable Entity Grounding via Assistance of Large Language Model [77.07759442298666]
We propose a novel approach to densely ground visual entities from a long caption.
We leverage a large multimodal model to extract semantic nouns, a class-agnostic segmentation model to generate entity-level segmentation, and a multi-modal feature fusion module to associate each semantic noun with its corresponding segmentation mask.
arXiv Detail & Related papers (2024-02-04T16:06:05Z)
- ClusterFormer: Clustering As A Universal Visual Learner [80.79669078819562]
CLUSTERFORMER is a universal vision model based on the CLUSTERing paradigm with TransFORMER.
It is capable of tackling heterogeneous vision tasks with varying levels of clustering granularity.
Given its efficacy, we hope our work can catalyze a paradigm shift toward universal models in computer vision.
arXiv Detail & Related papers (2023-09-22T22:12:30Z)
- Towards Universal Vision-language Omni-supervised Segmentation [72.31277932442988]
We present Vision-Language Omni-Supervised (VLOSS) to treat open-world segmentation tasks as proposal classification.
We incorporate omni-supervised data (i.e., panoptic segmentation, object detection, and image-text pair data) into training, thus enriching the open-world segmentation ability.
With fewer parameters, our VLOSS with Swin-Tiny surpasses MaskCLIP by 2% in terms of mask AP on the LVIS v1 dataset.
arXiv Detail & Related papers (2023-03-12T02:57:53Z)
- Panoptic-PHNet: Towards Real-Time and High-Precision LiDAR Panoptic Segmentation via Clustering Pseudo Heatmap [9.770808277353128]
We propose a fast and high-performance LiDAR-based framework, referred to as Panoptic-PHNet.
We introduce a clustering pseudo heatmap as a new paradigm, which, followed by a center grouping module, yields instance centers for efficient clustering.
For backbone design, we fuse the fine-grained voxel features and the 2D Bird's Eye View (BEV) features with different receptive fields to utilize both detailed and global information.
arXiv Detail & Related papers (2022-05-14T08:16:13Z)
- Amodal Panoptic Segmentation [13.23676270963484]
We formulate and propose a novel task that we name amodal panoptic segmentation.
The goal of this task is to simultaneously predict the pixel-wise semantic segmentation labels of the visible regions of stuff classes and the instance segmentation labels of both the visible and occluded regions of thing classes.
We propose the novel amodal panoptic segmentation network (APSNet) as a first step towards addressing this task.
arXiv Detail & Related papers (2022-02-23T14:41:59Z)
- 7th AI Driving Olympics: 1st Place Report for Panoptic Tracking [6.226227982115869]
Our architecture won the panoptic tracking challenge in the 7th AI Driving Olympics at NeurIPS 2021.
Our approach exploits three consecutive accumulated scans to predict locally consistent panoptic tracking IDs and also the overlap between the scans to predict globally consistent panoptic tracking IDs for a given sequence.
The benchmarking results from the 7th AI Driving Olympics at NeurIPS 2021 show that our model is ranked #1 for the panoptic tracking task on the Panoptic nuScenes dataset.
arXiv Detail & Related papers (2021-12-09T20:52:28Z)
- Learning to Associate Every Segment for Video Panoptic Segmentation [123.03617367709303]
We learn coarse segment-level matching and fine pixel-level matching together.
We show that our per-frame computation model can achieve new state-of-the-art results on Cityscapes-VPS and VIPER datasets.
arXiv Detail & Related papers (2021-06-17T13:06:24Z)
- SG-Net: Spatial Granularity Network for One-Stage Video Instance Segmentation [7.544917072241684]
Video instance segmentation (VIS) is a new and critical task in computer vision.
We propose a one-stage spatial granularity network (SG-Net) for VIS.
We show that our method can achieve improved performance in both accuracy and inference speed.
arXiv Detail & Related papers (2021-03-18T14:31:15Z)
- Auto-Panoptic: Cooperative Multi-Component Architecture Search for Panoptic Segmentation [144.50154657257605]
We propose an efficient framework to simultaneously search for all main components including backbone, segmentation branches, and feature fusion module.
Our searched architecture, namely Auto-Panoptic, achieves the new state-of-the-art on the challenging COCO and ADE20K benchmarks.
arXiv Detail & Related papers (2020-10-30T08:34:35Z)
- EfficientPS: Efficient Panoptic Segmentation [13.23676270963484]
We introduce the Efficient Panoptic (EfficientPS) architecture that efficiently encodes and fuses semantically rich multi-scale features.
We incorporate a semantic head that aggregates fine and contextual features coherently and a new variant of Mask R-CNN as the instance head.
We also introduce the KITTI panoptic segmentation dataset that contains panoptic annotations for the popular and challenging KITTI benchmark.
arXiv Detail & Related papers (2020-04-05T20:15:59Z)
- 1st Place Solutions for OpenImage2019 -- Object Detection and Instance Segmentation [116.25081559037872]
This article introduces the solutions of the two champion teams, 'MMfruit' for the detection track and 'MMfruitSeg' for the segmentation track, in the OpenImage Challenge 2019.
It is commonly known that for an object detector, the shared feature at the end of the backbone is not appropriate for both classification and regression.
We propose the Decoupling Head (DH) to disentangle the object classification and regression via the self-learned optimal feature extraction.
arXiv Detail & Related papers (2020-03-17T06:45:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.