Capturing Omni-Range Context for Omnidirectional Segmentation
- URL: http://arxiv.org/abs/2103.05687v1
- Date: Tue, 9 Mar 2021 19:46:09 GMT
- Title: Capturing Omni-Range Context for Omnidirectional Segmentation
- Authors: Kailun Yang, Jiaming Zhang, Simon Reiß, Xinxin Hu, Rainer Stiefelhagen
- Abstract summary: We introduce Efficient Concurrent Attention Networks (ECANets) to bridge the gap in terms of FoV and structural distribution between the imaging domains.
We upgrade model training by leveraging multi-source and omni-supervised learning, taking advantage of both densely labeled and unlabeled data.
Our novel model, training regimen and multi-source prediction fusion elevate the performance (mIoU) to new state-of-the-art results.
- Score: 29.738065412097598
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional Networks (ConvNets) excel at semantic segmentation and have
become a vital component for perception in autonomous driving. Enabling an
all-encompassing view of street-scenes, omnidirectional cameras present
themselves as a perfect fit in such systems. Most segmentation models for
parsing urban environments operate on common, narrow Field of View (FoV)
images. Transferring these models from the domain they were designed for to
360-degree perception, their performance drops dramatically, e.g., by an
absolute 30.0% (mIoU) on established test-beds. To bridge the gap in terms of
FoV and structural distribution between the imaging domains, we introduce
Efficient Concurrent Attention Networks (ECANets), directly capturing the
inherent long-range dependencies in omnidirectional imagery. In addition to the
learned attention-based contextual priors that can stretch across 360-degree
images, we upgrade model training by leveraging multi-source and
omni-supervised learning, taking advantage of both: Densely labeled and
unlabeled data originating from multiple datasets. To foster progress in
panoramic image segmentation, we put forward and extensively evaluate models on
Wild PAnoramic Semantic Segmentation (WildPASS), a dataset designed to capture
diverse scenes from all around the globe. Our novel model, training regimen and
multi-source prediction fusion elevate the performance (mIoU) to new
state-of-the-art results on the public PASS (60.2%) and the fresh WildPASS
(69.0%) benchmarks.
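The core architectural idea, attention that spans the full 360-degree horizontal extent of a panorama, can be illustrated with a small PyTorch sketch. This is only a hedged approximation of the concept, not the ECANet architecture released by the authors; the module name, channel sizes, and the row-wise attention pattern are assumptions made for illustration.

```python
# A minimal PyTorch sketch of attention that aggregates context along the full
# horizontal (360-degree) extent of a panoramic feature map. Illustrative only;
# it is NOT the exact ECANet architecture from the paper, and all names and
# sizes are assumptions.
import torch
import torch.nn as nn


class HorizontalContextAttention(nn.Module):
    """Self-attention restricted to each image row, so every position can
    attend to all other positions along the 360-degree horizontal axis."""

    def __init__(self, channels: int, reduced: int = 64):
        super().__init__()
        self.query = nn.Conv2d(channels, reduced, kernel_size=1)
        self.key = nn.Conv2d(channels, reduced, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.scale = reduced ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Shape queries/keys/values as (b*h, w, c'): one attention problem per row.
        q = self.query(x).permute(0, 2, 3, 1).reshape(b * h, w, -1)
        k = self.key(x).permute(0, 2, 3, 1).reshape(b * h, w, -1)
        v = self.value(x).permute(0, 2, 3, 1).reshape(b * h, w, -1)
        attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        out = (attn @ v).reshape(b, h, w, c).permute(0, 3, 1, 2)
        return x + out  # residual connection keeps local features intact


if __name__ == "__main__":
    feats = torch.randn(2, 256, 32, 128)   # panoramic feature map (wide aspect)
    ctx = HorizontalContextAttention(256)
    print(ctx(feats).shape)                # torch.Size([2, 256, 32, 128])
```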
Related papers
- Open Panoramic Segmentation [34.46596562350091]
We propose a new task called Open Panoramic Segmentation (OPS), where models are trained with FoV-restricted pinhole images in an open-vocabulary setting but evaluated with FoV-open panoramic images.
We also propose a model named OOOPS with a Deformable Adapter Network (DAN), which significantly improves panoramic semantic segmentation performance.
Surpassing other state-of-the-art open-vocabulary semantic segmentation approaches, OOOPS achieves a remarkable performance boost on three panoramic datasets.
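As a rough illustration of how a deformable adapter can reshape pinhole-trained features for panoramic distortion, the following PyTorch sketch wraps torchvision's DeformConv2d in a residual adapter block. It is not the OOOPS/DAN implementation; the block name and sizes are assumptions.

```python
# Illustrative-only sketch of a deformable-convolution adapter block, roughly in
# the spirit of adapting pinhole-trained features to panoramic distortion.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class DeformableAdapter(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        # Offsets: 2 (dx, dy) values per kernel sample per output location.
        self.offset = nn.Conv2d(channels, 2 * kernel_size * kernel_size,
                                kernel_size, padding=pad)
        self.deform = DeformConv2d(channels, channels, kernel_size, padding=pad)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        offsets = self.offset(x)
        return x + self.deform(x, offsets)  # residual adapter


if __name__ == "__main__":
    x = torch.randn(1, 64, 32, 64)
    print(DeformableAdapter(64)(x).shape)  # torch.Size([1, 64, 32, 64])
```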
arXiv Detail & Related papers (2024-07-02T22:00:32Z)
- 360VOTS: Visual Object Tracking and Segmentation in Omnidirectional Videos [16.372814014632944]
We propose a comprehensive dataset and benchmark that incorporates a new component called omnidirectional video object segmentation (360VOS).
The 360VOS dataset includes 290 sequences accompanied by dense pixel-wise masks and covers a broader range of target categories.
We benchmark state-of-the-art approaches and demonstrate the effectiveness of our proposed 360 tracking framework and training dataset.
arXiv Detail & Related papers (2024-04-22T07:54:53Z)
- Few-Shot Panoptic Segmentation With Foundation Models [23.231014713335664]
We propose to leverage task-agnostic image features to enable few-shot panoptic segmentation by presenting Segmenting Panoptic Information with Nearly 0 labels (SPINO).
In detail, our method combines a DINOv2 backbone with lightweight network heads for semantic segmentation and boundary estimation.
We show that our approach, although trained with only ten annotated images, predicts high-quality pseudo-labels that can be used with any existing panoptic segmentation method.
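A minimal sketch of this frozen-backbone-plus-lightweight-head recipe might look as follows. It assumes the public DINOv2 torch.hub entry point, shows only a semantic head, and omits the boundary-estimation head and the pseudo-label generation step, so it should be read as an illustration rather than the authors' code.

```python
# Hedged sketch: frozen DINOv2 features + a lightweight semantic head.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 19          # assumption: Cityscapes-style label set
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
backbone.eval()           # keep the foundation model frozen
for p in backbone.parameters():
    p.requires_grad = False

head = nn.Conv2d(384, NUM_CLASSES, kernel_size=1)  # 384 = ViT-S/14 embed dim

def segment(image: torch.Tensor) -> torch.Tensor:
    """image: (B, 3, H, W) with H, W divisible by the 14-pixel patch size."""
    with torch.no_grad():
        # Patch-token features reshaped to a (B, C, H/14, W/14) map.
        feats = backbone.get_intermediate_layers(image, n=1, reshape=True)[0]
    logits = head(feats)
    return F.interpolate(logits, size=image.shape[-2:], mode="bilinear",
                         align_corners=False)

if __name__ == "__main__":
    x = torch.randn(1, 3, 224, 224)
    print(segment(x).shape)   # torch.Size([1, 19, 224, 224])
```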
arXiv Detail & Related papers (2023-09-19T16:09:01Z)
- MRGAN360: Multi-stage Recurrent Generative Adversarial Network for 360 Degree Image Saliency Prediction [10.541086214760497]
We propose a novel multi-stage recurrent generative adversarial network for omnidirectional images (ODIs), dubbed MRGAN360.
At each stage, the prediction model takes as input the original image and the output of the previous stage and outputs a more accurate saliency map.
We employ a recurrent neural network among adjacent prediction stages to model their correlations, and exploit a discriminator at the end of each stage to supervise the output saliency map.
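A hedged sketch of such stage-wise recurrent refinement is shown below; it shares one stage module across iterations and omits the per-stage discriminators, and all names are illustrative rather than taken from MRGAN360.

```python
# Illustrative multi-stage recurrent saliency refinement: each stage receives the
# original image concatenated with the previous stage's saliency map and emits a
# refined map. The adversarial discriminators from the paper are omitted.
import torch
import torch.nn as nn


class RefinementStage(nn.Module):
    def __init__(self, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + 1, hidden, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, image, prev_saliency):
        return self.net(torch.cat([image, prev_saliency], dim=1))


def predict_saliency(image: torch.Tensor, num_stages: int = 3) -> torch.Tensor:
    stage = RefinementStage()            # weights shared across stages (recurrent)
    saliency = torch.zeros(image.size(0), 1, *image.shape[-2:])
    for _ in range(num_stages):
        saliency = stage(image, saliency)
    return saliency


if __name__ == "__main__":
    pano = torch.randn(1, 3, 128, 256)   # equirectangular 360-degree image
    print(predict_saliency(pano).shape)  # torch.Size([1, 1, 128, 256])
```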
arXiv Detail & Related papers (2023-03-15T11:15:03Z)
- Rethinking Range View Representation for LiDAR Segmentation [66.73116059734788]
"Many-to-one" mapping, semantic incoherence, and shape deformation are possible impediments against effective learning from range view projections.
We present RangeFormer, a full-cycle framework comprising novel designs across network architecture, data augmentation, and post-processing.
We show that, for the first time, a range-view method is able to surpass the point, voxel, and multi-view fusion counterparts on competitive LiDAR semantic and panoptic segmentation benchmarks.
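The range-view representation itself is a spherical projection of the point cloud, which the short NumPy sketch below reproduces; the field-of-view values are illustrative assumptions rather than the paper's configuration.

```python
# Minimal sketch of projecting a LiDAR point cloud into a range-view image.
import numpy as np


def range_view_projection(points: np.ndarray, height: int = 64, width: int = 2048,
                          fov_up_deg: float = 3.0, fov_down_deg: float = -25.0):
    """points: (N, 3) xyz. Returns a (height, width) range image in meters."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points[:, :3], axis=1) + 1e-8

    yaw = np.arctan2(y, x)                       # azimuth in [-pi, pi]
    pitch = np.arcsin(z / depth)                 # elevation
    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)

    u = 0.5 * (1.0 - yaw / np.pi) * width                 # column from azimuth
    v = (fov_up - pitch) / (fov_up - fov_down) * height   # row from elevation

    u = np.clip(np.floor(u), 0, width - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, height - 1).astype(np.int32)

    range_image = np.full((height, width), -1.0, dtype=np.float32)
    # Write far points first so nearer points overwrite them ("many-to-one").
    order = np.argsort(depth)[::-1]
    range_image[v[order], u[order]] = depth[order]
    return range_image


if __name__ == "__main__":
    pts = np.random.uniform(-50, 50, size=(10000, 3))
    print(range_view_projection(pts).shape)      # (64, 2048)
```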
arXiv Detail & Related papers (2023-03-09T16:13:27Z)
- ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond [76.35955924137986]
We propose a Vision Transformer Advanced by Exploring the intrinsic inductive bias (IB) from convolutions, i.e., ViTAE.
ViTAE has several spatial pyramid reduction modules to downsample and embed the input image into tokens with rich multi-scale context.
We obtain state-of-the-art classification performance, i.e., 88.5% Top-1 accuracy on the ImageNet validation set and the best 91.2% Top-1 accuracy on the ImageNet real validation set.
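The following PyTorch sketch shows one plausible form of such a pyramid-reduction embedding, using parallel dilated convolutions that downsample the image into multi-scale tokens; dilation rates and channel sizes are assumptions, not ViTAE's published configuration.

```python
# Hedged sketch of a spatial "pyramid reduction" embedding: parallel dilated
# convolutions capture multi-scale context while downsampling into tokens.
import torch
import torch.nn as nn


class PyramidReductionEmbedding(nn.Module):
    def __init__(self, in_ch: int = 3, embed_dim: int = 64,
                 dilations=(1, 2, 3), stride: int = 4):
        super().__init__()
        branch_dim = embed_dim // len(dilations)
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, branch_dim, kernel_size=3, stride=stride,
                      padding=d, dilation=d)
            for d in dilations
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenate multi-scale responses, then flatten to a token sequence.
        feats = torch.cat([b(x) for b in self.branches], dim=1)  # (B, C, H/4, W/4)
        return feats.flatten(2).transpose(1, 2)                  # (B, N, C)


if __name__ == "__main__":
    img = torch.randn(1, 3, 224, 224)
    tokens = PyramidReductionEmbedding()(img)
    print(tokens.shape)   # torch.Size([1, 3136, 63])
```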
arXiv Detail & Related papers (2022-02-21T10:40:05Z)
- MSeg: A Composite Dataset for Multi-domain Semantic Segmentation [100.17755160696939]
We present MSeg, a composite dataset that unifies semantic segmentation datasets from different domains.
We reconcile the taxonomies and bring the pixel-level annotations into alignment by relabeling more than 220,000 object masks in more than 80,000 images.
A model trained on MSeg ranks first on the WildDash-v1 leaderboard for robust semantic segmentation, with no exposure to WildDash data during training.
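At its core, building such a composite dataset requires remapping each dataset's native label IDs into one unified taxonomy; the toy sketch below illustrates the lookup-table mechanics with made-up classes rather than MSeg's actual taxonomy.

```python
# Toy sketch of label unification across datasets via a lookup table.
import numpy as np

# Unified taxonomy (illustrative, not MSeg's real class list).
UNIFIED = {"road": 0, "person": 1, "vehicle": 2, "ignore": 255}

# Per-dataset mapping from native train IDs to unified IDs (illustrative).
DATASET_TO_UNIFIED = {
    "dataset_a": {0: UNIFIED["road"], 1: UNIFIED["person"], 2: UNIFIED["vehicle"]},
    "dataset_b": {0: UNIFIED["vehicle"], 1: UNIFIED["road"], 2: UNIFIED["ignore"]},
}

def remap_mask(mask: np.ndarray, dataset: str) -> np.ndarray:
    """Remap a (H, W) label mask from a dataset's native IDs to unified IDs."""
    lut = np.full(256, UNIFIED["ignore"], dtype=np.uint8)
    for native_id, unified_id in DATASET_TO_UNIFIED[dataset].items():
        lut[native_id] = unified_id
    return lut[mask]

if __name__ == "__main__":
    m = np.random.randint(0, 3, size=(4, 4), dtype=np.uint8)
    print(remap_mask(m, "dataset_b"))
```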
arXiv Detail & Related papers (2021-12-27T16:16:35Z)
- Multi-dataset Pretraining: A Unified Model for Semantic Segmentation [97.61605021985062]
We propose a unified framework, termed as Multi-Dataset Pretraining, to take full advantage of the fragmented annotations of different datasets.
This is achieved by first pretraining the network via the proposed pixel-to-prototype contrastive loss over multiple datasets.
In order to better model the relationship among images and classes from different datasets, we extend the pixel level embeddings via cross dataset mixing.
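One plausible form of a pixel-to-prototype contrastive objective is sketched below: each labeled pixel embedding is scored against class prototypes and trained with a temperature-scaled cross-entropy. The temperature and the way prototypes are maintained are assumptions, not the paper's exact formulation.

```python
# Hedged sketch of a pixel-to-prototype contrastive loss.
import torch
import torch.nn.functional as F


def pixel_to_prototype_loss(embeddings: torch.Tensor, labels: torch.Tensor,
                            prototypes: torch.Tensor, temperature: float = 0.1,
                            ignore_index: int = 255) -> torch.Tensor:
    """embeddings: (B, D, H, W), labels: (B, H, W), prototypes: (num_classes, D)."""
    b, d, h, w = embeddings.shape
    emb = F.normalize(embeddings.permute(0, 2, 3, 1).reshape(-1, d), dim=1)
    proto = F.normalize(prototypes, dim=1)
    labels = labels.reshape(-1)

    valid = labels != ignore_index
    logits = emb[valid] @ proto.t() / temperature   # cosine similarity to prototypes
    return F.cross_entropy(logits, labels[valid])


if __name__ == "__main__":
    emb = torch.randn(2, 128, 16, 16)
    lbl = torch.randint(0, 19, (2, 16, 16))
    protos = torch.randn(19, 128)
    print(pixel_to_prototype_loss(emb, lbl, protos))
```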
arXiv Detail & Related papers (2021-06-08T06:13:11Z)
- Panoramic Panoptic Segmentation: Towards Complete Surrounding Understanding via Unsupervised Contrastive Learning [97.37544023666833]
We introduce panoramic panoptic segmentation as the most holistic scene understanding task.
A complete understanding of the surroundings provides the agent with the maximum amount of information.
We propose a framework which allows model training on standard pinhole images and transfers the learned features to a different domain.
arXiv Detail & Related papers (2021-03-01T09:37:27Z) - Self-supervised Human Detection and Segmentation via Multi-view
Consensus [116.92405645348185]
We propose a multi-camera framework in which geometric constraints are embedded in the form of multi-view consistency during training.
We show that our approach outperforms state-of-the-art self-supervised person detection and segmentation techniques on images that visually depart from those of standard benchmarks.
arXiv Detail & Related papers (2020-12-09T15:47:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.