FishDreamer: Towards Fisheye Semantic Completion via Unified Image
Outpainting and Segmentation
- URL: http://arxiv.org/abs/2303.13842v2
- Date: Thu, 20 Apr 2023 12:27:36 GMT
- Title: FishDreamer: Towards Fisheye Semantic Completion via Unified Image
Outpainting and Segmentation
- Authors: Hao Shi, Yu Li, Kailun Yang, Jiaming Zhang, Kunyu Peng, Alina
Roitberg, Yaozu Ye, Huajian Ni, Kaiwei Wang, Rainer Stiefelhagen
- Abstract summary: This paper raises the new task of Fisheye Semantic Completion (FSC), where dense texture, structure, and semantics of a fisheye image are inferred even beyond the sensor field-of-view (FoV).
We introduce the new FishDreamer which relies on successful ViTs enhanced with a novel Polar-aware Cross Attention module (PCA) to leverage dense context and guide semantically-consistent content generation.
- Score: 33.71849096992972
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper raises the new task of Fisheye Semantic Completion (FSC), where
dense texture, structure, and semantics of a fisheye image are inferred even
beyond the sensor field-of-view (FoV). Fisheye cameras have a larger FoV than
ordinary pinhole cameras, yet their unique imaging model naturally leads to a
blind area at the edge of the image plane. This is suboptimal for
safety-critical applications since important perception tasks, such as semantic
segmentation, become very challenging within the blind zone. Previous works
considered the out-FoV outpainting and in-FoV segmentation separately. However,
we observe that these two tasks are actually closely coupled. To jointly
estimate the tightly intertwined complete fisheye image and scene semantics, we
introduce the new FishDreamer which relies on successful ViTs enhanced with a
novel Polar-aware Cross Attention module (PCA) to leverage dense context and
guide semantically-consistent content generation while considering different
polar distributions. In addition to the contribution of the novel task and
architecture, we also derive Cityscapes-BF and KITTI360-BF datasets to
facilitate training and evaluation of this new track. Our experiments
demonstrate that the proposed FishDreamer outperforms methods that solve each
task in isolation and surpasses alternative approaches on the Fisheye Semantic
Completion task. Code and datasets are publicly available at
https://github.com/MasterHow/FishDreamer.
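As a rough illustration of the polar distributions the PCA module is said to exploit, the sketch below maps pixels to polar coordinates about the image centre and derives a toy radius-based attention bias. The function names and the bias formula are my assumptions for illustration, not the paper's actual module:

```python
import math

def polar_coords(x, y, cx, cy):
    """Map a pixel (x, y) to polar coordinates (radius, angle) about the
    image centre (cx, cy) -- the distribution a polar-aware attention
    prior would condition on."""
    dx, dy = x - cx, y - cy
    return math.hypot(dx, dy), math.atan2(dy, dx)

def polar_bias(r, r_max):
    """Toy attention bias: tokens near the FoV rim (large radius) get a
    stronger prior, mimicking how out-of-FoV completion depends on
    in-FoV content at similar radii. Normalised to [0, 1]."""
    return r / r_max

# Example on a 256x256 image: the corner pixel lies near the rim,
# so its bias is close to 1; the centre pixel has bias 0.
r, theta = polar_coords(255, 255, 128, 128)
bias = polar_bias(r, math.hypot(128, 128))
```

In a real cross-attention layer such a bias would be added to the attention logits so that out-of-FoV queries attend preferentially to in-FoV keys at comparable radii.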
Related papers
- SimFIR: A Simple Framework for Fisheye Image Rectification with
Self-supervised Representation Learning [105.01294305972037]
We introduce SimFIR, a framework for fisheye image rectification based on self-supervised representation learning.
To learn fine-grained distortion representations, we first split a fisheye image into multiple patches and extract their representations with a Vision Transformer.
The transfer performance on the downstream rectification task is remarkably boosted, which verifies the effectiveness of the learned representations.
arXiv Detail & Related papers (2023-08-17T15:20:17Z)
- FisheyeEX: Polar Outpainting for Extending the FoV of Fisheye Lens
  [84.12722334460022]
The fisheye lens is gaining increasing application in computational photography and assisted driving because of its wide field of view (FoV).
In this paper, we present a FisheyeEX method that extends the FoV of the fisheye lens by outpainting the invalid regions.
The results demonstrate that our approach significantly outperforms the state-of-the-art methods, gaining around 27% more content beyond the original fisheye image.
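The reported ~27% content gain can be sanity-checked with simple circle-area arithmetic, assuming the outpainted region is a circular annulus around the original fisheye disk (my simplification, not the paper's exact evaluation protocol):

```python
import math

def content_gain(r_in, r_out):
    """Fraction of extra image content gained by extending a circular
    FoV of radius r_in to radius r_out: the annulus area relative to
    the original disk, (r_out^2 - r_in^2) / r_in^2."""
    return (r_out**2 - r_in**2) / r_in**2

def radius_for_gain(r_in, gain):
    """Radius needed to gain the given fraction of extra content."""
    return r_in * math.sqrt(1.0 + gain)

# A ~27% content gain corresponds to only a ~12.7% larger radius,
# since area grows quadratically with radius.
r_out = radius_for_gain(1.0, 0.27)
```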
arXiv Detail & Related papers (2022-06-12T21:38:50Z)
- SLIDE: Single Image 3D Photography with Soft Layering and Depth-aware
  Inpainting [54.419266357283966]
Single image 3D photography enables viewers to view a still image from novel viewpoints.
Recent approaches combine monocular depth networks with inpainting networks to achieve compelling results.
We present SLIDE, a modular and unified system for single image 3D photography.
arXiv Detail & Related papers (2021-09-02T16:37:20Z)
- Memory-Augmented Reinforcement Learning for Image-Goal Navigation
  [67.3963444878746]
We present a novel method that leverages a cross-episode memory to learn to navigate.
In order to avoid overfitting, we propose to use data augmentation on the RGB input during training.
We obtain this competitive performance from RGB input only, without access to additional sensors such as position or depth.
arXiv Detail & Related papers (2021-01-13T16:30:20Z)
- SynDistNet: Self-Supervised Monocular Fisheye Camera Distance Estimation
  Synergized with Semantic Segmentation for Autonomous Driving
  [37.50089104051591]
State-of-the-art self-supervised learning approaches for monocular depth estimation usually suffer from scale ambiguity.
This paper introduces a novel multi-task learning strategy to improve self-supervised monocular distance estimation on fisheye and pinhole camera images.
arXiv Detail & Related papers (2020-08-10T10:52:47Z)
- BEV-Seg: Bird's Eye View Semantic Segmentation Using Geometry and
  Semantic Point Cloud [21.29622194272066]
We focus on bird's eye semantic segmentation, a task that predicts pixel-wise semantic segmentation in BEV from side RGB images.
There are two main challenges to this task: the view transformation from side view to bird's eye view, as well as transfer learning to unseen domains.
Our novel 2-staged perception pipeline explicitly predicts pixel depths and combines them with pixel semantics in an efficient manner.
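The two-stage idea above, predicting per-pixel depth and then combining it with pixel semantics, can be sketched as a pinhole back-projection into a bird's-eye-view grid. The function, intrinsics, and cell size below are illustrative assumptions, not BEV-Seg's actual implementation:

```python
def pixel_to_bev(u, depth, fx, cx, cell=0.5):
    """Back-project an image column u with its predicted depth into
    camera space, then drop it into a bird's-eye-view grid cell
    (x: lateral, z: forward), using half-metre cells.
    The semantic label predicted for the pixel would be written into
    the returned cell in a second step."""
    x = (u - cx) * depth / fx  # lateral offset in metres (pinhole model)
    z = depth                  # forward distance in metres
    return int(x // cell), int(z // cell)

# A pixel at the principal point, 5 m away, lands at lateral cell 0;
# a pixel 100 px to the right of it maps 1 m to the side.
cell_centre = pixel_to_bev(320, 5.0, 500.0, 320.0)
cell_right = pixel_to_bev(420, 5.0, 500.0, 320.0)
```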
arXiv Detail & Related papers (2020-06-19T23:30:11Z)
- Example-Guided Image Synthesis across Arbitrary Scenes using Masked
  Spatial-Channel Attention and Self-Supervision [83.33283892171562]
Example-guided image synthesis aims to synthesize an image from a semantic label map and an exemplary image.
In this paper, we tackle a more challenging and general task, where the exemplar is an arbitrary scene image that is semantically different from the given label map.
We propose an end-to-end network for joint global and local feature alignment and synthesis.
arXiv Detail & Related papers (2020-04-18T18:17:40Z)
- SilhoNet-Fisheye: Adaptation of A ROI Based Object Pose Estimation
  Network to Monocular Fisheye Images [15.573003283204958]
We present a novel framework for adapting a ROI-based 6D object pose estimation method to work on full fisheye images.
We also contribute a fisheye image dataset, called UWHandles, with 6D object pose and 2D bounding box annotations.
arXiv Detail & Related papers (2020-02-27T19:57:33Z)
- Universal Semantic Segmentation for Fisheye Urban Driving Images
  [6.56742346304883]
We propose a seven degrees of freedom (DoF) augmentation method to transform rectilinear images into fisheye images.
In the training process, rectilinear images are transformed into fisheye images in seven DoF, which simulates fisheye images taken by cameras with different positions, orientations, and focal lengths.
The result shows that training with the seven-DoF augmentation can improve the model's accuracy and robustness against different distorted fisheye data.
arXiv Detail & Related papers (2020-01-31T11:19:00Z)
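The rectilinear-to-fisheye warp underlying such augmentation can be sketched with the equidistant projection model r = f·θ. This model choice is my assumption (it is one common fisheye model); the actual seven-DoF augmentation additionally varies camera pose and focal length:

```python
import math

def rectilinear_to_fisheye_radius(r_rect, f):
    """Map the radial distance of a pinhole (rectilinear) pixel to its
    equidistant-fisheye radius: theta = atan(r_rect / f), then
    r_fish = f * theta."""
    theta = math.atan2(r_rect, f)  # ray angle from the optical axis
    return f * theta

# Points on the optical axis are fixed, while off-axis points are
# pulled toward the centre, producing the characteristic barrel
# distortion of a fisheye image.
r_fish = rectilinear_to_fisheye_radius(500.0, 500.0)
```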
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.