PlaneSegNet: Fast and Robust Plane Estimation Using a Single-stage
Instance Segmentation CNN
- URL: http://arxiv.org/abs/2103.15428v1
- Date: Mon, 29 Mar 2021 08:53:05 GMT
- Authors: Yaxu Xie, Jason Rambach, Fangwen Shu, Didier Stricker
- Abstract summary: We propose a real-time deep neural architecture that estimates piece-wise planar regions from a single RGB image.
Our method achieves significantly higher frame rates than two-stage methods, with comparable segmentation accuracy.
- Score: 12.251947429149796
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Instance segmentation of planar regions in indoor scenes benefits visual SLAM
and other applications such as augmented reality (AR) where scene understanding
is required. Existing methods built upon two-stage frameworks show satisfactory
accuracy but are limited by low frame rates. In this work, we propose a
real-time deep neural architecture that estimates piece-wise planar regions
from a single RGB image. Our model employs a variant of a fast single-stage CNN
architecture to segment plane instances. Considering the particularity of the
target detected, we propose Fast Feature Non-maximum Suppression (FF-NMS) to
reduce the suppression errors resulting from overlapping bounding boxes of
planes. We also utilize a Residual Feature Augmentation module in the Feature
Pyramid Network (FPN). Our method achieves significantly higher frame rates
than two-stage methods, with comparable segmentation accuracy. We automatically
label over 70,000 images as ground truth from the Stanford 2D-3D-Semantics
dataset. Moreover, we incorporate our method with a state-of-the-art planar
SLAM and validate its benefits.
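The abstract does not spell out how FF-NMS differs from standard non-maximum suppression, but the failure mode it targets is easy to illustrate with the baseline: in greedy NMS, heavily overlapping boxes of distinct plane instances can suppress each other. The sketch below is the standard greedy algorithm only, with illustrative function names, not the paper's FF-NMS.

```python
def iou(a, b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def greedy_nms(boxes, scores, iou_thresh=0.5):
    """Keep boxes in descending score order, dropping any box whose IoU
    with an already-kept box exceeds iou_thresh. Returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep
```

With two large planes whose boxes overlap above the threshold, the lower-scoring detection is discarded even if both are correct instances; this is the suppression error FF-NMS is proposed to reduce.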
Related papers
- ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic
Reconstruction [62.599588577671796]
We propose an online 3D semantic segmentation method that incrementally reconstructs a 3D semantic map from a stream of RGB-D frames.
Unlike offline methods, ours is directly applicable to scenarios with real-time constraints, such as robotics or mixed reality.
arXiv Detail & Related papers (2023-11-29T20:30:18Z)
- FMapping: Factorized Efficient Neural Field Mapping for Real-Time Dense RGB SLAM [3.6985351289638957]
We introduce FMapping, an efficient neural field mapping framework that facilitates the continuous estimation of a colorized point cloud map in real-time dense RGB SLAM.
We propose an effective factorization scheme for scene representation and introduce a sliding window strategy to reduce the uncertainty for scene reconstruction.
arXiv Detail & Related papers (2023-06-01T11:51:46Z)
- Implicit Temporal Modeling with Learnable Alignment for Video Recognition [95.82093301212964]
We propose a novel Implicit Learnable Alignment (ILA) method, which minimizes the temporal modeling effort while achieving incredibly high performance.
ILA achieves a top-1 accuracy of 88.7% on Kinetics-400 with much fewer FLOPs compared with Swin-L and ViViT-H.
arXiv Detail & Related papers (2023-04-20T17:11:01Z)
- Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud Analysis [66.49788145564004]
We present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology.
Our method performs favorably against current state-of-the-art competitors.
arXiv Detail & Related papers (2022-12-17T15:05:25Z)
- Approximated Bilinear Modules for Temporal Modeling [116.6506871576514]
Two-layer subnets in CNNs can be converted to temporal bilinear modules by adding an auxiliary branch for sampling.
Our models outperform most state-of-the-art methods on the Something-Something v1 and v2 datasets without pretraining.
arXiv Detail & Related papers (2020-07-25T09:07:35Z)
- Real-time Semantic Segmentation with Fast Attention [94.88466483540692]
We propose a novel architecture for semantic segmentation of high-resolution images and videos in real-time.
The proposed architecture relies on our fast spatial attention, which is a simple yet efficient modification of the popular self-attention mechanism.
Results on multiple datasets demonstrate superior accuracy and speed compared to existing approaches.
arXiv Detail & Related papers (2020-07-07T22:37:16Z)
- Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes [98.65457534223539]
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes.
The proposed method achieves 73.6% and 68.0% mean Intersection over Union (mIoU) at inference speeds of 51.0 fps and 39.3 fps, respectively.
arXiv Detail & Related papers (2020-03-11T08:45:53Z)
- FarSee-Net: Real-Time Semantic Segmentation by Efficient Multi-scale Context Aggregation and Feature Space Super-resolution [14.226301825772174]
We introduce a novel and efficient module called Cascaded Factorized Atrous Spatial Pyramid Pooling (CF-ASPP),
a lightweight cascaded structure for Convolutional Neural Networks (CNNs) that efficiently leverages context information.
We achieve 68.4% mIoU at 84 fps on the Cityscapes test set with a single Nvidia Titan X (Maxwell) GPU card.
arXiv Detail & Related papers (2020-03-09T03:53:57Z)
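Several of the entries above report segmentation quality as mean Intersection over Union (mIoU). As a reference, this is a minimal sketch of how mIoU is conventionally computed from integer label maps via a confusion matrix; the function name is illustrative, and per-benchmark details (ignore labels, class lists) are not covered.

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean Intersection-over-Union from integer label maps.

    Builds a num_classes x num_classes confusion matrix (rows: ground
    truth, columns: prediction), then averages per-class IoU over the
    classes that appear in the prediction or the ground truth.
    """
    pred, gt = np.ravel(pred), np.ravel(gt)
    cm = np.bincount(num_classes * gt + pred,
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(cm)                      # true positives per class
    union = cm.sum(0) + cm.sum(1) - inter    # predicted + actual - overlap
    valid = union > 0                        # skip classes absent from both
    return (inter[valid] / union[valid]).mean()
```

For example, with predictions [0, 0, 1, 1] against ground truth [0, 1, 1, 1], class 0 has IoU 1/2 and class 1 has IoU 2/3, giving an mIoU of 7/12.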
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.