Efficient RGB-D Semantic Segmentation for Indoor Scene Analysis
- URL: http://arxiv.org/abs/2011.06961v3
- Date: Wed, 7 Apr 2021 14:41:24 GMT
- Title: Efficient RGB-D Semantic Segmentation for Indoor Scene Analysis
- Authors: Daniel Seichter, Mona Köhler, Benjamin Lewandowski, Tim Wengefeld and Horst-Michael Gross
- Abstract summary: We propose an efficient and robust RGB-D segmentation approach that can be optimized to a high degree using NVIDIA TensorRT.
We show that RGB-D segmentation is superior to processing RGB images solely and that it can still be performed in real time if the network architecture is carefully designed.
- Score: 16.5390740005143
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Analyzing scenes thoroughly is crucial for mobile robots acting in different
environments. Semantic segmentation can enhance various subsequent tasks, such
as (semantically assisted) person perception, (semantic) free space detection,
(semantic) mapping, and (semantic) navigation. In this paper, we propose an
efficient and robust RGB-D segmentation approach that can be optimized to a
high degree using NVIDIA TensorRT and, thus, is well suited as a common initial
processing step in a complex system for scene analysis on mobile robots. We
show that RGB-D segmentation is superior to processing RGB images solely and
that it can still be performed in real time if the network architecture is
carefully designed. We evaluate our proposed Efficient Scene Analysis Network
(ESANet) on the common indoor datasets NYUv2 and SUNRGB-D and show that we
reach state-of-the-art performance while enabling faster inference.
Furthermore, our evaluation on the outdoor dataset Cityscapes shows that our
approach is suitable for other areas of application as well. Finally, instead
of presenting benchmark results only, we also show qualitative results in one
of our indoor application scenarios.
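To make the two key ingredients concrete, the dual RGB-D encoder design and the TensorRT-friendly deployment path, here is a minimal, hypothetical PyTorch sketch: two small encoders whose depth features are fused into the RGB branch stage by stage, exported to ONNX so the result could be built into a TensorRT engine (e.g. with `trtexec --onnx=rgbd_seg.onnx`). The actual ESANet architecture is considerably more elaborate; layer sizes and the fusion scheme below are illustrative only.

```python
# Minimal sketch of a two-branch RGB-D encoder with stage-wise fusion,
# exported to ONNX for a subsequent TensorRT build. Illustrative only.
import torch
import torch.nn as nn

def conv_bn_relu(cin, cout, stride=1):
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class RGBDSegNet(nn.Module):
    def __init__(self, num_classes=40):  # NYUv2 commonly uses a 40-class label set
        super().__init__()
        chans = [32, 64, 128]
        self.rgb_stages = nn.ModuleList()
        self.depth_stages = nn.ModuleList()
        cin_rgb, cin_d = 3, 1
        for c in chans:
            self.rgb_stages.append(conv_bn_relu(cin_rgb, c, stride=2))
            self.depth_stages.append(conv_bn_relu(cin_d, c, stride=2))
            cin_rgb = cin_d = c
        self.head = nn.Conv2d(chans[-1], num_classes, 1)

    def forward(self, rgb, depth):
        for rgb_stage, depth_stage in zip(self.rgb_stages, self.depth_stages):
            rgb = rgb_stage(rgb)
            depth = depth_stage(depth)
            rgb = rgb + depth  # fuse depth features into the RGB branch
        logits = self.head(rgb)
        # upsample logits back to the input resolution (3 stride-2 stages -> x8)
        return nn.functional.interpolate(
            logits, scale_factor=8, mode="bilinear", align_corners=False)

model = RGBDSegNet().eval()
dummy = (torch.randn(1, 3, 480, 640), torch.randn(1, 1, 480, 640))
torch.onnx.export(model, dummy, "rgbd_seg.onnx", opset_version=11,
                  input_names=["rgb", "depth"], output_names=["logits"])
```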
Related papers
- Diffusion-based RGB-D Semantic Segmentation with Deformable Attention Transformer [10.982521876026281] (2024-09-23)
We introduce a diffusion-based framework to address the RGB-D semantic segmentation problem.
We demonstrate that utilizing a Deformable Attention Transformer as the encoder to extract features from depth images effectively captures the characteristics of invalid regions in depth measurements.
- Handling Geometric Domain Shifts in Semantic Segmentation of Surgical RGB and Hyperspectral Images [67.66644395272075] (2024-08-27)
We present the first analysis of state-of-the-art semantic segmentation models when faced with geometric out-of-distribution data.
We propose an augmentation technique called "Organ Transplantation" to enhance generalizability.
Our augmentation technique improves state-of-the-art model performance by up to 67% for RGB data and 90% for HSI data, reaching in-distribution performance levels on real OOD test data.
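At its core, "Organ Transplantation" is a label-aware copy-paste augmentation: pixels of one class are copied from a donor image into a recipient image along with their labels. A hypothetical numpy sketch of that core mechanism (the paper's actual augmentation may differ in detail):

```python
# Hypothetical copy-paste ("transplantation") augmentation sketch.
import numpy as np

def transplant(recipient_img, recipient_lbl, donor_img, donor_lbl, class_id):
    """Copy all pixels of `class_id` from the donor into the recipient."""
    mask = donor_lbl == class_id
    out_img = recipient_img.copy()
    out_lbl = recipient_lbl.copy()
    out_img[mask] = donor_img[mask]   # transplant image pixels
    out_lbl[mask] = class_id          # transplant their labels
    return out_img, out_lbl

rng = np.random.default_rng(0)
img_a, img_b = rng.random((2, 256, 256, 3))
lbl_a = np.zeros((256, 256), dtype=np.int64)
lbl_b = np.full((256, 256), 3, dtype=np.int64)  # donor: class 3 everywhere
aug_img, aug_lbl = transplant(img_a, lbl_a, img_b, lbl_b, class_id=3)
```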
- EventTransAct: A video transformer-based framework for Event-camera based action recognition [52.537021302246664] (2023-08-25)
Event cameras offer new opportunities for action recognition compared to standard RGB videos.
In this study, we employ a computationally efficient model, namely the video transformer network (VTN), which initially acquires spatial embeddings per event-frame.
In order to better adapt the VTN to the sparse and fine-grained nature of event data, we design an Event-Contrastive Loss ($\mathcal{L}_{EC}$) and event-specific augmentations.
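The summary names $\mathcal{L}_{EC}$ without giving its form. As a reference point, a generic InfoNCE-style contrastive loss over embeddings of two augmented views looks as follows; the paper's exact formulation of $\mathcal{L}_{EC}$ may differ:

```python
# Generic InfoNCE-style contrastive loss over paired embeddings; a
# plausible shape for an event-contrastive loss, not the paper's exact one.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """z1, z2: (N, D) embeddings of two views of the same N clips."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature   # (N, N) cosine similarities
    targets = torch.arange(z1.size(0))   # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(8, 128), torch.randn(8, 128))
```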
- Spatial-information Guided Adaptive Context-aware Network for Efficient RGB-D Semantic Segmentation [9.198120596225968] (2023-08-11)
We propose an efficient, lightweight encoder-decoder network that reduces the number of parameters while preserving the robustness of the algorithm.
Experimental results on NYUv2, SUN RGB-D, and Cityscapes datasets show that our method achieves a better trade-off among segmentation accuracy, inference time, and parameters than the state-of-the-art methods.
- Efficient Multi-Task Scene Analysis with RGB-D Transformers [7.9011213682805215] (2023-06-08)
We introduce an efficient multi-task scene analysis approach, called EMSAFormer, that uses an RGB-D Transformer-based encoder to perform multiple scene analysis tasks simultaneously.
Our approach achieves state-of-the-art performance while still enabling inference with up to 39.1 FPS on an NVIDIA Jetson AGX Orin 32 GB.
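FPS figures such as the 39.1 FPS above are typically obtained with warm-up iterations and explicit GPU synchronization around the timed loop. A generic measurement sketch (not the authors' benchmark script):

```python
# Generic latency/FPS measurement loop with warm-up and GPU sync.
import time
import torch

def measure_fps(model, example_inputs, warmup=10, iters=100):
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):          # warm-up: JIT, cuDNN autotune, caches
            model(*example_inputs)
        if torch.cuda.is_available():
            torch.cuda.synchronize()     # ensure queued GPU work is done
        start = time.perf_counter()
        for _ in range(iters):
            model(*example_inputs)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
    return iters / (time.perf_counter() - start)

# usage: fps = measure_fps(model, (rgb_tensor, depth_tensor))
```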
- A Threefold Review on Deep Semantic Segmentation: Efficiency-oriented, Temporal and Depth-aware design [77.34726150561087] (2023-03-08)
We conduct a survey on the most relevant and recent advances in deep semantic segmentation in the context of vision for autonomous vehicles.
Our main objective is to provide a comprehensive discussion on the main methods, advantages, limitations, results and challenges faced from each perspective.
- Efficient Multi-Task RGB-D Scene Analysis for Indoor Environments [13.274695420192884] (2022-07-10)
We propose an efficient multi-task approach for RGB-D scene analysis (EMSANet).
We show that all tasks can be accomplished using a single neural network in real time on a mobile platform without diminishing performance.
We are the first to provide results in such a comprehensive multi-task setting for indoor scene analysis on NYUv2 and SUNRGB-D.
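The single-network multi-task pattern behind such results is a shared encoder with one lightweight head per task. A hypothetical sketch follows; EMSANet's actual design (e.g. separate RGB and depth encoders with fusion, and full decoders per task) is more elaborate:

```python
# Sketch of a shared encoder with per-task heads. Tasks and sizes are
# illustrative; the RGB-D input is simplified to a stacked 4-channel tensor.
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self, num_classes=40, num_scene_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.seg_head = nn.Conv2d(128, num_classes, 1)       # semantic segmentation
        self.inst_head = nn.Conv2d(128, 2, 1)                # instance offsets
        self.scene_head = nn.Linear(128, num_scene_classes)  # scene classification

    def forward(self, rgbd):  # rgbd: (N, 4, H, W), RGB + depth stacked
        feat = self.encoder(rgbd)
        pooled = feat.mean(dim=(2, 3))  # global pooling for the image-level task
        return {
            "semantic": self.seg_head(feat),
            "instance": self.inst_head(feat),
            "scene": self.scene_head(pooled),
        }

out = MultiTaskNet()(torch.randn(1, 4, 480, 640))
```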
- Unseen Object Instance Segmentation with Fully Test-time RGB-D Embeddings Adaptation [14.258456366985444] (2022-04-21)
A recently popular solution is to leverage RGB-D features learned from large-scale synthetic data and apply the model to unseen real-world scenarios.
We re-emphasize the adaptation process across Sim2Real domains in this paper.
We propose a framework to conduct the Fully Test-time RGB-D Embeddings Adaptation (FTEA) based on parameters of the BatchNorm layer.
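Adapting BatchNorm at test time typically means re-estimating the normalization statistics on unlabeled target-domain batches before evaluation. A basic sketch of that mechanism is below; FTEA's actual procedure is more refined than this:

```python
# Sketch of test-time BatchNorm adaptation: reset and re-collect the BN
# running statistics on unlabeled target-domain data.
import torch
import torch.nn as nn

def adapt_batchnorm(model, target_loader, momentum=0.1):
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.reset_running_stats()
            m.momentum = momentum
    model.train()  # in train mode, BN layers update their running stats
    with torch.no_grad():
        for batch in target_loader:  # loader yields input tensors only
            model(batch)
    model.eval()
    return model
```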
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694] (2020-07-17)
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as a cross-modal feature fusion.
In this paper, we propose a unified and efficient cross-modality guided encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information via multiple stages and aggregates the two recalibrated representations alternately.
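A gated cross-modality fusion of this flavor can be illustrated as follows: a per-pixel gate computed from both modalities recalibrates and aggregates the two streams. This is an illustrative module, not the paper's exact separation-and-aggregation gate:

```python
# Illustrative gated cross-modality fusion for RGB and depth features.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # one gate channel per modality, computed from both streams
        self.gate = nn.Sequential(nn.Conv2d(2 * channels, 2, 1), nn.Sigmoid())

    def forward(self, rgb_feat, depth_feat):
        g = self.gate(torch.cat([rgb_feat, depth_feat], dim=1))
        g_rgb, g_depth = g[:, :1], g[:, 1:]  # per-pixel gates, broadcast over channels
        return g_rgb * rgb_feat + g_depth * depth_feat

fused = GatedFusion(64)(torch.randn(1, 64, 60, 80), torch.randn(1, 64, 60, 80))
```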
- Real-time Semantic Segmentation with Fast Attention [94.88466483540692] (2020-07-07)
We propose a novel architecture for semantic segmentation of high-resolution images and videos in real-time.
The proposed architecture relies on our fast spatial attention, which is a simple yet efficient modification of the popular self-attention mechanism.
Results on multiple datasets demonstrate superior accuracy and speed compared to existing approaches.
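The key trick behind such fast attention variants is dropping the softmax in favor of normalized dot products, so that $(QK^\top)V$ can be reassociated as $Q(K^\top V)$, which is linear rather than quadratic in the number of spatial positions. A sketch under that assumption (the paper's exact operator may differ):

```python
# Linear-time attention sketch: L2-normalized queries/keys, no softmax,
# so the matrix product can be reassociated to avoid the (n x n) map.
import torch
import torch.nn.functional as F

def fast_attention(q, k, v):
    """q, k: (N, n, d); v: (N, n, c). Returns (N, n, c)."""
    q = F.normalize(q, dim=-1)
    k = F.normalize(k, dim=-1)
    context = k.transpose(1, 2) @ v   # (N, d, c): cost O(n * d * c)
    return (q @ context) / q.size(1)  # (N, n, c): cost O(n * d * c)

out = fast_attention(torch.randn(2, 4800, 32),
                     torch.randn(2, 4800, 32),
                     torch.randn(2, 4800, 64))
```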
- Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes [98.65457534223539] (2020-03-11)
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes.
The proposed method achieves 73.6% and 68.0% mean Intersection over Union (mIoU) at inference speeds of 51.0 fps and 39.3 fps, respectively.
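For reference, mIoU is computed from the confusion matrix of predicted and ground-truth labels. The following standalone numpy implementation shows the standard definition behind figures like those above:

```python
# Standard mean Intersection over Union (mIoU) from a confusion matrix.
import numpy as np

def mean_iou(pred, gt, num_classes):
    """pred, gt: integer label arrays of the same shape."""
    conf = np.bincount(num_classes * gt.ravel() + pred.ravel(),
                       minlength=num_classes ** 2).reshape(num_classes, -1)
    inter = np.diag(conf).astype(float)          # per-class true positives
    union = conf.sum(0) + conf.sum(1) - inter    # pred + gt - intersection
    iou = inter / np.maximum(union, 1)           # avoid division by zero
    return iou[union > 0].mean()                 # average over classes present

miou = mean_iou(np.random.randint(0, 19, (480, 640)),
                np.random.randint(0, 19, (480, 640)), num_classes=19)
```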