Efficient Multi-Task RGB-D Scene Analysis for Indoor Environments
- URL: http://arxiv.org/abs/2207.04526v1
- Date: Sun, 10 Jul 2022 20:03:38 GMT
- Title: Efficient Multi-Task RGB-D Scene Analysis for Indoor Environments
- Authors: Daniel Seichter, Söhnke Benedikt Fischedick, Mona Köhler, Horst-Michael Groß
- Abstract summary: We propose an efficient multi-task approach for RGB-D scene analysis (EMSANet).
We show that all tasks can be accomplished using a single neural network in real time on a mobile platform without diminishing performance.
We are the first to provide results in such a comprehensive multi-task setting for indoor scene analysis on NYUv2 and SUNRGB-D.
- Score: 13.274695420192884
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic scene understanding is essential for mobile agents acting in various
environments. Although semantic segmentation already provides a lot of
information, details about individual objects as well as the general scene are
missing but required for many real-world applications. However, solving
multiple tasks separately is expensive and cannot be accomplished in real time
given limited computing and battery capabilities on a mobile platform. In this
paper, we propose an efficient multi-task approach for RGB-D scene
analysis~(EMSANet) that simultaneously performs semantic and instance
segmentation~(panoptic segmentation), instance orientation estimation, and
scene classification. We show that all tasks can be accomplished using a single
neural network in real time on a mobile platform without diminishing
performance - by contrast, the individual tasks are able to benefit from each
other. In order to evaluate our multi-task approach, we extend the annotations
of the common RGB-D indoor datasets NYUv2 and SUNRGB-D for instance
segmentation and orientation estimation. To the best of our knowledge, we are
the first to provide results in such a comprehensive multi-task setting for
indoor scene analysis on NYUv2 and SUNRGB-D.
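The core idea of the abstract, a single shared encoder feeding several independent task heads, can be sketched in a few lines. The following is a minimal NumPy illustration; the toy feature sizes, the random linear "encoder", and the head names are assumptions for demonstration, not the actual EMSANet architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "encoder": fuses RGB and depth into one shared per-pixel feature map.
# The real EMSANet uses two CNN branches with learned fusion; this stand-in
# is just a random linear projection for illustration.
H, W, FEAT = 8, 8, 32
W_rgb = rng.normal(size=(3, FEAT))
W_depth = rng.normal(size=(1, FEAT))

def encode(rgb, depth):
    """Shared encoder: per-pixel fusion of RGB and depth features."""
    return rgb @ W_rgb + depth @ W_depth        # (H, W, FEAT)

# Independent task-specific heads on top of the shared features.
NUM_CLASSES, NUM_SCENES = 40, 10
W_sem = rng.normal(size=(FEAT, NUM_CLASSES))    # semantic segmentation
W_ctr = rng.normal(size=(FEAT, 1))              # instance-center heatmap
W_ori = rng.normal(size=(FEAT, 2))              # orientation as (sin, cos)
W_scene = rng.normal(size=(FEAT, NUM_SCENES))   # scene classification

def forward(rgb, depth):
    """One encoder pass, then one cheap linear head per task."""
    feat = encode(rgb, depth)
    return {
        "semantic": feat @ W_sem,                   # (H, W, NUM_CLASSES)
        "instance_center": feat @ W_ctr,            # (H, W, 1)
        "orientation": feat @ W_ori,                # (H, W, 2)
        "scene": feat.mean(axis=(0, 1)) @ W_scene,  # (NUM_SCENES,)
    }

rgb = rng.normal(size=(H, W, 3))
depth = rng.normal(size=(H, W, 1))
out = forward(rgb, depth)
print({k: v.shape for k, v in out.items()})
```

Because all heads reuse the same encoder output, the encoder's cost is paid once per frame regardless of how many tasks are attached, which is what makes the single-network setup cheaper than running separate networks per task.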
Related papers
- Joint Depth Prediction and Semantic Segmentation with Multi-View SAM [59.99496827912684]
We propose a Multi-View Stereo (MVS) technique for depth prediction that benefits from rich semantic features of the Segment Anything Model (SAM).
This enhanced depth prediction, in turn, serves as a prompt to our Transformer-based semantic segmentation decoder.
arXiv Detail & Related papers (2023-10-31T20:15:40Z)
- Efficient Multi-Task Scene Analysis with RGB-D Transformers [7.9011213682805215]
We introduce an efficient multi-task scene analysis approach, called EMSAFormer, that uses an RGB-D Transformer-based encoder to simultaneously perform the aforementioned tasks.
Our approach achieves state-of-the-art performance while still enabling inference with up to 39.1 FPS on an NVIDIA Jetson AGX Orin 32 GB.
arXiv Detail & Related papers (2023-06-08T14:41:56Z)
- A Dynamic Feature Interaction Framework for Multi-task Visual Perception [100.98434079696268]
We devise an efficient unified framework to solve multiple common perception tasks.
These tasks include instance segmentation, semantic segmentation, monocular 3D detection, and depth estimation.
Our proposed framework, termed D2BNet, demonstrates a unique approach to parameter-efficient predictions for multi-task perception.
arXiv Detail & Related papers (2023-06-08T09:24:46Z)
- A Threefold Review on Deep Semantic Segmentation: Efficiency-oriented, Temporal and Depth-aware design [77.34726150561087]
We conduct a survey on the most relevant and recent advances in deep semantic segmentation in the context of vision for autonomous vehicles.
Our main objective is to provide a comprehensive discussion on the main methods, advantages, limitations, results and challenges faced from each perspective.
arXiv Detail & Related papers (2023-03-08T01:29:55Z)
- TarViS: A Unified Approach for Target-based Video Segmentation [115.5770357189209]
TarViS is a novel, unified network architecture that can be applied to any task that requires segmenting a set of arbitrarily defined 'targets' in video.
Our approach is flexible with respect to how tasks define these targets, since it models the latter as abstract 'queries' which are then used to predict pixel-precise target masks.
To demonstrate its effectiveness, we apply TarViS to four different tasks, namely Video Instance Segmentation (VIS), Video Panoptic Segmentation (VPS), Video Object Segmentation (VOS), and Point Exemplar-guided Tracking (PET).
arXiv Detail & Related papers (2023-01-06T18:59:52Z)
- PanDepth: Joint Panoptic Segmentation and Depth Completion [19.642115764441016]
We propose a multi-task model for panoptic segmentation and depth completion using RGB images and sparse depth maps.
Our model successfully predicts fully dense depth maps and performs semantic segmentation, instance segmentation, and panoptic segmentation for every input frame.
arXiv Detail & Related papers (2022-12-29T05:37:38Z)
- Exploring Relational Context for Multi-Task Dense Prediction [76.86090370115]
We consider a multi-task environment for dense prediction tasks, represented by a common backbone and independent task-specific heads.
We explore various attention-based contexts, such as global and local, in the multi-task setting.
We propose an Adaptive Task-Relational Context module, which samples the pool of all available contexts for each task pair.
arXiv Detail & Related papers (2021-04-28T16:45:56Z)
- STEP: Segmenting and Tracking Every Pixel [107.23184053133636]
We present a new benchmark: Segmenting and Tracking Every Pixel (STEP).
Our work is the first that targets this task in a real-world setting that requires dense interpretation in both spatial and temporal domains.
For measuring the performance, we propose a novel evaluation metric, Segmentation and Tracking Quality (STQ).
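The STQ metric of the STEP benchmark combines an association (tracking) term and a segmentation term as a geometric mean, so a strong score in one cannot compensate for a weak score in the other. A minimal sketch, with toy AQ/SQ values that are illustrative assumptions rather than results from the paper:

```python
import math

def stq(association_quality: float, segmentation_quality: float) -> float:
    """Segmentation and Tracking Quality: geometric mean of the
    association quality (AQ) and segmentation quality (SQ) terms."""
    return math.sqrt(association_quality * segmentation_quality)

# Toy values for illustration only.
score = stq(0.64, 0.81)
print(round(score, 4))
```

The geometric mean drops toward zero as soon as either term does, which is the point of the metric: a tracker that segments well but associates poorly (or vice versa) is penalized on the combined score.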
arXiv Detail & Related papers (2021-02-23T18:43:02Z)
- Efficient RGB-D Semantic Segmentation for Indoor Scene Analysis [16.5390740005143]
We propose an efficient and robust RGB-D segmentation approach that can be optimized to a high degree using NVIDIA TensorRT.
We show that RGB-D segmentation is superior to processing RGB images alone and that it can still be performed in real time if the network architecture is carefully designed.
arXiv Detail & Related papers (2020-11-13T15:17:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.