A Deep Learning-based Global and Segmentation-based Semantic Feature
Fusion Approach for Indoor Scene Classification
- URL: http://arxiv.org/abs/2302.06432v3
- Date: Wed, 31 Jan 2024 16:50:58 GMT
- Title: A Deep Learning-based Global and Segmentation-based Semantic Feature
Fusion Approach for Indoor Scene Classification
- Authors: Ricardo Pereira, Tiago Barros, Luis Garrote, Ana Lopes, Urbano J.
Nunes
- Abstract summary: This work proposes a novel approach that uses a semantic segmentation mask to obtain a 2D spatial layout of the segmentation-categories across the scene.
A two-branch network, GS2F2App, exploits CNN-based global features extracted from RGB images and segmentation-based features extracted from the proposed SSFs.
- Score: 0.6578923037198714
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work proposes a novel approach that uses a semantic segmentation mask to
obtain a 2D spatial layout of the segmentation-categories across the scene,
designated by segmentation-based semantic features (SSFs). These features
represent, per segmentation-category, the pixel count, as well as the 2D
average position and respective standard deviation values. Moreover, a
two-branch network, GS2F2App, that exploits CNN-based global features extracted
from RGB images and the segmentation-based features extracted from the proposed
SSFs, is also proposed. GS2F2App was evaluated on two indoor scene benchmark
datasets: the SUN RGB-D and the NYU Depth V2, achieving state-of-the-art
results on both datasets.
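To make the SSF definition concrete, below is a minimal sketch that computes
such features from a 2D label mask: for each segmentation category, the pixel
count plus the mean and standard deviation of that category's pixel positions.
The [0, 1] normalization and the feature layout are illustrative assumptions,
not the authors' exact recipe.

```python
# Hedged sketch of segmentation-based semantic features (SSFs): per
# segmentation category, the pixel count plus the 2D average position
# and per-axis standard deviation of that category's pixels.
import numpy as np

def compute_ssfs(mask: np.ndarray, num_categories: int) -> np.ndarray:
    """mask: (H, W) integer label map with values in [0, num_categories)."""
    h, w = mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    feats = np.zeros((num_categories, 5), dtype=np.float32)
    for c in range(num_categories):
        sel = mask == c
        n = sel.sum()
        if n == 0:
            continue  # category absent from this scene; leave zeros
        cy, cx = ys[sel], xs[sel]
        # dividing by the image size is an assumption made here so the
        # features are resolution-independent
        feats[c] = (n / (h * w),
                    cx.mean() / w, cy.mean() / h,
                    cx.std() / w, cy.std() / h)
    return feats.ravel()  # flat vector, five values per category
```

A two-branch fusion in the spirit of GS2F2App can then be sketched as a CNN
global branch concatenated with a small MLP over the SSF vector; the ResNet-18
backbone, branch widths, and fusion by concatenation are assumptions for
illustration, not the paper's exact architecture.

```python
# Hedged sketch of a two-branch scene classifier: CNN global features
# from the RGB image, MLP features from the SSF vector, fused by
# concatenation before the final classifier.
import torch
import torch.nn as nn
from torchvision import models

class TwoBranchSceneNet(nn.Module):
    def __init__(self, num_classes: int, ssf_dim: int):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()        # expose the 512-d global feature
        self.global_branch = backbone
        self.ssf_branch = nn.Sequential(
            nn.Linear(ssf_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU())
        self.classifier = nn.Linear(512 + 128, num_classes)

    def forward(self, rgb: torch.Tensor, ssf: torch.Tensor) -> torch.Tensor:
        g = self.global_branch(rgb)        # (B, 512) global features
        s = self.ssf_branch(ssf)           # (B, 128) SSF-derived features
        return self.classifier(torch.cat([g, s], dim=1))
```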
Related papers
- SAM-OCTA2: Layer Sequence OCTA Segmentation with Fine-tuned Segment Anything Model 2 [2.314516220934268]
A low-rank adaptation (LoRA) technique is adopted to fine-tune the Segment Anything Model (SAM) version 2.
The resulting method, named SAM-OCTA2, is evaluated on the OCTA-500 dataset.
It achieves state-of-the-art performance in segmenting the foveal avascular zone (FAZ) in regular 2D en-face images and effectively tracks local vessels across scanning layer sequences.
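As context for the entry above, here is a minimal sketch of low-rank
adaptation applied to a frozen linear layer; it illustrates the generic LoRA
technique, not the SAM-OCTA2 implementation, and the rank and scaling values
are assumptions.

```python
# Hedged LoRA sketch: the pretrained weight is frozen and only the
# low-rank factors A and B are trained.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep the pretrained weights frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = W0 x + (alpha / r) * B A x; the update starts at zero
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```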
arXiv Detail & Related papers (2024-09-14T03:28:24Z)
- Exploiting Object-based and Segmentation-based Semantic Features for Deep Learning-based Indoor Scene Classification [0.5572976467442564]
The work described in this paper uses semantic information obtained from both object-detection and semantic-segmentation techniques.
A novel approach that uses a semantic segmentation mask to provide a Hu-moments-based shape characterization of the segmentation categories, designated by Segmentation-based Hu-Moments Features (SHMFs), is proposed.
A three-main-branch network, designated by GOS2F2App, that exploits deep-learning-based global features, object-based features, and semantic segmentation-based features, is also proposed.
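For illustration, the sketch below computes per-category Hu-moments shape
features from a segmentation mask, in the spirit of the SHMFs; the
log-magnitude transform is a common convention assumed here, not necessarily
the paper's exact post-processing.

```python
# Hedged sketch of per-category Hu-moments shape features from an
# (H, W) integer segmentation mask.
import cv2
import numpy as np

def compute_shmfs(mask: np.ndarray, num_categories: int) -> np.ndarray:
    feats = np.zeros((num_categories, 7), dtype=np.float64)
    for c in range(num_categories):
        binary = (mask == c).astype(np.uint8)
        if binary.sum() == 0:
            continue  # category absent; leave zeros
        hu = cv2.HuMoments(cv2.moments(binary, binaryImage=True)).ravel()
        # log transform compresses the moments' wide dynamic range
        feats[c] = -np.sign(hu) * np.log10(np.abs(hu) + 1e-12)
    return feats.ravel()  # seven shape values per category
```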
arXiv Detail & Related papers (2024-04-11T13:37:51Z)
- GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding [101.32590239809113]
Generalized Perception NeRF (GP-NeRF) is a novel pipeline that makes widely used segmentation models and NeRF work compatibly under a unified framework.
We propose two self-distillation mechanisms, i.e., the Semantic Distill Loss and the Depth-Guided Semantic Distill Loss, to enhance the discrimination and quality of the semantic field.
arXiv Detail & Related papers (2023-11-20T15:59:41Z)
- Interactive Segmentation as Gaussian Process Classification [58.44673380545409]
Click-based interactive segmentation (IS) aims to extract the target objects under user interaction.
Most of the current deep learning (DL)-based methods mainly follow the general pipelines of semantic segmentation.
We propose to formulate the IS task as a Gaussian process (GP)-based pixel-wise binary classification model on each image.
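A minimal sketch of that formulation, assuming per-pixel features built from
normalized position and color (an illustrative choice, not the paper's model):
user clicks train a GP binary classifier, which then scores every pixel.

```python
# Hedged sketch: user clicks (at least one positive and one negative)
# train a GP classifier; it then predicts a foreground-probability map.
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

def segment_from_clicks(image: np.ndarray, clicks) -> np.ndarray:
    """image: (H, W, 3) floats in [0, 1]; clicks: list of ((y, x), 0-or-1)."""
    h, w, _ = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # per-pixel features: normalized position plus color (an assumption)
    all_feats = np.column_stack([ys.ravel() / h, xs.ravel() / w,
                                 image.reshape(-1, 3)])
    X = np.stack([all_feats[y * w + x] for (y, x), _ in clicks])
    t = np.array([label for _, label in clicks])
    gp = GaussianProcessClassifier(kernel=RBF(length_scale=0.3)).fit(X, t)
    return gp.predict_proba(all_feats)[:, 1].reshape(h, w)
```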
arXiv Detail & Related papers (2023-02-28T14:01:01Z)
- Geometry-Aware Network for Domain Adaptive Semantic Segmentation [64.00345743710653]
We propose a novel Geometry-Aware Network for Domain Adaptation (GANDA) to shrink the domain gaps.
We exploit 3D topology on the point clouds generated from RGB-D images for coordinate-color disentanglement and pseudo-labels refinement in the target domain.
Our model outperforms the state of the art on GTA5->Cityscapes and SYNTHIA->Cityscapes.
arXiv Detail & Related papers (2022-12-02T00:48:44Z)
- NeRF-SOS: Any-View Self-supervised Object Segmentation from Complex Real-World Scenes [80.59831861186227]
This paper carries out the exploration of self-supervised learning for object segmentation using NeRF for complex real-world scenes.
Our framework, called NeRF with Self-supervised Object Segmentation (NeRF-SOS), encourages NeRF models to distill compact geometry-aware segmentation clusters.
It consistently surpasses other 2D-based self-supervised baselines and predicts finer semantic masks than existing supervised counterparts.
arXiv Detail & Related papers (2022-09-19T06:03:17Z)
- SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation [94.11915008006483]
We propose SemAffiNet for point cloud semantic segmentation.
We conduct extensive experiments on the ScanNetV2 and NYUv2 datasets.
arXiv Detail & Related papers (2022-05-26T17:00:23Z)
- Semantic Dense Reconstruction with Consistent Scene Segments [33.0310121044956]
A method for dense semantic 3D scene reconstruction from an RGB-D sequence is proposed to solve high-level scene understanding tasks.
First, each RGB-D pair is consistently segmented into 2D semantic maps based on a camera tracking backbone.
A dense 3D mesh model of an unknown environment is incrementally generated from the input RGB-D sequence.
arXiv Detail & Related papers (2021-09-30T03:01:17Z)
- Improving Semi-Supervised and Domain-Adaptive Semantic Segmentation with Self-Supervised Depth Estimation [94.16816278191477]
We present a framework for semi-supervised and domain-adaptive semantic segmentation.
It is enhanced by self-supervised monocular depth estimation trained only on unlabeled image sequences.
We validate the proposed model on the Cityscapes dataset.
arXiv Detail & Related papers (2021-08-28T01:33:38Z)
- Attention-based Multi-modal Fusion Network for Semantic Scene Completion [35.93265545962268]
This paper presents an end-to-end 3D convolutional network named attention-based multi-modal fusion network (AMFNet) for the semantic scene completion (SSC) task.
Compared with previous methods which use only the semantic features extracted from RGB-D images, the proposed AMFNet learns to perform effective 3D scene completion and semantic segmentation simultaneously.
It is achieved by employing a multi-modal fusion architecture boosted from 2D semantic segmentation and a 3D semantic completion network empowered by residual attention blocks.
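As a rough illustration of the kind of residual attention block the entry
mentions (the exact AMFNet block design is not given here, so the
squeeze-and-excitation-style gate below is an assumption):

```python
# Hedged sketch of a 3D residual attention block: channel attention
# gates the convolved features before the residual connection.
import torch
import torch.nn as nn

class ResidualAttentionBlock3D(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(channels, channels, 3, padding=1),
            nn.BatchNorm3d(channels), nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, 3, padding=1),
            nn.BatchNorm3d(channels))
        self.attn = nn.Sequential(      # channel gate over the 3D volume
            nn.AdaptiveAvgPool3d(1),
            nn.Conv3d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv3d(channels // reduction, channels, 1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv(x)
        out = out * self.attn(out)      # reweight channels
        return torch.relu(out + x)      # residual connection
```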
arXiv Detail & Related papers (2020-03-31T02:00:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.