Semantic Scene Completion with Multi-Feature Data Balancing Network
- URL: http://arxiv.org/abs/2412.01431v1
- Date: Mon, 02 Dec 2024 12:12:21 GMT
- Title: Semantic Scene Completion with Multi-Feature Data Balancing Network
- Authors: Mona Alawadh, Mahesan Niranjan, Hansung Kim
- Abstract summary: We propose a dual-head model for RGB and depth data (F-TSDF) inputs.
Our hybrid encoder-decoder architecture with identity transformation in a pre-activation residual module effectively manages diverse signals within F-TSDF.
We evaluate RGB feature fusion strategies and use a combined loss function: cross-entropy for 2D RGB features and weighted cross-entropy for 3D SSC predictions.
- Score: 5.3431413737671525
- License:
- Abstract: Semantic Scene Completion (SSC) is a critical task in computer vision that is used in applications such as virtual reality (VR). SSC aims to construct detailed 3D models from partial views by transforming a single 2D image into a 3D representation, assigning each voxel a semantic label. The main challenge lies in completing 3D volumes with limited information, compounded by data imbalance, inter-class ambiguity, and intra-class diversity in indoor scenes. To address this, we propose the Multi-Feature Data Balancing Network (MDBNet), a dual-head model for RGB and depth data (F-TSDF) inputs. Our hybrid encoder-decoder architecture with identity transformation in a pre-activation residual module (ITRM) effectively manages diverse signals within F-TSDF. We evaluate RGB feature fusion strategies and use a combined loss function: cross-entropy for 2D RGB features and weighted cross-entropy for 3D SSC predictions. MDBNet results surpass comparable state-of-the-art (SOTA) methods on NYU datasets, demonstrating the effectiveness of our approach.
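The combined loss described in the abstract (plain cross-entropy on the 2D RGB head, class-weighted cross-entropy on the 3D SSC head) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the function names, the inverse-frequency weighting scheme, the balance factor `lam`, and the choice to simply sum the two terms.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one list of class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits_batch, labels, class_weights=None):
    """Mean cross-entropy; with class_weights it is a weighted mean,
    normalized by the sum of the weights of the target classes."""
    num, den = 0.0, 0.0
    for logits, y in zip(logits_batch, labels):
        w = 1.0 if class_weights is None else class_weights[y]
        num += -w * math.log(softmax(logits)[y])
        den += w
    return num / den

def inverse_frequency_weights(labels, num_classes):
    """Hypothetical balancing scheme: rarer classes get larger weights."""
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    n = len(labels)
    return [n / (num_classes * c) if c > 0 else 0.0 for c in counts]

def combined_loss(logits_2d, labels_2d, logits_3d, labels_3d, weights_3d, lam=1.0):
    """Assumed combination: lam * CE(2D head) + weighted CE(3D head)."""
    loss_2d = cross_entropy(logits_2d, labels_2d)
    loss_3d = cross_entropy(logits_3d, labels_3d, weights_3d)
    return lam * loss_2d + loss_3d

# Toy usage: 3 "empty" voxels (class 0) and 1 "chair" voxel (class 1).
voxel_labels = [0, 0, 0, 1]
weights = inverse_frequency_weights(voxel_labels, num_classes=2)
voxel_logits = [[2.0, 0.0], [1.5, 0.0], [1.0, 0.0], [0.5, 0.0]]
pixel_logits = [[0.0, 1.0], [1.0, 0.0]]
pixel_labels = [1, 0]
print(combined_loss(pixel_logits, pixel_labels, voxel_logits, voxel_labels, weights))
```

In the toy usage, the single misclassified minority voxel contributes more to the 3D term than any of the majority voxels, which is the general intent of class weighting under data imbalance.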
Related papers
- Refine3DNet: Scaling Precision in 3D Object Reconstruction from Multi-View RGB Images using Attention [2.037112541541094]
We introduce a hybrid strategy featuring a visual auto-encoder with self-attention mechanisms and a 3D refiner network.
Our approach, combined with JTSO, outperforms state-of-the-art techniques in single and multi-view 3D reconstruction.
arXiv Detail & Related papers (2024-12-01T08:53:39Z)
- Implicit-Zoo: A Large-Scale Dataset of Neural Implicit Functions for 2D Images and 3D Scenes [65.22070581594426]
"Implicit-Zoo" is a large-scale dataset requiring thousands of GPU training days to facilitate research and development in this field.
We showcase two immediate benefits, as it enables one to: (1) learn token locations for transformer models; (2) directly regress 3D camera poses of 2D images with respect to NeRF models.
This in turn leads to improved performance in all three tasks of image classification, semantic segmentation, and 3D pose regression, thereby unlocking new avenues for research.
arXiv Detail & Related papers (2024-06-25T10:20:44Z)
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments robustly display our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
- Towards Balanced RGB-TSDF Fusion for Consistent Semantic Scene Completion by 3D RGB Feature Completion and a Classwise Entropy Loss Function [10.22925811541619]
RGB-TSDF fusion has been considered nontrivial, and the commonly used naive addition leads to inconsistent results.
We propose a two-stage network with a 3D RGB feature completion module that completes RGB features with meaningful values for occluded areas.
arXiv Detail & Related papers (2024-03-25T15:56:51Z)
- Large Generative Model Assisted 3D Semantic Communication [51.17527319441436]
We propose a Generative AI Model assisted 3D SC (GAM-3DSC) system.
First, we introduce a 3D Semantic Extractor (3DSE) to extract key semantics from a 3D scenario based on user requirements.
We then present an Adaptive Semantic Compression Model (ASCM) for encoding these multi-perspective images.
Finally, we design a conditional Generative adversarial network and Diffusion model aided-Channel Estimation (GDCE) to estimate and refine the Channel State Information (CSI) of physical channels.
arXiv Detail & Related papers (2024-03-09T03:33:07Z)
- MMRDN: Consistent Representation for Multi-View Manipulation Relationship Detection in Object-Stacked Scenes [62.20046129613934]
We propose a novel multi-view fusion framework, namely multi-view MRD network (MMRDN).
We project the 2D data from different views into a common hidden space and fit the embeddings with a set of Von-Mises-Fisher distributions.
We select a set of $K$ Maximum Vertical Neighbors (KMVN) points from the point cloud of each object pair, which encodes the relative position of these two objects.
arXiv Detail & Related papers (2023-04-25T05:55:29Z)
- Data Augmented 3D Semantic Scene Completion with 2D Segmentation Priors [1.0973642726108543]
We present SPAwN, a novel lightweight multimodal 3D deep CNN.
A crucial difficulty in this field is the lack of fully labeled real-world 3D datasets.
We introduce the use of a 3D data augmentation strategy that can be applied to multimodal SSC networks.
arXiv Detail & Related papers (2021-11-26T04:08:34Z)
- Similarity-Aware Fusion Network for 3D Semantic Segmentation [87.51314162700315]
We propose a similarity-aware fusion network (SAFNet) to adaptively fuse 2D images and 3D point clouds for 3D semantic segmentation.
We employ a late fusion strategy where we first learn the geometric and contextual similarities between the input and back-projected (from 2D pixels) point clouds.
We show that SAFNet significantly outperforms existing state-of-the-art fusion-based approaches across various levels of data integrity.
arXiv Detail & Related papers (2021-07-04T09:28:18Z)
- S3CNet: A Sparse Semantic Scene Completion Network for LiDAR Point Clouds [0.16799377888527683]
We present S3CNet, a sparse convolution based neural network that predicts the semantically completed scene from a single, unified LiDAR point cloud.
We show that our proposed method outperforms all counterparts on the 3D task, achieving state-of-the-art results on the Semantic KITTI benchmark.
arXiv Detail & Related papers (2020-12-16T20:14:41Z)
- Spatial Information Guided Convolution for Real-Time RGBD Semantic Segmentation [79.78416804260668]
We propose Spatial information guided Convolution (S-Conv), which allows efficient RGB feature and 3D spatial information integration.
S-Conv is competent to infer the sampling offset of the convolution kernel guided by the 3D spatial information.
We further embed S-Conv into a semantic segmentation network, called Spatial information Guided convolutional Network (SGNet).
arXiv Detail & Related papers (2020-04-09T13:38:05Z)
- Attention-based Multi-modal Fusion Network for Semantic Scene Completion [35.93265545962268]
This paper presents an end-to-end 3D convolutional network named attention-based multi-modal fusion network (AMFNet) for the semantic scene completion (SSC) task.
Compared with previous methods which use only the semantic features extracted from RGB-D images, the proposed AMFNet learns to perform effective 3D scene completion and semantic segmentation simultaneously.
It is achieved by employing a multi-modal fusion architecture boosted from 2D semantic segmentation and a 3D semantic completion network empowered by residual attention blocks.
arXiv Detail & Related papers (2020-03-31T02:00:03Z)