Shape-Net: Room Layout Estimation from Panoramic Images Robust to
Occlusion using Knowledge Distillation with 3D Shapes as Additional Inputs
- URL: http://arxiv.org/abs/2304.12624v1
- Date: Tue, 25 Apr 2023 07:45:43 GMT
- Authors: Mizuki Tabata, Kana Kurata, Junichiro Tamamatsu
- Abstract summary: We propose a method for distilling knowledge from a model trained with both images and 3D information to a model that takes only images as input.
The proposed model, which is called Shape-Net, achieves state-of-the-art (SOTA) performance on benchmark datasets.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Estimating the layout of a room from a single-shot panoramic image is
important in virtual/augmented reality and furniture layout simulation. This
involves identifying three-dimensional (3D) geometry, such as the location of
corners and boundaries, and performing 3D reconstruction. However, occlusion is
a common issue that can negatively impact room layout estimation, and this has
not been thoroughly studied to date. Since the 3D shapes of rooms can be
obtained from building drawings and from the corner coordinates supplied with
image datasets, we propose providing both the 2D panoramic image and this 3D
information to a model so that it can handle occlusion effectively. However,
simply feeding 3D information to a model is not sufficient for it to exploit
the shape information in occluded areas. We therefore improve the model by
introducing a 3D Intersection over Union (IoU) loss that makes effective use
of the 3D information. In some
cases, drawings are not available or the construction deviates from a drawing.
Considering such practical cases, we propose a method for distilling knowledge
from a model trained with both images and 3D information to a model that takes
only images as input. The proposed model, which is called Shape-Net, achieves
state-of-the-art (SOTA) performance on benchmark datasets. We also confirmed
its robustness to occlusion: on images with occlusion, its accuracy is
significantly higher than that of existing models.
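As a concrete illustration of the 3D IoU loss mentioned in the abstract, here is a minimal sketch for axis-aligned 3D boxes; the (min, max)-corner parameterization is an assumption for illustration, not the paper's actual layout representation.

```python
# Hedged sketch of a 3D IoU loss, assuming layouts reduced to axis-aligned
# boxes parameterized by (xmin, ymin, zmin, xmax, ymax, zmax).
import torch

def iou3d_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """pred, target: (..., 6) tensors of box corners. Returns mean 1 - IoU."""
    lo = torch.maximum(pred[..., :3], target[..., :3])   # intersection lower corner
    hi = torch.minimum(pred[..., 3:], target[..., 3:])   # intersection upper corner
    inter = (hi - lo).clamp(min=0).prod(dim=-1)          # zero when boxes are disjoint
    vol_pred = (pred[..., 3:] - pred[..., :3]).clamp(min=0).prod(dim=-1)
    vol_target = (target[..., 3:] - target[..., :3]).clamp(min=0).prod(dim=-1)
    iou = inter / (vol_pred + vol_target - inter).clamp(min=1e-8)
    return (1.0 - iou).mean()
```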
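The distillation step can be sketched in the same spirit; the model signatures, the L1 task loss, and the MSE output-matching term below are illustrative assumptions, since the abstract does not specify which signals are distilled.

```python
# Hedged sketch of distilling an image+3D teacher into an image-only student.
import torch
import torch.nn.functional as F

def distillation_step(teacher, student, image, shape_3d, target, alpha=0.5):
    with torch.no_grad():
        teacher_out = teacher(image, shape_3d)   # teacher sees both modalities
    student_out = student(image)                 # student sees only the image
    task_loss = F.l1_loss(student_out, target)   # ordinary supervised layout loss
    distill_loss = F.mse_loss(student_out, teacher_out)  # pull student toward teacher
    return task_loss + alpha * distill_loss
```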
Related papers
- ARTIC3D: Learning Robust Articulated 3D Shapes from Noisy Web Image
Collections [71.46546520120162]
Estimating 3D articulated shapes like animal bodies from monocular images is inherently challenging.
We propose ARTIC3D, a self-supervised framework to reconstruct per-instance 3D shapes from a sparse image collection in-the-wild.
We produce realistic animations by fine-tuning the rendered shape and texture under rigid part transformations.
arXiv Detail & Related papers (2023-06-07T17:47:50Z)
- RiCS: A 2D Self-Occlusion Map for Harmonizing Volumetric Objects [68.85305626324694]
Ray-marching in Camera Space (RiCS) is a new method that represents the self-occlusions of foreground 3D objects as a 2D self-occlusion map.
We show that our representation map not only allows us to enhance the image quality but also to model temporally coherent complex shadow effects.
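As a loose illustration of the idea (not RiCS's actual formulation), a camera-space ray march can accumulate occupancy along each pixel's ray into a 2D occlusion map; the axis-aligned rays and the `occupancy` callable below are simplifying assumptions.

```python
# Loose sketch: accumulate occupancy along each pixel's (simplified,
# axis-aligned) camera-space ray to form a 2D self-occlusion map.
import numpy as np

def self_occlusion_map(occupancy, depth, num_steps=32):
    """occupancy: f(x, y, z) -> density in [0, 1], vectorized over arrays.
    depth: (H, W) per-pixel depth of the visible surface."""
    H, W = depth.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    transmittance = np.ones((H, W))
    for i in range(num_steps):
        t = depth * (i + 0.5) / num_steps        # sample strictly in front of the surface
        transmittance *= 1.0 - occupancy(xs, ys, t) / num_steps
    return 1.0 - transmittance                   # 1 = fully occluded along the ray
```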
arXiv Detail & Related papers (2022-05-14T05:35:35Z)
- Disentangled3D: Learning a 3D Generative Model with Disentangled Geometry and Appearance from Monocular Images [94.49117671450531]
State-of-the-art 3D generative models are GANs which use neural 3D volumetric representations for synthesis.
In this paper, we design a 3D GAN which can learn a disentangled model of objects, just from monocular observations.
arXiv Detail & Related papers (2022-03-29T22:03:18Z)
- Pop-Out Motion: 3D-Aware Image Deformation via Learning the Shape Laplacian [58.704089101826774]
We present a 3D-aware image deformation method with minimal restrictions on shape category and deformation type.
We take a supervised learning-based approach to predict the shape Laplacian of the underlying volume of a 3D reconstruction represented as a point cloud.
In the experiments, we present our results of deforming 2D character and clothed human images.
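For a sense of what a shape Laplacian over a point cloud looks like, below is a plain k-nearest-neighbor graph Laplacian; the paper predicts such an operator with a network rather than computing it this way, and the uniform edge weights are an assumption.

```python
# Illustrative k-NN graph Laplacian L = D - W over a point cloud; uniform
# edge weights are a simplifying assumption.
import numpy as np

def knn_graph_laplacian(points, k=8):
    """points: (N, 3). Returns the (N, N) unnormalized graph Laplacian."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                 # exclude self-edges
    idx = np.argsort(d2, axis=1)[:, :k]          # k nearest neighbors per point
    W = np.zeros(d2.shape)
    rows = np.repeat(np.arange(len(points)), k)
    W[rows, idx.ravel()] = 1.0
    W = np.maximum(W, W.T)                       # symmetrize the adjacency
    return np.diag(W.sum(axis=1)) - W
```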
arXiv Detail & Related papers (2022-03-29T04:57:18Z)
- Learning Canonical 3D Object Representation for Fine-Grained Recognition [77.33501114409036]
We propose a novel framework for fine-grained object recognition that learns to recover object variation in 3D space from a single image.
We represent an object as a composition of 3D shape and its appearance, while eliminating the effect of camera viewpoint.
By incorporating 3D shape and appearance jointly in a deep representation, our method learns the discriminative representation of the object.
arXiv Detail & Related papers (2021-08-10T12:19:34Z)
- Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction [33.95791350070165]
Inferring 3D structure of a generic object from a 2D image is a long-standing objective of computer vision.
We take an alternative approach based on semi-supervised learning: given a 2D image of a generic object, we decompose it into latent representations of category, shape and albedo.
We show that the complete shape and albedo modeling enables us to leverage real 2D images in both modeling and model fitting.
arXiv Detail & Related papers (2021-04-02T02:39:29Z)
- An Effective Loss Function for Generating 3D Models from Single 2D Image without Rendering [0.0]
Differentiable rendering is a very successful technique for single-view 3D reconstruction.
Current methods use pixel-wise losses between a rendered image of the reconstructed 3D object and ground-truth images from matched viewpoints to optimize the parameters of the 3D shape.
We propose a novel effective loss function that evaluates how well the projections of reconstructed 3D point clouds cover the ground truth object's silhouette.
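A Chamfer-style proxy for such a coverage loss might look like the following sketch; the pinhole camera model and the two-way distance terms are illustrative assumptions rather than the paper's exact loss.

```python
# Hedged sketch of a rendering-free silhouette-coverage loss: project the
# reconstructed cloud and penalize silhouette pixels with no projected point
# nearby, and projections that stray off the silhouette.
import torch

def coverage_loss(points, K, R, t, silhouette_xy):
    """points: (N, 3) cloud; K: (3, 3) intrinsics; R: (3, 3), t: (3,) pose;
    silhouette_xy: (M, 2) pixels sampled from the ground-truth silhouette."""
    cam = points @ R.T + t                           # world -> camera coordinates
    proj = cam @ K.T
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)  # perspective division
    d = torch.cdist(silhouette_xy, uv)               # (M, N) pixel distances
    cover = d.min(dim=1).values.mean()  # each silhouette pixel wants a projection nearby
    tight = d.min(dim=0).values.mean()  # each projection should land on the silhouette
    return cover + tight
```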
arXiv Detail & Related papers (2021-03-05T00:02:18Z)
- Cylinder3D: An Effective 3D Framework for Driving-scene LiDAR Semantic Segmentation [87.54570024320354]
State-of-the-art methods for large-scale driving-scene LiDAR semantic segmentation often project and process the point clouds in the 2D space.
A straightforward solution to tackle the issue of 3D-to-2D projection is to keep the 3D representation and process the points in the 3D space.
We develop a framework based on 3D cylindrical partition and 3D cylindrical convolution, termed Cylinder3D, which exploits the 3D topology relations and structures of driving-scene point clouds.
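The cylindrical partition itself is straightforward to sketch: bin each point by radius, azimuth, and height so that sparse distant regions fall into proportionally larger cells. The grid resolution and range bounds below are made-up illustrative values.

```python
# Minimal sketch of a cylindrical partition for LiDAR point clouds.
import numpy as np

def cylindrical_voxel_indices(points, grid=(48, 360, 32),
                              rho_max=50.0, z_min=-3.0, z_max=1.0):
    """points: (N, 3) xyz. Returns (N, 3) integer (radius, azimuth, height) bins."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rho = np.sqrt(x ** 2 + y ** 2)               # distance from the sensor axis
    phi = np.arctan2(y, x)                       # azimuth angle in [-pi, pi)
    r_idx = np.clip(rho / rho_max * grid[0], 0, grid[0] - 1).astype(int)
    p_idx = np.clip((phi + np.pi) / (2 * np.pi) * grid[1], 0, grid[1] - 1).astype(int)
    z_idx = np.clip((z - z_min) / (z_max - z_min) * grid[2], 0, grid[2] - 1).astype(int)
    return np.stack([r_idx, p_idx, z_idx], axis=1)
```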
arXiv Detail & Related papers (2020-08-04T13:56:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.