Unveiling Spaces: Architecturally meaningful semantic descriptions from
images of interior spaces
- URL: http://arxiv.org/abs/2312.12481v1
- Date: Tue, 19 Dec 2023 16:03:04 GMT
- Title: Unveiling Spaces: Architecturally meaningful semantic descriptions from images of interior spaces
- Authors: Demircan Tas, Rohit Priyadarshi Sanatani
- Abstract summary: This project aims to tackle the problem of extracting architecturally meaningful semantic descriptions from two-dimensional scenes of populated interior spaces.
A Generative Adversarial Network (GAN) for image-to-image translation (Pix2Pix) is trained on synthetically generated rendered images of these enclosures, along with corresponding image abstractions representing high-level architectural structure.
A similar model evaluation is also carried out on photographs of existing indoor enclosures, to measure its performance in real-world settings.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There has been a growing adoption of computer vision tools and technologies
in architectural design workflows over the past decade. Notable use cases
include point cloud generation, visual content analysis, and spatial awareness
for robotic fabrication. Multiple image classification, object detection, and
semantic pixel segmentation models have become popular for the extraction of
high-level symbolic descriptions and semantic content from two-dimensional
images and videos. However, a major challenge in this regard has been the
extraction of high-level architectural structures (walls, floors, ceilings,
windows, etc.) from diverse imagery where parts of these elements are occluded
by furniture, people, or other non-architectural elements. This project aims to
tackle this problem by proposing models that are capable of extracting
architecturally meaningful semantic descriptions from two-dimensional scenes of
populated interior spaces. 1000 virtual classrooms are parametrically
generated, randomized along key spatial parameters such as length, width,
height, and door/window positions. The positions of the cameras and of the
non-architectural visual obstructions (furniture/objects) are also randomized
(a minimal generation sketch is given after the abstract).
A Generative Adversarial Network (GAN) for image-to-image translation (Pix2Pix)
is trained on synthetically generated rendered images of these enclosures,
along with corresponding image abstractions representing high-level
architectural structure. The model is then tested on unseen synthetic imagery
of new enclosures, and outputs are compared to ground truth using pixel-wise
comparison for evaluation. A similar model evaluation is also carried out on
photographs of existing indoor enclosures, to measure its performance in
real-world settings.
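The dataset-generation step can be sketched in code. The parameter names and value ranges below are illustrative assumptions only; the abstract names the randomized parameters but does not publish the actual ranges or the rendering pipeline.

```python
import random
from dataclasses import dataclass

@dataclass
class ClassroomScene:
    """One synthetic classroom sample (dimensions in metres; all ranges are illustrative)."""
    length: float
    width: float
    height: float
    door_pos: float        # normalized position of the door along one wall (0..1)
    window_pos: float      # normalized position of the window along another wall (0..1)
    camera_height: float
    camera_yaw_deg: float
    num_obstructions: int  # furniture/objects that may occlude architectural elements

def sample_scene(rng: random.Random) -> ClassroomScene:
    """Randomize the key spatial parameters named in the abstract
    (length, width, height, door/window positions, camera, obstructions)."""
    return ClassroomScene(
        length=rng.uniform(6.0, 12.0),
        width=rng.uniform(5.0, 10.0),
        height=rng.uniform(2.8, 4.0),
        door_pos=rng.random(),
        window_pos=rng.random(),
        camera_height=rng.uniform(1.2, 1.8),
        camera_yaw_deg=rng.uniform(0.0, 360.0),
        num_obstructions=rng.randint(5, 30),
    )

if __name__ == "__main__":
    rng = random.Random(42)
    scenes = [sample_scene(rng) for _ in range(1000)]  # 1000 virtual classrooms
    print(scenes[0])
```

Each sampled scene would then be rendered twice: once as a populated, photorealistic image and once as the corresponding architectural abstraction, giving the input/target pairs for Pix2Pix.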
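The training objective follows the standard Pix2Pix formulation: a conditional adversarial loss plus a weighted L1 reconstruction term. The PyTorch sketch below assumes generic U-Net generator and PatchGAN discriminator modules; the l1_weight of 100 is the common Pix2Pix default, not a value reported in this paper.

```python
import torch
import torch.nn.functional as F

def pix2pix_losses(generator, discriminator, render, abstraction, l1_weight=100.0):
    """One evaluation of the Pix2Pix objective for a batch:
    render      - rendered image of the populated enclosure (input)
    abstraction - ground-truth architectural abstraction (target)."""
    fake = generator(render)  # predicted architectural abstraction

    # Discriminator loss: real (render, abstraction) pairs vs. fake pairs.
    d_real = discriminator(torch.cat([render, abstraction], dim=1))
    d_fake = discriminator(torch.cat([render, fake.detach()], dim=1))
    d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) +
              F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))) / 2

    # Generator loss: fool the discriminator and stay close to ground truth in L1.
    d_fake_for_g = discriminator(torch.cat([render, fake], dim=1))
    g_adv = F.binary_cross_entropy_with_logits(d_fake_for_g, torch.ones_like(d_fake_for_g))
    g_l1 = F.l1_loss(fake, abstraction)
    g_loss = g_adv + l1_weight * g_l1

    return g_loss, d_loss
```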
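Evaluation is described only as a pixel-wise comparison of model outputs against ground truth. A minimal sketch, assuming the abstractions are discretized into per-pixel class labels (e.g. wall/floor/ceiling/window/door), is given below; the specific metric and label set used in the paper are assumptions here.

```python
import numpy as np

def pixelwise_scores(pred: np.ndarray, truth: np.ndarray, num_classes: int):
    """Pixel-wise comparison between a predicted abstraction and ground truth.
    Both arrays hold one integer class label per pixel."""
    assert pred.shape == truth.shape
    accuracy = float((pred == truth).mean())

    ious = {}
    for c in range(num_classes):
        inter = np.logical_and(pred == c, truth == c).sum()
        union = np.logical_or(pred == c, truth == c).sum()
        ious[c] = float(inter / union) if union > 0 else float("nan")
    return accuracy, ious

# Example usage with toy 4x4 label maps and 3 classes:
if __name__ == "__main__":
    truth = np.array([[0, 0, 1, 1], [0, 0, 1, 1], [2, 2, 1, 1], [2, 2, 1, 1]])
    pred = truth.copy()
    pred[0, 0] = 2
    print(pixelwise_scores(pred, truth, num_classes=3))
```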
Related papers
- HaLo-NeRF: Learning Geometry-Guided Semantics for Exploring Unconstrained Photo Collections [19.05215193265488]
We present a localization system that connects neural representations of scenes depicting large-scale landmarks with text describing a semantic region within the scene.
Our approach is built upon the premise that images physically grounded in space can provide a powerful supervision signal for localizing new concepts.
Our results show that HaLo-NeRF can accurately localize a variety of semantic concepts related to architectural landmarks.
arXiv Detail & Related papers (2024-02-14T14:02:04Z) - Structure-Guided Image Completion with Image-level and Object-level Semantic Discriminators [97.12135238534628]
We propose a learning paradigm that consists of semantic discriminators and object-level discriminators for improving the generation of complex semantics and objects.
Specifically, the semantic discriminators leverage pretrained visual features to improve the realism of the generated visual concepts.
Our proposed scheme significantly improves the generation quality and achieves state-of-the-art results on various tasks.
arXiv Detail & Related papers (2022-12-13T01:36:56Z) - Hand-Object Interaction Image Generation [135.87707468156057]
This work is dedicated to a new task, i.e., hand-object interaction image generation.
It aims to conditionally generate the hand-object image under the given hand, object and their interaction status.
This task is challenging and research-worthy in many potential application scenarios, such as AR/VR games and online shopping.
arXiv Detail & Related papers (2022-11-28T18:59:57Z) - Dual Pyramid Generative Adversarial Networks for Semantic Image
Synthesis [94.76988562653845]
The goal of semantic image synthesis is to generate photo-realistic images from semantic label maps.
Current state-of-the-art approaches, however, still struggle to generate realistic objects in images at various scales.
We propose a Dual Pyramid Generative Adversarial Network (DP-GAN) that learns the conditioning of spatially-adaptive normalization blocks at all scales jointly.
arXiv Detail & Related papers (2022-10-08T18:45:44Z) - Neural Scene Decoration from a Single Photograph [24.794743085391953]
We introduce a new problem of domain-specific image synthesis using generative modeling, namely neural scene decoration.
Given a photograph of an empty indoor space, we aim to synthesize a new image of the same space that is fully furnished and decorated.
Our network contains a novel image generator that transforms an initial point-based object layout into a realistic photograph.
arXiv Detail & Related papers (2021-08-04T01:44:21Z) - Learned Spatial Representations for Few-shot Talking-Head Synthesis [68.3787368024951]
We propose a novel approach for few-shot talking-head synthesis.
We show that this disentangled representation leads to a significant improvement over previous methods.
arXiv Detail & Related papers (2021-04-29T17:59:42Z) - Self-Supervised Annotation of Seismic Images using Latent Space
Factorization [14.221460375400692]
Our framework factorizes the latent space of a deep encoder-decoder network by projecting the latent space to learned sub-spaces.
Details of the annotated image are provided for analysis and qualitative comparison is made with similar frameworks.
arXiv Detail & Related papers (2020-09-10T01:54:45Z) - Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve [54.054575408582565]
We propose to leverage existing large-scale datasets of 3D models to understand the underlying 3D structure of objects seen in an image.
We present Mask2CAD, which jointly detects objects in real-world images and, for each detected object, optimizes for the most similar CAD model and its pose.
This produces a clean, lightweight representation of the objects in an image.
arXiv Detail & Related papers (2020-07-26T00:08:37Z) - Shallow2Deep: Indoor Scene Modeling by Single Image Understanding [42.87957414916607]
We present an automatic indoor scene modeling approach using deep features from neural networks.
Given a single RGB image, our method simultaneously recovers semantic contents, 3D geometry and object relationship.
arXiv Detail & Related papers (2020-02-22T23:27:22Z) - Seeing the World in a Bag of Chips [73.561388215585]
We address the dual problems of novel view synthesis and environment reconstruction from hand-held RGBD sensors.
Our contributions include 1) modeling highly specular objects, 2) modeling inter-reflections and Fresnel effects, and 3) enabling surface light field reconstruction with the same input needed to reconstruct shape alone.
arXiv Detail & Related papers (2020-01-14T06:44:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.