ATISS: Autoregressive Transformers for Indoor Scene Synthesis
- URL: http://arxiv.org/abs/2110.03675v1
- Date: Thu, 7 Oct 2021 17:58:05 GMT
- Title: ATISS: Autoregressive Transformers for Indoor Scene Synthesis
- Authors: Despoina Paschalidou and Amlan Kar and Maria Shugrina and Karsten
Kreis and Andreas Geiger and Sanja Fidler
- Abstract summary: We present ATISS, a novel autoregressive transformer architecture for creating synthetic indoor environments.
We argue that this formulation is more natural, as it makes ATISS generally useful beyond fully automatic room layout synthesis.
Our model is trained end-to-end as an autoregressive generative model using only labeled 3D bounding boxes as supervision.
- Score: 112.63708524926689
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ability to synthesize realistic and diverse indoor furniture layouts,
automatically or based on partial input, unlocks many applications, from better
interactive 3D tools to data synthesis for training and simulation. In this
paper, we present ATISS, a novel autoregressive transformer architecture for
creating diverse and plausible synthetic indoor environments, given only the
room type and its floor plan. In contrast to prior work, which poses scene
synthesis as sequence generation, our model generates rooms as unordered sets
of objects. We argue that this formulation is more natural, as it makes ATISS
generally useful beyond fully automatic room layout synthesis. For example, the
same trained model can be used in interactive applications for general scene
completion, partial room re-arrangement with any objects specified by the user,
as well as object suggestions for any partial room. To enable this, our model
leverages the permutation equivariance of the transformer when conditioning on
the partial scene, and is trained to be permutation-invariant across object
orderings. Our model is trained end-to-end as an autoregressive generative
model using only labeled 3D bounding boxes as supervision. Evaluations on four
room types in the 3D-FRONT dataset demonstrate that our model consistently
generates plausible room layouts that are more realistic than existing methods.
In addition, it has fewer parameters, is simpler to implement and train, and
runs up to 8 times faster than existing methods.
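The abstract describes the core mechanism: a transformer encodes the partial scene as an unordered set of labeled 3D bounding boxes (permutation equivariant, so object order does not matter) and autoregressively predicts the attributes of the next object. The sketch below is a minimal PyTorch illustration of that idea under stated assumptions, not the authors' implementation; the class name NextObjectPredictor, the attribute layout (class one-hot plus centre, size, yaw), and all hyperparameters are assumptions.

```python
# Hypothetical sketch (not the authors' code): permutation-invariant
# autoregressive "next object" prediction over a set of labeled 3D boxes.
import torch
import torch.nn as nn


class NextObjectPredictor(nn.Module):
    """Encodes a partial scene as an unordered set and predicts the next object."""

    def __init__(self, num_classes=23, d_model=128, nhead=4, num_layers=4):
        super().__init__()
        # Per-object features: class one-hot + centre (3) + size (3) + yaw (1).
        self.embed = nn.Linear(num_classes + 7, d_model)
        # Learned query token that attends to the partial scene.
        self.query = nn.Parameter(torch.randn(1, 1, d_model))
        encoder_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers)
        # Heads for the next object's attributes (class + box parameters).
        self.class_head = nn.Linear(d_model, num_classes + 1)  # +1 = "stop" label
        self.box_head = nn.Linear(d_model, 7)                  # centre, size, yaw

    def forward(self, objects):
        # objects: (batch, num_objects, num_classes + 7); order should not matter.
        tokens = self.embed(objects)
        # No positional encoding is added, so the encoder is permutation
        # equivariant over the object tokens and the query readout is invariant.
        x = torch.cat([self.query.expand(tokens.shape[0], -1, -1), tokens], dim=1)
        h = self.encoder(x)[:, 0]  # read out the query position
        return self.class_head(h), self.box_head(h)


if __name__ == "__main__":
    model = NextObjectPredictor()
    partial_scene = torch.randn(2, 5, 23 + 7)  # two rooms, five objects each
    # Training in this style would permute the object order at each step,
    # so the prediction cannot depend on any particular ordering.
    perm = torch.randperm(5)
    logits, box = model(partial_scene[:, perm])
    print(logits.shape, box.shape)  # torch.Size([2, 24]) torch.Size([2, 7])
```

At generation time, such a predictor would be applied repeatedly, appending each sampled object to the set until the "stop" label is chosen; conditioning on a user-provided partial room works the same way, which is why one trained model covers completion, re-arrangement, and suggestion use cases.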
Related papers
- DeBaRA: Denoising-Based 3D Room Arrangement Generation [22.96293773013579]
We introduce DeBaRA, a score-based model specifically tailored for precise, controllable and flexible arrangement generation in a bounded environment.
We demonstrate that by focusing on spatial attributes of objects, a single trained DeBaRA model can be leveraged at test time to perform several downstream applications such as scene synthesis, completion and re-arrangement.
arXiv Detail & Related papers (2024-09-26T23:18:25Z) - Mixed Diffusion for 3D Indoor Scene Synthesis [55.94569112629208]
We present MiDiffusion, a novel mixed discrete-continuous diffusion model architecture.
We represent a scene layout by a 2D floor plan and a set of objects, each defined by its category, location, size, and orientation (a minimal data-structure sketch of this representation follows after this list).
Our experimental results demonstrate that MiDiffusion substantially outperforms state-of-the-art autoregressive and diffusion models in floor-conditioned 3D scene synthesis.
arXiv Detail & Related papers (2024-05-31T17:54:52Z) - Purposer: Putting Human Motion Generation in Context [30.706219830149504]
We present a novel method to generate human motion to populate 3D indoor scenes.
It can be controlled with various combinations of conditioning signals such as a path in a scene, target poses, past motions, and scenes represented as 3D point clouds.
arXiv Detail & Related papers (2024-04-19T15:16:04Z) - 3D scene generation from scene graphs and self-attention [51.49886604454926]
We present a variant of the conditional variational autoencoder (cVAE) model to synthesize 3D scenes from scene graphs and floor plans.
We exploit the properties of self-attention layers to capture high-level relationships between objects in a scene.
arXiv Detail & Related papers (2024-04-02T12:26:17Z) - ROAM: Robust and Object-Aware Motion Generation Using Neural Pose
Descriptors [73.26004792375556]
This paper shows that robustness and generalisation to novel scene objects in 3D object-aware character synthesis can be achieved by training a motion model with as few as one reference object.
We leverage an implicit feature representation trained on object-only datasets, which encodes an SE(3)-equivariant descriptor field around the object.
We demonstrate substantial improvements in 3D virtual character motion and interaction quality and robustness to scenarios with unseen objects.
arXiv Detail & Related papers (2023-08-24T17:59:51Z) - Towards 3D Scene Understanding by Referring Synthetic Models [65.74211112607315]
Existing methods typically seek to alleviate the need for extensive annotations on real scene scans.
We explore how labeled synthetic models can supervise the recognition of the corresponding object categories in real scenes by mapping synthetic and real features into a unified feature space.
Experiments show that our method achieves an average mAP of 46.08% on the ScanNet dataset and 55.49% on the S3DIS dataset.
arXiv Detail & Related papers (2022-03-20T13:06:15Z) - Scene Synthesis via Uncertainty-Driven Attribute Synchronization [52.31834816911887]
This paper introduces a novel neural scene synthesis approach that can capture diverse feature patterns of 3D scenes.
Our method combines the strength of both neural network-based and conventional scene synthesis approaches.
arXiv Detail & Related papers (2021-08-30T19:45:07Z)