Learning Object Placement Programs for Indoor Scene Synthesis with Iterative Self Training
- URL: http://arxiv.org/abs/2503.04496v1
- Date: Thu, 06 Mar 2025 14:44:25 GMT
- Title: Learning Object Placement Programs for Indoor Scene Synthesis with Iterative Self Training
- Authors: Adrian Chang, Kai Wang, Yuanbo Li, Manolis Savva, Angel X. Chang, Daniel Ritchie
- Abstract summary: Data-driven and autoregressive indoor scene synthesis systems generate scenes automatically by suggesting and then placing objects one at a time. We design a Domain Specific Language that specifies functional constraints. We build upon previous work in unsupervised program induction to introduce a new program bootstrapping algorithm. Our system also generates indoor scenes of comparable quality to previous systems, and while previous systems degrade in performance when training data is sparse, our system does not degrade to the same degree.
- Score: 27.788560122097792
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data driven and autoregressive indoor scene synthesis systems generate indoor scenes automatically by suggesting and then placing objects one at a time. Empirical observations show that current systems tend to produce incomplete next object location distributions. We introduce a system which addresses this problem. We design a Domain Specific Language (DSL) that specifies functional constraints. Programs from our language take as input a partial scene and object to place. Upon execution they predict possible object placements. We design a generative model which writes these programs automatically. Available 3D scene datasets do not contain programs to train on, so we build upon previous work in unsupervised program induction to introduce a new program bootstrapping algorithm. In order to quantify our empirical observations we introduce a new evaluation procedure which captures how well a system models per-object location distributions. We ask human annotators to label all the possible places an object can go in a scene and show that our system produces per-object location distributions more consistent with human annotators. Our system also generates indoor scenes of comparable quality to previous systems and while previous systems degrade in performance when training data is sparse, our system does not degrade to the same degree.
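The abstract describes programs that take a partial scene and an object to place, and upon execution predict possible placements. The paper's actual DSL is not shown on this page; the following is a minimal hypothetical sketch of the general idea, where a "program" is a conjunction of functional constraints evaluated over candidate locations. All names here (`near`, `against_wall`, `Scene`, `execute`) are illustrative assumptions, not the authors' API.

```python
# Hypothetical sketch of an object placement program: a list of functional
# constraints that, given a partial scene, marks which candidate locations
# on a grid are valid placements. Not the paper's actual DSL.
import math
from dataclasses import dataclass


@dataclass
class Obj:
    name: str
    x: float
    y: float


@dataclass
class Scene:
    width: float   # room extent along x, in meters
    depth: float   # room extent along y, in meters
    objects: list  # objects already placed in the partial scene


def near(anchor_name, max_dist):
    """Constraint: candidate must lie within max_dist of the named anchor object."""
    def check(scene, x, y):
        anchor = next(o for o in scene.objects if o.name == anchor_name)
        return math.hypot(x - anchor.x, y - anchor.y) <= max_dist
    return check


def against_wall(margin):
    """Constraint: candidate must lie within margin of one of the room walls."""
    def check(scene, x, y):
        return (x <= margin or y <= margin or
                scene.width - x <= margin or scene.depth - y <= margin)
    return check


def execute(program, scene, step=0.5):
    """Execute a placement program: return every grid cell satisfying all constraints."""
    placements = []
    nx, ny = int(scene.width / step), int(scene.depth / step)
    for i in range(nx + 1):
        for j in range(ny + 1):
            x, y = i * step, j * step
            if all(constraint(scene, x, y) for constraint in program):
                placements.append((x, y))
    return placements


# Partial scene containing a bed; program for placing a nightstand:
# near the bed AND against a wall.
scene = Scene(width=4.0, depth=4.0, objects=[Obj("bed", 2.0, 0.5)])
program = [near("bed", 1.5), against_wall(0.5)]
spots = execute(program, scene)
```

Executing the program yields the full set of valid locations rather than a single point, which matches the paper's stated goal of modeling complete per-object location distributions.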
Related papers
- Grasping Partially Occluded Objects Using Autoencoder-Based Point Cloud Inpainting [50.4653584592824]
Real-world applications often come with challenges that might not be considered in grasping solutions tested in simulation or lab settings.
In this paper, we present an algorithm to reconstruct the missing information.
Our inpainting solution facilitates the real-world utilization of robust object matching approaches for grasping point calculation.
arXiv Detail & Related papers (2025-03-16T15:38:08Z) - ROOT: VLM based System for Indoor Scene Understanding and Beyond [83.71252153660078]
ROOT is a VLM-based system designed to enhance the analysis of indoor scenes.
ROOT facilitates indoor scene understanding and proves effective in diverse downstream applications, such as 3D scene generation and embodied AI.
arXiv Detail & Related papers (2024-11-24T04:51:24Z) - Automatic Scene Generation: State-of-the-Art Techniques, Models, Datasets, Challenges, and Future Prospects [0.94371657253557]
This survey focuses on techniques that leverage machine learning, deep learning, embedded systems, and natural language processing (NLP)
We categorize the models into four main types: Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), Transformers, and Diffusion Models.
We also review the most commonly used datasets, such as COCO-Stuff, Visual Genome, and MS-COCO, which are critical for training and evaluating these models.
arXiv Detail & Related papers (2024-09-14T19:09:10Z) - Mixed Diffusion for 3D Indoor Scene Synthesis [55.94569112629208]
We present MiDiffusion, a novel mixed discrete-continuous diffusion model designed to synthesize plausible 3D indoor scenes. We show it outperforms state-of-the-art autoregressive and diffusion models in floor-conditioned 3D scene synthesis.
arXiv Detail & Related papers (2024-05-31T17:54:52Z) - Open-Universe Indoor Scene Generation using LLM Program Synthesis and Uncurated Object Databases [13.126239167800652]
We present a system for generating indoor scenes in response to text prompts.
The prompts are not limited to a fixed vocabulary of scene descriptions.
The objects in generated scenes are not restricted to a fixed set of object categories.
arXiv Detail & Related papers (2024-02-05T01:59:31Z) - Context-Aware Indoor Point Cloud Object Generation through User Instructions [6.398660996031915]
We present a novel end-to-end multi-modal deep neural network capable of generating point cloud objects seamlessly integrated with their surroundings.
Our model revolutionizes scene modification by enabling the creation of new environments with previously unseen object layouts.
arXiv Detail & Related papers (2023-11-26T06:40:16Z) - Variational Inference for Scalable 3D Object-centric Learning [19.445804699433353]
We tackle the task of scalable unsupervised object-centric representation learning on 3D scenes.
Existing approaches to object-centric representation learning show limitations in generalizing to larger scenes.
We propose to learn view-invariant 3D object representations in localized object coordinate systems.
arXiv Detail & Related papers (2023-09-25T10:23:40Z) - Stereo Neural Vernier Caliper [57.187088191829886]
We propose a new object-centric framework for learning-based stereo 3D object detection.
We tackle a problem of how to predict a refined update given an initial 3D cuboid guess.
Our approach achieves state-of-the-art performance on the KITTI benchmark.
arXiv Detail & Related papers (2022-03-21T14:36:07Z) - RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection [138.2892824662943]
A promising solution is to make better use of the synthetic dataset, which consists of CAD object models, to boost the learning on real datasets.
Recent work on 3D pre-training exhibits failure when transferring features learned on synthetic objects to other real-world applications.
In this work, we put forward a new method called RandomRooms to accomplish this objective.
arXiv Detail & Related papers (2021-08-17T17:56:12Z) - Self-Supervision by Prediction for Object Discovery in Videos [62.87145010885044]
In this paper, we use the prediction task as self-supervision and build a novel object-centric model for image sequence representation.
Our framework can be trained without the help of any manual annotation or pretrained network.
Initial experiments confirm that the proposed pipeline is a promising step towards object-centric video prediction.
arXiv Detail & Related papers (2021-03-09T19:14:33Z) - Rearrangement: A Challenge for Embodied AI [229.8891614821016]
We describe a framework for research and evaluation in Embodied AI.
Our proposal is based on a canonical task: Rearrangement.
We present experimental testbeds of rearrangement scenarios in four different simulation environments.
arXiv Detail & Related papers (2020-11-03T19:42:32Z) - Scenic: A Language for Scenario Specification and Data Generation [17.07493567658614]
We propose a new probabilistic programming language for the design and analysis of cyber-physical systems.
In this paper, we focus on systems like autonomous cars and robots, whose environment at any point in time is a 'scene'.
We design a domain-specific language, Scenic, for describing scenarios that are distributions over scenes and the behaviors of their agents over time.
arXiv Detail & Related papers (2020-10-13T17:58:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.