Related papers: Context-Aware Indoor Point Cloud Object Generation through User Instructions

Context-Aware Indoor Point Cloud Object Generation through User Instructions

URL: http://arxiv.org/abs/2311.16501v2
Date: Mon, 22 Jul 2024 14:52:46 GMT
Title: Context-Aware Indoor Point Cloud Object Generation through User Instructions
Authors: Yiyang Luo, Ke Lin, Chao Gu,
Abstract summary: We present a novel end-to-end multi-modal deep neural network capable of generating point cloud objects seamlessly integrated with their surroundings. Our model revolutionizes scene modification by enabling the creation of new environments with previously unseen object layouts.
Score: 6.398660996031915
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Indoor scene modification has emerged as a prominent area within computer vision, particularly for its applications in Augmented Reality (AR) and Virtual Reality (VR). Traditional methods often rely on pre-existing object databases and predetermined object positions, limiting their flexibility and adaptability to new scenarios. In response to this challenge, we present a novel end-to-end multi-modal deep neural network capable of generating point cloud objects seamlessly integrated with their surroundings, driven by textual instructions. Our model revolutionizes scene modification by enabling the creation of new environments with previously unseen object layouts, eliminating the need for pre-stored CAD models. Leveraging Point-E as our generative model, we introduce innovative techniques such as quantized position prediction and Top-K estimation to address the issue of false negatives resulting from ambiguous language descriptions. Furthermore, we conduct comprehensive evaluations to showcase the diversity of generated objects, the efficacy of textual instructions, and the quantitative metrics, affirming the realism and versatility of our model in generating indoor objects. To provide a holistic assessment, we incorporate visual grounding as an additional metric, ensuring the quality and coherence of the scenes produced by our model. Through these advancements, our approach not only advances the state-of-the-art in indoor scene modification but also lays the foundation for future innovations in immersive computing and digital environment creation.

Related papers

Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation [75.30238170051291]
Depth estimation is a fundamental task in 3D computer vision, crucial for applications such as 3D reconstruction, free-viewpoint rendering, robotics, autonomous driving, and AR/VR technologies.<n>Traditional methods relying on hardware sensors like LiDAR are often limited by high costs, low resolution, and environmental sensitivity, limiting their applicability in real-world scenarios.<n>Recent advances in vision-based methods offer a promising alternative, yet they face challenges in generalization and stability due to either the low-capacity model architectures or the reliance on domain-specific and small-scale datasets.
arXiv Detail & Related papers (2025-07-15T17:59:59Z)
Online 3D Scene Reconstruction Using Neural Object Priors [83.14204014687938]
This paper addresses the problem of reconstructing a scene online at the level of objects given an RGB-D video sequence. We propose a feature grid mechanism to continuously update object-centric neural implicit representations as new object parts are revealed. Our approach outperforms state-of-the-art neural implicit models for this task in terms of reconstruction accuracy and completeness.
arXiv Detail & Related papers (2025-03-24T17:09:36Z)
MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes [35.16430027877207]
MOVIS aims to enhance the structural awareness of the view-conditioned diffusion model for multi-object NVS. We introduce an auxiliary task requiring the model to simultaneously predict novel view object masks. Our method exhibits strong generalization capabilities and produces consistent novel view synthesis.
arXiv Detail & Related papers (2024-12-16T05:23:45Z)
DeBaRA: Denoising-Based 3D Room Arrangement Generation [22.96293773013579]
We introduce DeBaRA, a score-based model specifically tailored for precise, controllable and flexible arrangement generation in a bounded environment. We demonstrate that by focusing on spatial attributes of objects, a single trained DeBaRA model can be leveraged at test time to perform several downstream applications such as scene synthesis, completion and re-arrangement.
arXiv Detail & Related papers (2024-09-26T23:18:25Z)
Automatic Scene Generation: State-of-the-Art Techniques, Models, Datasets, Challenges, and Future Prospects [0.94371657253557]
This survey focuses on techniques that leverage machine learning, deep learning, embedded systems, and natural language processing (NLP) We categorize the models into four main types: Variational Autoencoders (VAEs), Generative Adrial Networks (GANs), Transformers, and Diffusion Models. We also review the most commonly used datasets, such as COCO-Stuff, Visual Genome, and MS-COCO, which are critical for training and evaluating these models.
arXiv Detail & Related papers (2024-09-14T19:09:10Z)
Real-Time Indoor Object Detection based on hybrid CNN-Transformer Approach [0.0]
Real-time object detection in indoor settings is a challenging area of computer vision, faced with unique obstacles such as variable lighting and complex backgrounds. This study delves into the evaluation of existing datasets and computational models, leading to the creation of a refined dataset. We present an adaptation of a CNN detection model, incorporating an attention mechanism to enhance the model's ability to discern and prioritize critical features within cluttered indoor scenes.
arXiv Detail & Related papers (2024-09-03T13:14:08Z)
Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization. We introduce a benchmark comprising eight different synthetic and real-world datasets. We find that training on diverse real-world images improves transferability to unseen scenarios.
arXiv Detail & Related papers (2024-08-17T10:37:07Z)
BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation [57.40024206484446]
We introduce the BEHAVIOR Vision Suite (BVS), a set of tools and assets to generate fully customized synthetic data for systematic evaluation of computer vision models. BVS supports a large number of adjustable parameters at the scene level. We showcase three example application scenarios.
arXiv Detail & Related papers (2024-05-15T17:57:56Z)
Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange [50.45953583802282]
We introduce a novel self-supervised learning (SSL) strategy for point cloud scene understanding. Our approach leverages both object patterns and contextual cues to produce robust features. Our experiments demonstrate the superiority of our method over existing SSL techniques.
arXiv Detail & Related papers (2024-04-11T06:39:53Z)
UniQuadric: A SLAM Backend for Unknown Rigid Object 3D Tracking and Light-Weight Modeling [7.626461564400769]
We propose a novel SLAM backend that unifies ego-motion tracking, rigid object motion tracking, and modeling. Our system showcases the potential application of object perception in complex dynamic scenes.
arXiv Detail & Related papers (2023-09-29T07:50:09Z)
CPPF++: Uncertainty-Aware Sim2Real Object Pose Estimation by Vote Aggregation [67.12857074801731]
We introduce a novel method, CPPF++, designed for sim-to-real pose estimation. To address the challenge posed by vote collision, we propose a novel approach that involves modeling the voting uncertainty. We incorporate several innovative modules, including noisy pair filtering, online alignment optimization, and a feature ensemble.
arXiv Detail & Related papers (2022-11-24T03:27:00Z)
Temporal Predictive Coding For Model-Based Planning In Latent Space [80.99554006174093]
We present an information-theoretic approach that employs temporal predictive coding to encode elements in the environment that can be predicted across time. We evaluate our model on a challenging modification of standard DMControl tasks where the background is replaced with natural videos that contain complex but irrelevant information to the planning task.
arXiv Detail & Related papers (2021-06-14T04:31:15Z)
SceneGen: Generative Contextual Scene Augmentation using Scene Graph Priors [3.1969855247377827]
We introduce SceneGen, a generative contextual augmentation framework that predicts virtual object positions and orientations within existing scenes. SceneGen takes a semantically segmented scene as input, and outputs positional and orientational probability maps for placing virtual content. We formulate a novel spatial Scene Graph representation, which encapsulates explicit topological properties between objects, object groups, and the room. To demonstrate our system in action, we develop an Augmented Reality application, in which objects can be contextually augmented in real-time.
arXiv Detail & Related papers (2020-09-25T18:36:27Z)
Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image-generation method to generate complex scenes with multiple objects. Our method learns representations of the spatial relationships between objects in the scene, which lead to our model's improved layout-fidelity. We introduce SceneFID, an object-centric adaptation of the popular Fr'echet Inception Distance metric, that is better suited for multi-object images.
arXiv Detail & Related papers (2020-03-16T21:40:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.