HierRelTriple: Guiding Indoor Layout Generation with Hierarchical Relationship Triplet Losses
- URL: http://arxiv.org/abs/2503.20289v2
- Date: Tue, 16 Sep 2025 03:01:36 GMT
- Title: HierRelTriple: Guiding Indoor Layout Generation with Hierarchical Relationship Triplet Losses
- Authors: Kaifan Sun, Bingchen Yang, Peter Wonka, Jun Xiao, Haiyong Jiang
- Abstract summary: We present a hierarchical triplet-based indoor relationship learning method, coined HierRelTriple, with a focus on spatial relationship learning. HierRelTriple is a hierarchical relational triplet modeling framework that first partitions functional regions and then automatically extracts three levels of spatial relationships. Experiments on unconditional layout synthesis, floorplan-conditioned layout generation, and scene rearrangement demonstrate that HierRelTriple improves spatial-relation metrics by over 15%.
- Score: 52.70183252341687
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a hierarchical triplet-based indoor relationship learning method, coined HierRelTriple, with a focus on spatial relationship learning. Existing approaches often depend on manually defined spatial rules or simplified pairwise representations, which fail to capture the complex, multi-object relationships found in real scenarios and lead to overcrowded or physically implausible arrangements. We introduce HierRelTriple, a hierarchical relational triplet modeling framework that first partitions functional regions and then automatically extracts three levels of spatial relationships: object-to-region (O2R), object-to-object (O2O), and corner-to-corner (C2C). By representing these relationships as geometric triplets and employing approaches based on Delaunay triangulation to establish spatial priors, we derive IoU losses between denoised and ground-truth triplets and integrate them seamlessly into the diffusion denoising process. The joint formulation of inter-object distances, angular orientations, and spatial relationships enhances the physical realism of the generated scenes. Extensive experiments on unconditional layout synthesis, floorplan-conditioned layout generation, and scene rearrangement demonstrate that HierRelTriple improves spatial-relation metrics by over 15% and substantially reduces collisions and boundary violations compared to state-of-the-art methods.
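To make the triplet-extraction idea concrete, here is a minimal sketch of deriving object-to-object (O2O) triplets from object centroids with a Delaunay triangulation, as the abstract describes for building spatial priors. This is an illustrative reconstruction, not the authors' implementation; the helper name `o2o_triplets` and the toy layout are assumptions.

```python
# Sketch (not the paper's code): extracting object-to-object (O2O)
# relationship triplets from 2D object centroids via Delaunay triangulation.
import numpy as np
from scipy.spatial import Delaunay

def o2o_triplets(centroids):
    """Return the vertex-index triplets of the Delaunay triangulation.

    Each triangle links three mutually neighboring objects; distances and
    angles measured over these triplets can then serve as geometric
    supervision signals.
    """
    tri = Delaunay(np.asarray(centroids, dtype=float))
    return [tuple(sorted(s)) for s in tri.simplices]

# Toy layout: four object centroids in a room, as (x, y)
pts = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.5), (2.5, 2.0)]
triplets = o2o_triplets(pts)
# Four points in convex position yield two triangles (two triplets).
```

Distances and angles over these triplets could then supervise the diffusion denoising step via the IoU-style triplet losses the abstract describes.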
Related papers
- GFLAN: Generative Functional Layouts [1.1458853556386797]
GFLAN is a generative framework that restructures floor plan synthesis through explicit factorization into topological planning and geometric realization. Our approach departs from direct pixel-to-pixel or wall-tracing generation in favor of a principled two-stage decomposition.
arXiv Detail & Related papers (2025-12-18T07:52:47Z) - SVRecon: Sparse Voxel Rasterization for Surface Reconstruction [60.92372415355283]
We extend the recently proposed sparse voxelization paradigm to the task of high-fidelity surface reconstruction with SVRecon. Our method achieves strong reconstruction accuracy and consistently fast convergence.
arXiv Detail & Related papers (2025-11-21T16:32:01Z) - Follow My Hold: Hand-Object Interaction Reconstruction through Geometric Guidance [61.41904916189093]
We propose a novel diffusion-based framework for reconstructing the 3D geometry of hand-held objects from monocular RGB images. We use hand-object interaction as geometric guidance to ensure plausible interactions.
arXiv Detail & Related papers (2025-08-25T17:11:53Z) - Preserving Topological and Geometric Embeddings for Point Cloud Recovery [43.26116605528137]
We propose an end-to-end architecture named TopGeoFormer, which maintains these critical properties throughout the sampling and restoration phases. In experiments, we comprehensively analyze conventional and learning-based sampling, upsampling, and recovery algorithms.
arXiv Detail & Related papers (2025-07-25T09:58:41Z) - Learning to Align and Refine: A Foundation-to-Diffusion Framework for Occlusion-Robust Two-Hand Reconstruction [50.952228546326516]
Two-hand reconstruction from monocular images faces persistent challenges due to complex and dynamic hand postures. Existing approaches struggle with such alignment issues, often resulting in misalignment and penetration artifacts. We propose a dual-stage Foundation-to-Diffusion framework that precisely aligns 2D prior guidance from vision foundation models.
arXiv Detail & Related papers (2025-03-22T14:42:27Z) - Mitigating Hallucinations in Multimodal Spatial Relations through Constraint-Aware Prompting [7.962140902232628]
Spatial relation hallucinations pose a persistent challenge in large vision-language models (LVLMs). We propose a constraint-aware prompting framework designed to reduce spatial relation hallucinations.
arXiv Detail & Related papers (2025-02-12T11:32:19Z) - "Set It Up!": Functional Object Arrangement with Compositional Generative Models [64.77941735876452]
We introduce a framework, SetItUp, for learning to interpret under-specified instructions. We validate our framework on a dataset comprising study desks, dining tables, and coffee tables.
arXiv Detail & Related papers (2024-05-20T10:06:33Z) - RTF: Region-based Table Filling Method for Relational Triple Extraction [17.267920424291372]
We propose a novel Region-based Table Filling method (RTF) for relational triple extraction.
We devise a novel region-based tagging scheme and bi-directional decoding strategy, which regards each triple as a region on the relation-specific table and identifies triples by determining the two endpoints of each region.
Experimental results show our method achieves better generalization capability on three variants of two widely used benchmark datasets.
arXiv Detail & Related papers (2024-04-29T23:36:38Z) - Serving Deep Learning Model in Relational Databases [70.53282490832189]
Serving deep learning (DL) models on relational data has become a critical requirement across diverse commercial and scientific domains.
We highlight three pivotal paradigms: The state-of-the-art DL-centric architecture offloads DL computations to dedicated DL frameworks.
The potential UDF-centric architecture encapsulates one or more tensor computations into User Defined Functions (UDFs) within the relational database management system (RDBMS).
arXiv Detail & Related papers (2023-10-07T06:01:35Z) - LAW-Diffusion: Complex Scene Generation by Diffusion with Layouts [107.11267074981905]
We propose a semantically controllable layout-AWare diffusion model, termed LAW-Diffusion.
We show that LAW-Diffusion yields the state-of-the-art generative performance, especially with coherent object relations.
arXiv Detail & Related papers (2023-08-13T08:06:18Z) - RE$^2$: Region-Aware Relation Extraction from Visually Rich Documents [18.369611871952667]
We propose REgion-Aware Relation Extraction (RE$^2$) that leverages region-level spatial structure among the entity blocks to improve their relation prediction.
We also introduce a constraint objective to regularize the model towards consistency with the inherent constraints of the relation extraction task.
arXiv Detail & Related papers (2023-05-24T00:07:40Z) - Learning Relation-Specific Representations for Few-shot Knowledge Graph Completion [24.880078645503417]
We propose a Relation-Specific Context Learning framework, which exploits graph contexts of triples to capture semantic information of relations and entities simultaneously.
Experimental results on two public datasets demonstrate that RSCL outperforms state-of-the-art FKGC methods.
arXiv Detail & Related papers (2022-03-22T11:45:48Z) - Few Shot Generative Model Adaption via Relaxed Spatial Structural Alignment [130.84010267004803]
Training a generative adversarial network (GAN) with limited data has been a challenging task.
A feasible solution is to start with a GAN well-trained on a large-scale source domain and adapt it to the target domain with a few samples, termed few-shot generative model adaption.
We propose a relaxed spatial structural alignment method to calibrate the target generative models during the adaption.
arXiv Detail & Related papers (2022-03-06T14:26:25Z) - Learning to Compose Visual Relations [100.45138490076866]
We propose to represent each relation as an unnormalized density (an energy-based model).
We show that such a factorized decomposition allows the model to both generate and edit scenes with multiple sets of relations more faithfully.
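The composition idea behind this factorized decomposition can be sketched in a few lines: when each relation is an unnormalized density, energies add (densities multiply), so a scene satisfying every relation minimizes the summed energy. The relation energies below (`left_of`, `above`) are toy hand-written stand-ins, not the paper's learned models.

```python
# Sketch (illustrative, not the paper's code): composing relations as
# energy-based models over 2D object positions.
def left_of(a, b):   # low energy when object a lies left of object b
    return max(0.0, a[0] - b[0]) ** 2

def above(a, b):     # low energy when object a lies above object b
    return max(0.0, b[1] - a[1]) ** 2

def composed_energy(a, b):
    # Relations compose by summing energies (multiplying the unnormalized
    # densities): a layout satisfying both relations has zero total energy.
    return left_of(a, b) + above(a, b)

good = composed_energy((0.0, 2.0), (1.0, 0.0))   # a is left of and above b
bad = composed_energy((1.0, 0.0), (0.0, 2.0))    # neither relation holds
# good == 0.0; bad is strictly positive.
```

Generating or editing a scene under several relations then amounts to minimizing this summed energy over object positions.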
arXiv Detail & Related papers (2021-11-17T18:51:29Z) - Joint Constrained Learning for Event-Event Relation Extraction [94.3499255880101]
We propose a joint constrained learning framework for modeling event-event relations.
Specifically, the framework enforces logical constraints within and across multiple temporal and subevent relations.
We show that our joint constrained learning approach effectively compensates for the lack of jointly labeled data.
arXiv Detail & Related papers (2020-10-13T22:45:28Z) - Intrinsic Relationship Reasoning for Small Object Detection [44.68289739449486]
Small objects in images and videos are usually not independent individuals. Instead, they more or less present some semantic and spatial layout relationships with each other.
We propose a novel context reasoning approach for small object detection which models and infers the intrinsic semantic and spatial layout relationships between objects.
arXiv Detail & Related papers (2020-09-02T06:03:05Z) - DensE: An Enhanced Non-commutative Representation for Knowledge Graph Embedding with Adaptive Semantic Hierarchy [4.607120217372668]
We develop a novel knowledge graph embedding method, named DensE, to provide an improved modeling scheme for the complex composition patterns of relations.
Our method decomposes each relation into an SO(3) group-based rotation operator and a scaling operator in the three dimensional (3-D) Euclidean space.
Experimental results on multiple benchmark knowledge graphs show that DensE outperforms the current state-of-the-art models for missing link prediction.
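A minimal sketch of the core operation, under the assumption that a relation acts on 3-D blocks of an entity embedding as an SO(3) rotation followed by a scaling; the helper `apply_relation` is hypothetical, not the paper's code.

```python
# Sketch (assumptions, not the paper's implementation): modeling a relation
# as an SO(3) rotation plus a scaling acting on 3-D embedding blocks.
import numpy as np
from scipy.spatial.transform import Rotation

def apply_relation(head, rotvec, scale):
    """Transform a head embedding of shape (k, 3) by a relation.

    rotvec: axis-angle vector defining an SO(3) rotation.
    scale:  positive scalar stretching the rotated vectors.
    """
    R = Rotation.from_rotvec(rotvec).as_matrix()
    return scale * head @ R.T

h = np.array([[1.0, 0.0, 0.0]])                      # one 3-D block
t = apply_relation(h, rotvec=[0.0, 0.0, np.pi / 2], scale=2.0)
# Rotating x-hat by 90 degrees about z and doubling gives (0, 2, 0).
```

Because 3-D rotations do not commute, composing two such relation operators in different orders yields different results, which is what lets this scheme model non-commutative composition patterns.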
arXiv Detail & Related papers (2020-08-11T06:45:50Z) - Understanding Spatial Relations through Multiple Modalities [78.07328342973611]
Spatial relations between objects can be either explicit, expressed as spatial prepositions, or implicit, expressed by spatial verbs such as moving, walking, and shifting.
We introduce the task of inferring implicit and explicit spatial relations between two entities in an image.
We design a model that uses both textual and visual information to predict the spatial relations, making use of both positional and size information of objects and image embeddings.
arXiv Detail & Related papers (2020-07-19T01:35:08Z) - On Embeddings in Relational Databases [11.52782249184251]
We address the problem of learning a distributed representation of entities in a relational database using a low-dimensional embedding.
Recent methods for learning embeddings take a naive approach: they fully denormalize the database by materializing the full join of all tables and representing it as a knowledge graph.
In this paper we demonstrate a better methodology for learning representations by exploiting the underlying semantics of columns in a table while using the relation joins and the latent inter-row relationships.
arXiv Detail & Related papers (2020-05-13T17:21:27Z) - Local Propagation in Constraint-based Neural Network [77.37829055999238]
We study a constraint-based representation of neural network architectures.
We investigate a simple optimization procedure that is well suited to fulfil the so-called architectural constraints.
arXiv Detail & Related papers (2020-02-18T16:47:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.