Related papers: PuzzleFusion: Unleashing the Power of Diffusion Models for Spatial Puzzle Solving

PuzzleFusion: Unleashing the Power of Diffusion Models for Spatial Puzzle Solving

URL: http://arxiv.org/abs/2211.13785v3
Date: Tue, 3 Oct 2023 22:29:43 GMT
Title: PuzzleFusion: Unleashing the Power of Diffusion Models for Spatial Puzzle Solving
Authors: Sepidehsadat Hosseini, Mohammad Amin Shabani, Saghar Irandoust, Yasutaka Furukawa
Abstract summary: The paper presents an end-to-end neural architecture based on Diffusion Models for spatial puzzle solving. A surprising discovery is that the simple use of a Diffusion Model effectively solves these challenging spatial puzzle tasks as a conditional generation process. To enable learning of an end-to-end neural system, the paper introduces new datasets with ground-truth arrangements.
Score: 17.781484376483707
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper presents an end-to-end neural architecture based on Diffusion Models for spatial puzzle solving, particularly jigsaw puzzle and room arrangement tasks. In the latter task, for instance, the proposed system "PuzzleFusion" takes a set of room layouts as polygonal curves in the top-down view and aligns the room layout pieces by estimating their 2D translations and rotations, akin to solving the jigsaw puzzle of room layouts. A surprising discovery of the paper is that the simple use of a Diffusion Model effectively solves these challenging spatial puzzle tasks as a conditional generation process. To enable learning of an end-to-end neural system, the paper introduces new datasets with ground-truth arrangements: 1) 2D Voronoi jigsaw dataset, a synthetic one where pieces are generated by Voronoi diagram of 2D pointset; and 2) MagicPlan dataset, a real one offered by MagicPlan from its production pipeline, where pieces are room layouts constructed by augmented reality App by real-estate consumers. The qualitative and quantitative evaluations demonstrate that our approach outperforms the competing methods by significant margins in all the tasks.

Related papers

ERL-MPP: Evolutionary Reinforcement Learning with Multi-head Puzzle Perception for Solving Large-scale Jigsaw Puzzles of Eroded Gaps [28.009783235854584]
We propose a framework of Evolutionary Reinforcement Learning with Multi-head Puzzle Perception. The proposed ERL-MPP is evaluated on the JPLEG-5 dataset with large gaps and the MIT dataset with large-scale puzzles. It significantly outperforms all state-of-the-art models on both datasets.
arXiv Detail & Related papers (2025-04-13T14:56:41Z)
A Generic Hybrid Framework for 2D Visual Reconstruction [39.58317527488534]
This paper presents a versatile hybrid framework for addressing 2D real-world reconstruction tasks formulated as jigsaw puzzle problems (JPPs) with square, non-overlapping pieces. Our approach integrates a deep learning (DL)-based compatibility measure (CM) model that evaluates pairs of puzzle pieces holistically. Our unique hybrid methodology achieves state-of-the-art (SOTA) results in reconstructing Portuguese tile panels and large degraded puzzles with eroded boundaries.
arXiv Detail & Related papers (2025-01-31T17:21:29Z)
Mixed Diffusion for 3D Indoor Scene Synthesis [55.94569112629208]
We present MiDiffusion, a novel mixed discrete-continuous diffusion model architecture. We represent a scene layout by a 2D floor plan and a set of objects, each defined by its category, location, size, and orientation. Our experimental results demonstrate that MiDiffusion substantially outperforms state-of-the-art autoregressive and diffusion models in floor-conditioned 3D scene synthesis.
arXiv Detail & Related papers (2024-05-31T17:54:52Z)
DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly [21.497180110855975]
We introduce DiffAssemble, a Graph Neural Network (GNN)-based architecture that learns to solve reassembly tasks. Our method treats the elements of a set, whether pieces of 2D patch or 3D object fragments, as nodes of a spatial graph. We highlight its remarkable reduction in run-time, performing 11 times faster than the quickest optimization-based method for puzzle solving.
arXiv Detail & Related papers (2024-02-29T16:09:12Z)
Compositional Generative Inverse Design [69.22782875567547]
Inverse design, where we seek to design input variables in order to optimize an underlying objective function, is an important problem. We show that by instead optimizing over the learned energy function captured by the diffusion model, we can avoid such adversarial examples. In an N-body interaction task and a challenging 2D multi-airfoil design task, we demonstrate that by composing the learned diffusion model at test time, our method allows us to design initial states and boundary shapes.
arXiv Detail & Related papers (2024-01-24T01:33:39Z)
360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results. We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics. We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations. Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
arXiv Detail & Related papers (2023-12-26T12:16:03Z)
Positional Diffusion: Ordering Unordered Sets with Diffusion Probabilistic Models [32.63654140960086]
We present Positional Diffusion, a plug-and-play graph formulation with Diffusion Probabilistic Models. We use the forward process to map elements' positions in a set to random positions in a continuous space. Positional Diffusion learns to reverse the noising process and recover the original positions through an Attention-based Graph Neural Network.
arXiv Detail & Related papers (2023-03-20T14:01:01Z)
Connecting the Dots: Floorplan Reconstruction Using Two-Level Queries [27.564355569013706]
We develop a novel Transformer architecture that generates polygons of multiple rooms in parallel. Our method achieves a new state-of-the-art for two challenging datasets, Structured3D and SceneCAD. It can readily be extended to predict additional information, i.e., semantic room types and architectural elements like doors and windows.
arXiv Detail & Related papers (2022-11-28T18:59:09Z)
Video Anomaly Detection by Solving Decoupled Spatio-Temporal Jigsaw Puzzles [67.39567701983357]
Video Anomaly Detection (VAD) is an important topic in computer vision. Motivated by the recent advances in self-supervised learning, this paper addresses VAD by solving an intuitive yet challenging pretext task. Our method outperforms state-of-the-art counterparts on three public benchmarks.
arXiv Detail & Related papers (2022-07-20T19:49:32Z)
GANzzle: Reframing jigsaw puzzle solving as a retrieval task using a generative mental image [15.132848477903314]
We infer a mental image from all pieces, which a given piece can then be matched against avoiding the explosion. We learn how to reconstruct the image given a set of unordered pieces, allowing the model to learn a joint embedding space to match an encoding of each piece to the cropped layer of the generator. In doing so our model is puzzle size agnostic, in contrast to prior deep learning methods which are single size.
arXiv Detail & Related papers (2022-07-12T16:02:00Z)
MCTS with Refinement for Proposals Selection Games in Scene Understanding [32.92475660892122]
We propose a novel method applicable in many scene understanding problems that adapts the Monte Carlo Tree Search (MCTS) algorithm. From a generated pool of proposals, our method jointly selects and optimize proposals that maximize the objective term. Our method shows high performance on the Matterport3D dataset without introducing hard constraints on room layout configurations.
arXiv Detail & Related papers (2022-07-07T10:15:54Z)
Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR-based Perception [122.53774221136193]
State-of-the-art methods for driving-scene LiDAR-based perception often project the point clouds to 2D space and then process them via 2D convolution. A natural remedy is to utilize the 3D voxelization and 3D convolution network. We propose a new framework for the outdoor LiDAR segmentation, where cylindrical partition and asymmetrical 3D convolution networks are designed to explore the 3D geometric pattern.
arXiv Detail & Related papers (2021-09-12T06:25:11Z)
Learning Unsupervised Hierarchical Part Decomposition of 3D Objects from a Single RGB Image [102.44347847154867]
We propose a novel formulation that allows to jointly recover the geometry of a 3D object as a set of primitives. Our model recovers the higher level structural decomposition of various objects in the form of a binary tree of primitives. Our experiments on the ShapeNet and D-FAUST datasets demonstrate that considering the organization of parts indeed facilitates reasoning about 3D geometry.
arXiv Detail & Related papers (2020-04-02T17:58:05Z)
Convolutional Occupancy Networks [88.48287716452002]
We propose Convolutional Occupancy Networks, a more flexible implicit representation for detailed reconstruction of objects and 3D scenes. By combining convolutional encoders with implicit occupancy decoders, our model incorporates inductive biases, enabling structured reasoning in 3D space. We empirically find that our method enables the fine-grained implicit 3D reconstruction of single objects, scales to large indoor scenes, and generalizes well from synthetic to real data.
arXiv Detail & Related papers (2020-03-10T10:17:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.