PuzzleFusion: Unleashing the Power of Diffusion Models for Spatial
Puzzle Solving
- URL: http://arxiv.org/abs/2211.13785v3
- Date: Tue, 3 Oct 2023 22:29:43 GMT
- Title: PuzzleFusion: Unleashing the Power of Diffusion Models for Spatial
Puzzle Solving
- Authors: Sepidehsadat Hosseini, Mohammad Amin Shabani, Saghar Irandoust,
Yasutaka Furukawa
- Abstract summary: The paper presents an end-to-end neural architecture based on Diffusion Models for spatial puzzle solving.
A surprising discovery is that the simple use of a Diffusion Model effectively solves these challenging spatial puzzle tasks as a conditional generation process.
To enable learning of an end-to-end neural system, the paper introduces new datasets with ground-truth arrangements.
- Score: 17.781484376483707
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents an end-to-end neural architecture based on Diffusion
Models for spatial puzzle solving, particularly jigsaw puzzle and room
arrangement tasks. In the latter task, for instance, the proposed system
"PuzzleFusion" takes a set of room layouts as polygonal curves in the top-down
view and aligns the room layout pieces by estimating their 2D translations and
rotations, akin to solving the jigsaw puzzle of room layouts. A surprising
discovery of the paper is that the simple use of a Diffusion Model effectively
solves these challenging spatial puzzle tasks as a conditional generation
process. To enable learning of an end-to-end neural system, the paper
introduces new datasets with ground-truth arrangements: 1) the 2D Voronoi jigsaw
dataset, a synthetic one where pieces are generated by the Voronoi diagram of a 2D
point set; and 2) the MagicPlan dataset, a real one offered by MagicPlan from its
production pipeline, where pieces are room layouts constructed with an augmented
reality app by real-estate consumers. The qualitative and quantitative
evaluations demonstrate that our approach outperforms the competing methods by
significant margins in all the tasks.
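
The abstract describes each piece as a polygon that is placed by a 2D translation and rotation, with the arrangement produced by a diffusion process. The following is a minimal sketch of those two ingredients, not the authors' implementation. The function names (`apply_pose`, `noise_pose`), the pose encoding `(tx, ty, theta)`, and the use of the standard DDPM closed-form noising step directly on the raw pose vector are all assumptions made for illustration; the paper's actual parameterization and network are not given in this summary.

```python
import numpy as np

def apply_pose(polygon, translation, angle):
    """Rotate a polygon (N x 2 array of vertices) by `angle` radians
    about the origin, then translate it by `translation`."""
    c, s = np.cos(angle), np.sin(angle)
    rotation = np.array([[c, -s], [s, c]])
    return polygon @ rotation.T + np.asarray(translation, dtype=float)

def noise_pose(pose, alpha_bar, rng):
    """Closed-form forward-diffusion sample in the standard DDPM form:
    x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps.
    Noising the angle as a plain scalar is a simplifying assumption."""
    eps = rng.standard_normal(pose.shape)
    noisy = np.sqrt(alpha_bar) * pose + np.sqrt(1.0 - alpha_bar) * eps
    return noisy, eps

# A unit-square "room" and a hypothetical ground-truth pose (tx, ty, theta).
room = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
true_pose = np.array([2.0, 3.0, np.pi / 2])

# Place the room at its ground-truth pose.
placed = apply_pose(room, true_pose[:2], true_pose[2])

# Corrupt the pose as a diffusion forward step would; a denoiser
# (not shown) would be trained to predict `eps` given the piece shapes.
rng = np.random.default_rng(0)
noisy_pose, eps = noise_pose(true_pose, alpha_bar=0.5, rng=rng)
```

In a conditional-generation setup of this kind, the piece geometry stays fixed as the condition while only the pose vector is noised and denoised; at inference the pose would start from pure noise and be refined step by step.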
Related papers
- Mixed Diffusion for 3D Indoor Scene Synthesis [55.94569112629208]
We present MiDiffusion, a novel mixed discrete-continuous diffusion model architecture.
We represent a scene layout by a 2D floor plan and a set of objects, each defined by its category, location, size, and orientation.
Our experimental results demonstrate that MiDiffusion substantially outperforms state-of-the-art autoregressive and diffusion models in floor-conditioned 3D scene synthesis.
arXiv Detail & Related papers (2024-05-31T17:54:52Z)
- DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly [21.497180110855975]
We introduce DiffAssemble, a Graph Neural Network (GNN)-based architecture that learns to solve reassembly tasks.
Our method treats the elements of a set, whether pieces of 2D patch or 3D object fragments, as nodes of a spatial graph.
We highlight its remarkable reduction in run-time, performing 11 times faster than the quickest optimization-based method for puzzle solving.
arXiv Detail & Related papers (2024-02-29T16:09:12Z)
- Compositional Generative Inverse Design [69.22782875567547]
Inverse design, where we seek to design input variables in order to optimize an underlying objective function, is an important problem.
We show that by instead optimizing over the learned energy function captured by the diffusion model, we can avoid such adversarial examples.
In an N-body interaction task and a challenging 2D multi-airfoil design task, we demonstrate that by composing the learned diffusion model at test time, our method allows us to design initial states and boundary shapes.
arXiv Detail & Related papers (2024-01-24T01:33:39Z)
- 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results.
We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics.
We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
arXiv Detail & Related papers (2023-12-26T12:16:03Z)
- Positional Diffusion: Ordering Unordered Sets with Diffusion Probabilistic Models [32.63654140960086]
We present Positional Diffusion, a plug-and-play graph formulation with Diffusion Probabilistic Models.
We use the forward process to map elements' positions in a set to random positions in a continuous space.
Positional Diffusion learns to reverse the noising process and recover the original positions through an Attention-based Graph Neural Network.
arXiv Detail & Related papers (2023-03-20T14:01:01Z)
- Connecting the Dots: Floorplan Reconstruction Using Two-Level Queries [27.564355569013706]
We develop a novel Transformer architecture that generates polygons of multiple rooms in parallel.
Our method achieves a new state-of-the-art for two challenging datasets, Structured3D and SceneCAD.
It can readily be extended to predict additional information, i.e., semantic room types and architectural elements like doors and windows.
arXiv Detail & Related papers (2022-11-28T18:59:09Z)
- Video Anomaly Detection by Solving Decoupled Spatio-Temporal Jigsaw Puzzles [67.39567701983357]
Video Anomaly Detection (VAD) is an important topic in computer vision.
Motivated by the recent advances in self-supervised learning, this paper addresses VAD by solving an intuitive yet challenging pretext task.
Our method outperforms state-of-the-art counterparts on three public benchmarks.
arXiv Detail & Related papers (2022-07-20T19:49:32Z)
- GANzzle: Reframing jigsaw puzzle solving as a retrieval task using a generative mental image [15.132848477903314]
We infer a mental image from all pieces, which a given piece can then be matched against, avoiding the explosion.
We learn how to reconstruct the image given a set of unordered pieces, allowing the model to learn a joint embedding space to match an encoding of each piece to the cropped layer of the generator.
In doing so our model is puzzle size agnostic, in contrast to prior deep learning methods which are single size.
arXiv Detail & Related papers (2022-07-12T16:02:00Z)
- MCTS with Refinement for Proposals Selection Games in Scene Understanding [32.92475660892122]
We propose a novel method applicable in many scene understanding problems that adapts the Monte Carlo Tree Search (MCTS) algorithm.
From a generated pool of proposals, our method jointly selects and optimizes proposals that maximize the objective term.
Our method shows high performance on the Matterport3D dataset without introducing hard constraints on room layout configurations.
arXiv Detail & Related papers (2022-07-07T10:15:54Z)
- Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR-based Perception [122.53774221136193]
State-of-the-art methods for driving-scene LiDAR-based perception often project the point clouds to 2D space and then process them via 2D convolution.
A natural remedy is to use 3D voxelization and 3D convolution networks.
We propose a new framework for outdoor LiDAR segmentation, where cylindrical partition and asymmetrical 3D convolution networks are designed to explore the 3D geometric pattern.
arXiv Detail & Related papers (2021-09-12T06:25:11Z)
- Convolutional Occupancy Networks [88.48287716452002]
We propose Convolutional Occupancy Networks, a more flexible implicit representation for detailed reconstruction of objects and 3D scenes.
By combining convolutional encoders with implicit occupancy decoders, our model incorporates inductive biases, enabling structured reasoning in 3D space.
We empirically find that our method enables the fine-grained implicit 3D reconstruction of single objects, scales to large indoor scenes, and generalizes well from synthetic to real data.
arXiv Detail & Related papers (2020-03-10T10:17:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.