GANzzle: Reframing jigsaw puzzle solving as a retrieval task using a
generative mental image
- URL: http://arxiv.org/abs/2207.05634v1
- Date: Tue, 12 Jul 2022 16:02:00 GMT
- Title: GANzzle: Reframing jigsaw puzzle solving as a retrieval task using a
generative mental image
- Authors: Davide Talon, Alessio Del Bue, Stuart James
- Abstract summary: We infer a mental image from all pieces, which a given piece can then be matched against, avoiding the combinatorial explosion.
We learn how to reconstruct the image given a set of unordered pieces, allowing the model to learn a joint embedding space to match an encoding of each piece to the cropped layer of the generator.
In doing so our model is puzzle size agnostic, in contrast to prior deep learning methods which are single size.
- Score: 15.132848477903314
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Puzzle solving is a combinatorial challenge due to the difficulty of matching
adjacent pieces. Instead, we infer a mental image from all pieces, which a
given piece can then be matched against avoiding the combinatorial explosion.
Exploiting advancements in Generative Adversarial methods, we learn how to
reconstruct the image given a set of unordered pieces, allowing the model to
learn a joint embedding space to match an encoding of each piece to the cropped
layer of the generator. Therefore we frame the problem as a R@1 retrieval task,
and then solve the linear assignment using differentiable Hungarian attention,
making the process end-to-end. In doing so our model is puzzle size agnostic,
in contrast to prior deep learning methods which are single size. We evaluate
on two new large-scale datasets, where our model is on par with deep learning
methods, while generalizing to multiple puzzle sizes.
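The retrieval-then-assignment idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the paper learns piece and slot embeddings with a GAN and uses a differentiable Hungarian attention, whereas this sketch assumes the similarities are already given and swaps in the classic, non-differentiable Hungarian algorithm. All function names (`cosine_similarity_matrix`, `greedy_retrieval`, `hungarian`, `solve_puzzle`, `recall_at_1`) and the toy numbers are hypothetical.

```python
import math


def cosine_similarity_matrix(piece_emb, slot_emb):
    """sim[i][j] = cosine similarity between piece i and grid slot j."""
    def unit(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard zero vectors
        return [x / norm for x in v]

    pieces = [unit(v) for v in piece_emb]
    slots = [unit(v) for v in slot_emb]
    return [[sum(a * b for a, b in zip(p, s)) for s in slots] for p in pieces]


def greedy_retrieval(sim):
    """Rank-1 retrieval: every piece independently takes its best slot."""
    return [max(range(len(row)), key=row.__getitem__) for row in sim]


def hungarian(cost):
    """Minimum-cost assignment for a square matrix; returns slot per row."""
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (n + 1)   # column potentials
    p = [0] * (n + 1)     # p[j] = row matched to column j (1-based, 0 = free)
    way = [0] * (n + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return assignment


def solve_puzzle(sim):
    """One-to-one piece-to-slot assignment that maximizes total similarity."""
    return hungarian([[-s for s in row] for row in sim])


def recall_at_1(pred, truth):
    """Fraction of pieces whose predicted slot equals the true slot."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


if __name__ == "__main__":
    # Toy 2x2 puzzle (assumed values): pieces 0 and 1 both prefer slot 0.
    sim = [
        [0.90, 0.80, 0.10, 0.00],
        [0.85, 0.20, 0.10, 0.00],
        [0.10, 0.10, 0.70, 0.30],
        [0.00, 0.20, 0.40, 0.60],
    ]
    truth = [1, 0, 2, 3]  # assumed ground-truth slot of each piece
    print(greedy_retrieval(sim), recall_at_1(greedy_retrieval(sim), truth))
    print(solve_puzzle(sim), recall_at_1(solve_puzzle(sim), truth))
```

In this toy example, independent rank-1 retrieval sends two pieces to the same slot, while the assignment step resolves the conflict by enforcing a one-to-one matching. Because the similarity matrix can be any square size, nothing in the matching step depends on the number of pieces.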
Related papers
- Solving Masked Jigsaw Puzzles with Diffusion Vision Transformers [5.374411622670979]
Image and video jigsaw puzzles pose the challenging task of rearranging image fragments or video frames from unordered sequences to restore meaningful images and video sequences.
Existing approaches often hinge on discriminative models tasked with predicting either the absolute positions of puzzle elements or the permutation actions applied to the original data.
We propose JPDVT, an innovative approach that harnesses diffusion transformers to address this challenge.
arXiv Detail & Related papers (2024-04-10T18:40:23Z)
- DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly [21.497180110855975]
We introduce DiffAssemble, a Graph Neural Network (GNN)-based architecture that learns to solve reassembly tasks.
Our method treats the elements of a set, whether pieces of a 2D patch or 3D object fragments, as nodes of a spatial graph.
We highlight its remarkable reduction in run-time, performing 11 times faster than the quickest optimization-based method for puzzle solving.
arXiv Detail & Related papers (2024-02-29T16:09:12Z)
- Multi-Phase Relaxation Labeling for Square Jigsaw Puzzle Solving [73.58829980121767]
We present a novel method for solving square jigsaw puzzles based on global optimization.
The method is fully automatic, assumes no prior information, and can handle puzzles with known or unknown piece orientation.
arXiv Detail & Related papers (2023-03-26T18:53:51Z)
- PuzzleFusion: Unleashing the Power of Diffusion Models for Spatial Puzzle Solving [17.781484376483707]
The paper presents an end-to-end neural architecture based on Diffusion Models for spatial puzzle solving.
A surprising discovery is that the simple use of a Diffusion Model effectively solves these challenging spatial puzzle tasks as a conditional generation process.
To enable learning of an end-to-end neural system, the paper introduces new datasets with ground-truth arrangements.
arXiv Detail & Related papers (2022-11-24T20:06:11Z)
- A Generalist Framework for Panoptic Segmentation of Images and Videos [61.61453194912186]
We formulate panoptic segmentation as a discrete data generation problem, without relying on inductive bias of the task.
A diffusion model is proposed to model panoptic masks, with a simple architecture and generic loss function.
Our method is capable of modeling video (in a streaming setting) and thereby learns to track object instances automatically.
arXiv Detail & Related papers (2022-10-12T16:18:25Z)
- Video Anomaly Detection by Solving Decoupled Spatio-Temporal Jigsaw Puzzles [67.39567701983357]
Video Anomaly Detection (VAD) is an important topic in computer vision.
Motivated by the recent advances in self-supervised learning, this paper addresses VAD by solving an intuitive yet challenging pretext task.
Our method outperforms state-of-the-art counterparts on three public benchmarks.
arXiv Detail & Related papers (2022-07-20T19:49:32Z)
- Brick-by-Brick: Combinatorial Construction with Deep Reinforcement Learning [52.85981207514049]
We introduce a novel formulation, complex construction, which requires a building agent to assemble unit primitives sequentially.
To construct a target object, we provide incomplete knowledge about the desired target (i.e., 2D images) instead of exact and explicit information to the agent.
We demonstrate that the proposed method successfully learns to construct an unseen object conditioned on a single image or multiple views of a target object.
arXiv Detail & Related papers (2021-10-29T01:09:51Z)
- InfinityGAN: Towards Infinite-Resolution Image Synthesis [92.40782797030977]
We present InfinityGAN, a method to generate arbitrary-resolution images.
We show how it trains and infers patch-by-patch seamlessly with low computational resources.
arXiv Detail & Related papers (2021-04-08T17:59:30Z)
- JigsawGAN: Self-supervised Learning for Solving Jigsaw Puzzles with Generative Adversarial Networks [31.190344964881625]
The paper proposes a solution based on Generative Adversarial Network (GAN) for solving jigsaw puzzles.
The proposed method can solve jigsaw puzzles more efficiently by utilizing both semantic information and edge information simultaneously.
arXiv Detail & Related papers (2021-01-19T10:40:38Z)
- Non-Rigid Puzzles [50.213265511586535]
We present a non-rigid multi-part shape matching algorithm.
We assume that we are given a reference shape and its multiple parts undergoing a non-rigid deformation.
Experimental results on synthetic as well as real scans demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2020-11-26T00:32:30Z)
- Pictorial and apictorial polygonal jigsaw puzzles: The lazy caterer model, properties, and solvers [14.08706290287121]
We formalize a new type of jigsaw puzzle where the pieces are general convex polygons generated by cutting through a global polygonal shape/image with an arbitrary number of straight cuts.
We analyze the theoretical properties of such puzzles, including the inherent challenges in solving them once pieces are contaminated with geometrical noise.
arXiv Detail & Related papers (2020-08-17T22:07:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.