A Generic Hybrid Framework for 2D Visual Reconstruction
- URL: http://arxiv.org/abs/2501.19325v1
- Date: Fri, 31 Jan 2025 17:21:29 GMT
- Title: A Generic Hybrid Framework for 2D Visual Reconstruction
- Authors: Daniel Rika, Dror Sholomon, Eli David, Alexandre Pais, Nathan S. Netanyahu
- Abstract summary: This paper presents a versatile hybrid framework for addressing 2D real-world reconstruction tasks formulated as jigsaw puzzle problems (JPPs) with square, non-overlapping pieces.
Our approach integrates a deep learning (DL)-based compatibility measure (CM) model that evaluates pairs of puzzle pieces holistically.
Our unique hybrid methodology achieves state-of-the-art (SOTA) results in reconstructing Portuguese tile panels and large degraded puzzles with eroded boundaries.
- Score: 39.58317527488534
- Abstract: This paper presents a versatile hybrid framework for addressing 2D real-world reconstruction tasks formulated as jigsaw puzzle problems (JPPs) with square, non-overlapping pieces. Our approach integrates a deep learning (DL)-based compatibility measure (CM) model that evaluates pairs of puzzle pieces holistically, rather than focusing solely on their adjacent edges as traditionally done. This DL-based CM is paired with an optimized genetic algorithm (GA)-based solver, which iteratively searches for a global optimal arrangement using the pairwise CM scores of the puzzle pieces. Extensive experimental results highlight the framework's adaptability and robustness across multiple real-world domains. Notably, our unique hybrid methodology achieves state-of-the-art (SOTA) results in reconstructing Portuguese tile panels and large degraded puzzles with eroded boundaries.
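As a rough illustration of how the GA-based solver consumes the pairwise CM scores, the sketch below evolves candidate arrangements scored against a compatibility matrix. It is only a minimal sketch: random values stand in for the trained DL-based CM, the GA uses nothing but elitism and swap mutation (the paper's solver relies on more elaborate operators), and all names and parameters are hypothetical.

```python
import random
import numpy as np

ROWS, COLS = 4, 4
N = ROWS * COLS

# Stand-in for the DL-based CM: cm_right[i, j] scores piece j placed to the
# right of piece i; cm_down[i, j] scores piece j placed directly below piece i.
rng = np.random.default_rng(0)
cm_right = rng.random((N, N))
cm_down = rng.random((N, N))

def fitness(arrangement):
    """Sum of pairwise CM scores over all adjacent placements (higher is better)."""
    grid = np.array(arrangement).reshape(ROWS, COLS)
    score = 0.0
    for r in range(ROWS):
        for c in range(COLS):
            if c + 1 < COLS:
                score += cm_right[grid[r, c], grid[r, c + 1]]
            if r + 1 < ROWS:
                score += cm_down[grid[r, c], grid[r + 1, c]]
    return score

def mutate(arrangement, p=0.2):
    """Swap two random board positions with probability p."""
    child = list(arrangement)
    if random.random() < p:
        i, j = random.sample(range(N), 2)
        child[i], child[j] = child[j], child[i]
    return child

def solve(pop_size=100, generations=200):
    population = [random.sample(range(N), N) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 5]  # keep the best 20%
        offspring = [mutate(random.choice(elite)) for _ in range(pop_size - len(elite))]
        population = elite + offspring
    return max(population, key=fitness)

print(solve())
```

The key point the sketch tries to convey is that the solver never looks at pixels; it only consults the precomputed pairwise compatibility scores.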
Related papers
- LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning [56.273799410256075]
The framework combines Monte Carlo Tree Search (MCTS) with iterative Self-Refine to optimize the reasoning path.
The framework has been tested on general and advanced benchmarks, showing superior performance in terms of search efficiency and problem-solving capability.
arXiv Detail & Related papers (2024-10-03T18:12:29Z)
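For intuition only, here is a minimal sketch of an MCTS loop whose expansion step applies a refinement operation to the current candidate, in the spirit of the MCTS-plus-Self-Refine combination described for LLaMA-Berry above. The `self_refine` and `evaluate` functions are toy placeholders for the LLM-driven rewrite and reward steps; the node statistics follow the standard UCT formula.

```python
import math
import random

# Hypothetical stand-ins for the LLM-driven parts of the pipeline.
def self_refine(solution):
    """Placeholder: critique-and-rewrite step that perturbs a candidate solution."""
    return solution + random.uniform(-0.5, 0.5)

def evaluate(solution):
    """Placeholder reward: how close the candidate is to a hidden target value."""
    return -abs(solution - 3.14)

class Node:
    def __init__(self, solution, parent=None):
        self.solution, self.parent = solution, parent
        self.children, self.visits, self.value = [], 0, 0.0

def uct(node, c=1.4):
    return node.value / node.visits + c * math.sqrt(math.log(node.parent.visits) / node.visits)

def search(root_solution, iterations=200):
    root = Node(root_solution)
    for _ in range(iterations):
        # Selection: descend to a leaf via UCT.
        node = root
        while node.children:
            node = max(node.children, key=uct)
        # Expansion: refine the current solution into a new child.
        child = Node(self_refine(node.solution), parent=node)
        node.children.append(child)
        # Evaluation + backpropagation.
        reward = evaluate(child.solution)
        while child is not None:
            child.visits += 1
            child.value += reward
            child = child.parent
    return max(root.children, key=lambda n: n.value / n.visits).solution

print(search(0.0))
```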
- 3D Geometric Shape Assembly via Efficient Point Cloud Matching [59.241448711254485]
We introduce Proxy Match Transform (PMT), an approximate high-order feature transform layer that enables reliable matching between mating surfaces of parts.
Building upon PMT, we introduce a new framework, dubbed Proxy Match TransformeR (PMTR), for the geometric assembly task.
We evaluate the proposed PMTR on the large-scale 3D geometric shape assembly benchmark dataset of Breaking Bad.
arXiv Detail & Related papers (2024-07-15T08:50:02Z)
- 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results.
We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics.
We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
arXiv Detail & Related papers (2023-12-26T12:16:03Z)
- Multi-Phase Relaxation Labeling for Square Jigsaw Puzzle Solving [73.58829980121767]
We present a novel method for solving square jigsaw puzzles based on global optimization.
The method is fully automatic, assumes no prior information, and can handle puzzles with known or unknown piece orientation.
arXiv Detail & Related papers (2023-03-26T18:53:51Z)
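Relaxation labeling itself has a standard nonlinear update, sketched below on a toy problem: each object holds a probability distribution over labels, and pairwise compatibilities iteratively reweight those distributions. This generic single-phase loop with random compatibilities only illustrates the mechanism; the paper's multi-phase scheme and its puzzle-specific compatibilities (e.g., pieces as objects, board locations as labels) differ.

```python
import numpy as np

# Tiny generic relaxation-labeling example: n objects, m labels.
# r[i, l, j, m] is the compatibility of (object i, label l) with (object j, label m).
rng = np.random.default_rng(1)
n, m = 5, 4
r = rng.random((n, m, n, m))
p = np.full((n, m), 1.0 / m)          # start from uniform label probabilities

for _ in range(50):
    # Support for each (object, label) pair from all other assignments.
    q = np.einsum('iljm,jm->il', r, p)
    # Classic nonlinear update with per-object normalization.
    p = p * q
    p /= p.sum(axis=1, keepdims=True)

print(p.argmax(axis=1))               # hard labeling after the iterations
```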
- PuzzleFusion: Unleashing the Power of Diffusion Models for Spatial Puzzle Solving [17.781484376483707]
The paper presents an end-to-end neural architecture based on Diffusion Models for spatial puzzle solving.
A surprising discovery is that the simple use of a Diffusion Model effectively solves these challenging spatial puzzle tasks as a conditional generation process.
To enable learning of an end-to-end neural system, the paper introduces new datasets with ground-truth arrangements.
arXiv Detail & Related papers (2022-11-24T20:06:11Z)
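To illustrate what "puzzle solving as conditional generation" can look like in code, the sketch below shows a single DDPM-style training step: a toy denoiser predicts the noise added to per-piece 2D translations, conditioned on per-piece features. The architecture, pose parameterization, and conditioning here are placeholders, not PuzzleFusion's actual design.

```python
import torch
import torch.nn as nn

# Toy setup: each puzzle has 9 pieces; the model predicts the noise added to the
# pieces' 2D translations, conditioned on per-piece features (e.g., image encodings).
NUM_PIECES, POSE_DIM, FEAT_DIM, T = 9, 2, 16, 1000

betas = torch.linspace(1e-4, 0.02, T)
alphas_cum = torch.cumprod(1.0 - betas, dim=0)

denoiser = nn.Sequential(                      # stand-in for the real architecture
    nn.Linear(POSE_DIM + FEAT_DIM + 1, 64), nn.ReLU(), nn.Linear(64, POSE_DIM)
)
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

def training_step(poses, feats):
    """One DDPM-style step: noise the ground-truth poses, predict that noise."""
    t = torch.randint(0, T, (1,))
    a = alphas_cum[t]
    noise = torch.randn_like(poses)
    noisy = a.sqrt() * poses + (1 - a).sqrt() * noise
    t_embed = t.float().expand(NUM_PIECES, 1) / T
    pred = denoiser(torch.cat([noisy, feats, t_embed], dim=-1))
    loss = ((pred - noise) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

poses = torch.rand(NUM_PIECES, POSE_DIM)       # dummy ground-truth arrangement
feats = torch.randn(NUM_PIECES, FEAT_DIM)      # dummy piece encodings
print(training_step(poses, feats))
```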
- GANzzle: Reframing jigsaw puzzle solving as a retrieval task using a generative mental image [15.132848477903314]
We infer a mental image from all pieces, against which a given piece can then be matched, avoiding the combinatorial explosion of pairwise comparisons.
We learn how to reconstruct the image given a set of unordered pieces, allowing the model to learn a joint embedding space to match an encoding of each piece to the cropped layer of the generator.
In doing so, our model is agnostic to puzzle size, in contrast to prior deep learning methods, which handle only a single size.
arXiv Detail & Related papers (2022-07-12T16:02:00Z)
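A minimal sketch of the retrieval-style matching described above: piece embeddings are compared against embeddings of crops taken from a generated "mental image", and a one-to-one piece-to-location assignment is solved over the resulting similarity matrix. Random vectors stand in for the learned piece encoder and the generator; only the matching step is shown.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical embeddings: in practice these would come from the piece encoder and
# from crops of the generator's "mental image"; random vectors stand in for both.
rng = np.random.default_rng(2)
num_pieces, dim = 16, 64
piece_emb = rng.normal(size=(num_pieces, dim))
crop_emb = rng.normal(size=(num_pieces, dim))    # one crop per board location

# Cosine similarity between every piece and every candidate location.
piece_emb /= np.linalg.norm(piece_emb, axis=1, keepdims=True)
crop_emb /= np.linalg.norm(crop_emb, axis=1, keepdims=True)
similarity = piece_emb @ crop_emb.T

# One-to-one piece-to-location assignment maximizing total similarity.
rows, cols = linear_sum_assignment(-similarity)
print(dict(zip(rows.tolist(), cols.tolist())))
```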
- TEN: Twin Embedding Networks for the Jigsaw Puzzle Problem with Eroded Boundaries [0.0]
The jigsaw puzzle problem (JPP) is a well-known research problem, which has been studied for many years.
Many effective CMs, which apply a simple distance measure, based merely on the information along the piece edges, have been proposed.
However, the practicality of these classical methods is rather doubtful for problem instances harder than pure synthetic images.
To overcome this significant deficiency, a few deep convolutional neural network (CNN)-based CMs have been recently introduced.
The paper makes a significant first attempt at bridging the gap between the relatively low accuracy (of classical methods) and the intensive computational complexity (of the recently introduced CNN-based methods).
arXiv Detail & Related papers (2022-03-12T17:18:47Z)
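The general idea of a learned pairwise CM can be sketched with a Siamese-style network: a shared encoder embeds each piece of a candidate pair, and a small head maps the concatenated embeddings to a compatibility score. This is a generic illustration under assumed toy dimensions, not TEN's actual architecture or training objective.

```python
import torch
import torch.nn as nn

PIECE = 3 * 32 * 32                    # toy pieces: 32x32 RGB, flattened

class PairwiseCM(nn.Module):
    """Siamese-style compatibility measure: shared encoder + scoring head."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(PIECE, 256), nn.ReLU(),
                                     nn.Linear(256, embed_dim))
        self.head = nn.Sequential(nn.Linear(2 * embed_dim, 64), nn.ReLU(),
                                  nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, left, right):
        z = torch.cat([self.encoder(left), self.encoder(right)], dim=-1)
        # Score interpreted as the probability that `right` fits to the right of `left`.
        return self.head(z).squeeze(-1)

cm = PairwiseCM()
left = torch.rand(8, PIECE)            # batch of 8 candidate pairs
right = torch.rand(8, PIECE)
print(cm(left, right))
```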
- Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval [80.35589927511667]
Current state-of-the-art approaches to cross-modal retrieval process text and visual input jointly, relying on Transformer-based architectures with cross-attention mechanisms that attend over all words and objects in an image.
We propose a novel fine-tuning framework which turns any pretrained text-image multi-modal model into an efficient retrieval model.
Our experiments on a series of standard cross-modal retrieval benchmarks in monolingual, multilingual, and zero-shot setups, demonstrate improved accuracy and huge efficiency benefits over the state-of-the-art cross-encoders.
arXiv Detail & Related papers (2021-03-22T15:08:06Z)
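The retrieve-then-rerank pattern described above can be sketched as a two-stage pipeline: a cheap dot product over precomputed embeddings shortlists candidates, and an expensive joint scorer (a placeholder `cross_encoder_score` here) reranks only that shortlist. The embeddings are random stand-ins for the pretrained encoders.

```python
import numpy as np

rng = np.random.default_rng(3)
num_images, dim, top_k = 10_000, 256, 20

# Stage 1: fast retrieval with precomputed unimodal embeddings (bi-encoder style).
image_emb = rng.normal(size=(num_images, dim))
query_emb = rng.normal(size=(dim,))
candidates = np.argsort(image_emb @ query_emb)[-top_k:]

# Stage 2: expensive joint scoring (cross-encoder style) on the short list only.
def cross_encoder_score(query_vec, image_vec):
    """Placeholder for a Transformer that attends jointly over text and image."""
    return float(np.tanh(query_vec @ image_vec))

reranked = sorted(candidates,
                  key=lambda i: cross_encoder_score(query_emb, image_emb[i]),
                  reverse=True)
print(reranked[:5])
```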