ERL-MPP: Evolutionary Reinforcement Learning with Multi-head Puzzle Perception for Solving Large-scale Jigsaw Puzzles of Eroded Gaps
- URL: http://arxiv.org/abs/2504.09608v1
- Date: Sun, 13 Apr 2025 14:56:41 GMT
- Title: ERL-MPP: Evolutionary Reinforcement Learning with Multi-head Puzzle Perception for Solving Large-scale Jigsaw Puzzles of Eroded Gaps
- Authors: Xingke Song, Xiaoying Yang, Chenglin Yao, Jianfeng Ren, Ruibin Bai, Xin Chen, Xudong Jiang
- Abstract summary: We propose a framework of Evolutionary Reinforcement Learning with Multi-head Puzzle Perception. The proposed ERL-MPP is evaluated on the JPLEG-5 dataset with large gaps and the MIT dataset with large-scale puzzles. It significantly outperforms all state-of-the-art models on both datasets.
- Score: 28.009783235854584
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Solving jigsaw puzzles has been extensively studied. While most existing models focus on solving either small-scale puzzles or puzzles with no gap between fragments, solving large-scale puzzles with gaps presents distinctive challenges in both image understanding and combinatorial optimization. To tackle these challenges, we propose a framework of Evolutionary Reinforcement Learning with Multi-head Puzzle Perception (ERL-MPP) to derive a better set of swapping actions for solving the puzzles. Specifically, to tackle the challenges of perceiving the puzzle with gaps, a Multi-head Puzzle Perception Network (MPPN) with a shared encoder is designed, where multiple puzzlet heads comprehensively perceive the local assembly status, and a discriminator head provides a global assessment of the puzzle. To explore the large swapping action space efficiently, an Evolutionary Reinforcement Learning (EvoRL) agent is designed, where an actor recommends a set of suitable swapping actions from a large action space based on the perceived puzzle status, a critic updates the actor using the estimated rewards and the puzzle status, and an evaluator coupled with evolutionary strategies evolves the actions aligning with the historical assembly experience. The proposed ERL-MPP is comprehensively evaluated on the JPLEG-5 dataset with large gaps and the MIT dataset with large-scale puzzles. It significantly outperforms all state-of-the-art models on both datasets.
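The abstract frames puzzle solving as choosing good swapping actions from a large action space and evolving them against an evaluator. The sketch below is a toy illustration of that idea only, not the authors' ERL-MPP: it has no neural networks, no actor or critic, and no gaps between pieces. All names (`all_swaps`, `score`, `evolve_swaps`, `solve`) and all parameter values are hypothetical, and the evaluator is a stand-in that simply counts pieces already in their correct slot, which a real solver would not have access to. For n pieces there are n(n-1)/2 possible pairwise swaps, which is why exhaustive search becomes impractical as puzzles grow.

```python
import itertools
import random


def all_swaps(n):
    """Enumerate every unordered pair of slots: n * (n - 1) / 2 swap actions."""
    return list(itertools.combinations(range(n), 2))


def apply_swaps(perm, swaps):
    """Return a new arrangement after applying the swaps in order."""
    out = list(perm)
    for i, j in swaps:
        out[i], out[j] = out[j], out[i]
    return out


def score(perm):
    """Toy evaluator (assumption): count pieces sitting in their own slot."""
    return sum(1 for slot, piece in enumerate(perm) if slot == piece)


def evolve_swaps(perm, rng, pop_size=30, seq_len=3, generations=20):
    """Evolve short swap sequences and return the best one found."""
    actions = all_swaps(len(perm))
    population = [
        [rng.choice(actions) for _ in range(seq_len)] for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Rank candidates by the evaluator and keep the top third as elites.
        population.sort(key=lambda s: score(apply_swaps(perm, s)), reverse=True)
        elite = population[: pop_size // 3]
        children = []
        while len(elite) + len(children) < pop_size:
            child = list(rng.choice(elite))
            # Mutation: replace one swap in the sequence with a random swap.
            child[rng.randrange(seq_len)] = rng.choice(actions)
            children.append(child)
        population = elite + children
    return max(population, key=lambda s: score(apply_swaps(perm, s)))


def solve(perm, seed=0, max_rounds=200):
    """Repeatedly evolve swap sequences, accepting only non-worsening results."""
    rng = random.Random(seed)
    current = list(perm)
    for _ in range(max_rounds):
        if score(current) == len(current):
            break
        candidate = apply_swaps(current, evolve_swaps(current, rng))
        if score(candidate) >= score(current):
            current = candidate
    return current


if __name__ == "__main__":
    shuffled = [3, 1, 4, 0, 2, 8, 6, 5, 7]  # a scrambled 3x3 puzzle
    print(len(all_swaps(len(shuffled))), "possible swaps")
    print(solve(shuffled))
```

Because a candidate is accepted only when its score does not drop, the arrangement never gets worse under this toy evaluator. In the paper's setting, the score would instead come from a learned perception network, and the candidate swaps would be proposed by a trained actor rather than sampled uniformly.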
Related papers
- PuzzleBench: A Fully Dynamic Evaluation Framework for Large Multimodal Models on Puzzle Solving [50.50405233978406]
We propose a fully dynamic multimodal evaluation framework, named Open-ended Visual Puzzle Generation (OVPG).
OVPG aims to generate fresh, diverse, and verifiable evaluation data automatically in puzzle-solving tasks.
Built upon OVPG, we construct PuzzleBench, a dynamic and scalable benchmark comprising 11,840 VQA samples.
arXiv Detail & Related papers (2025-04-15T05:29:31Z)
- GenVP: Generating Visual Puzzles with Contrastive Hierarchical VAEs [15.118234858274679]
We propose Generative Visual Puzzles (GenVP) to model the entire RPM generation process.
Our model's capability spans from generating multiple solutions for one specific problem prompt to creating complete new puzzles out of the desired set of rules.
arXiv Detail & Related papers (2025-03-30T21:35:26Z)
- CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation [53.452699232071495]
CrossWordBench is a benchmark designed to evaluate the reasoning capabilities of Large Language Models (LLMs) and Large Vision-Language Models (LVLMs) through the medium of crossword puzzles.
Our evaluation reveals that reasoning LLMs outperform non-reasoning models substantially by effectively leveraging crossing-letter constraints.
Our findings offer insights into the limitations of the reasoning capabilities of current LLMs and LVLMs, and provide an effective approach for creating multimodal constrained tasks for future evaluations.
arXiv Detail & Related papers (2025-03-30T20:03:36Z)
- VGRP-Bench: Visual Grid Reasoning Puzzle Benchmark for Large Vision-Language Models [31.645103181716678]
Large Vision-Language Models (LVLMs) struggle with puzzles, which require precise perception, rule comprehension, and logical reasoning.
We introduce VGRP-Bench, a Visual Grid Reasoning Puzzle Benchmark featuring 20 diverse puzzles.
Our results reveal that even the state-of-the-art LVLMs struggle with these puzzles, highlighting fundamental limitations in their puzzle-solving capabilities.
arXiv Detail & Related papers (2025-03-29T12:50:38Z)
- A Generic Hybrid Framework for 2D Visual Reconstruction [39.58317527488534]
This paper presents a versatile hybrid framework for addressing 2D real-world reconstruction tasks formulated as jigsaw puzzle problems (JPPs) with square, non-overlapping pieces.
Our approach integrates a deep learning (DL)-based compatibility measure (CM) model that evaluates pairs of puzzle pieces holistically.
Our unique hybrid methodology achieves state-of-the-art (SOTA) results in reconstructing Portuguese tile panels and large degraded puzzles with eroded boundaries.
arXiv Detail & Related papers (2025-01-31T17:21:29Z)
- Solving Zebra Puzzles Using Constraint-Guided Multi-Agent Systems [25.0042181817455]
We introduce a multi-agent system, ZPS, that integrates Large Language Models with an off-the-shelf theorem prover.
This system tackles the complex puzzle-solving task by breaking down the problem into smaller, manageable parts.
We also introduce an automated grid puzzle grader to assess the correctness of our puzzle solutions, and show that the grader is reliable by evaluating it in a user study.
arXiv Detail & Related papers (2024-07-04T14:22:25Z)
- Accelerating exploration and representation learning with offline pre-training [52.6912479800592]
We show that exploration and representation learning can be improved by separately learning two different models from a single offline dataset.
We show that learning a state representation using noise-contrastive estimation and a model of auxiliary reward can significantly improve the sample efficiency on the challenging NetHack benchmark.
arXiv Detail & Related papers (2023-03-31T18:03:30Z)
- Multi-Phase Relaxation Labeling for Square Jigsaw Puzzle Solving [73.58829980121767]
We present a novel method for solving square jigsaw puzzles based on global optimization.
The method is fully automatic, assumes no prior information, and can handle puzzles with known or unknown piece orientation.
arXiv Detail & Related papers (2023-03-26T18:53:51Z)
- Automated Graph Genetic Algorithm based Puzzle Validation for Faster Game Desig [69.02688684221265]
This paper presents an evolutionary algorithm, informed by expert knowledge, for solving logical puzzles in video games efficiently.
We discuss multiple variations of hybrid genetic approaches for constraint satisfaction problems that allow us to find a diverse set of near-optimal solutions for puzzles.
arXiv Detail & Related papers (2023-02-17T18:15:33Z)
- Video Anomaly Detection by Solving Decoupled Spatio-Temporal Jigsaw Puzzles [67.39567701983357]
Video Anomaly Detection (VAD) is an important topic in computer vision.
Motivated by the recent advances in self-supervised learning, this paper addresses VAD by solving an intuitive yet challenging pretext task.
Our method outperforms state-of-the-art counterparts on three public benchmarks.
arXiv Detail & Related papers (2022-07-20T19:49:32Z)