Region-Wise Correspondence Prediction between Manga Line Art Images
- URL: http://arxiv.org/abs/2509.09501v2
- Date: Fri, 12 Sep 2025 05:32:10 GMT
- Title: Region-Wise Correspondence Prediction between Manga Line Art Images
- Authors: Yingxuan Li, Jiafeng Mao, Qianru Qiu, Yusuke Matsui
- Abstract summary: This paper introduces a novel and practical task: predicting region-wise correspondence between raw manga line art images. We propose a Transformer-based framework that learns patch-level similarities within and across images. We then apply edge-aware clustering and a region matching algorithm to convert patch-level predictions into coherent region-level correspondences.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding region-wise correspondence between manga line art images is a fundamental task in manga processing, enabling downstream applications such as automatic line art colorization and in-between frame generation. However, this task remains largely unexplored, especially in realistic scenarios without pre-existing segmentation or annotations. In this paper, we introduce a novel and practical task: predicting region-wise correspondence between raw manga line art images without any pre-existing labels or masks. To tackle this problem, we divide each line art image into a set of patches and propose a Transformer-based framework that learns patch-level similarities within and across images. We then apply edge-aware clustering and a region matching algorithm to convert patch-level predictions into coherent region-level correspondences. To support training and evaluation, we develop an automatic annotation pipeline and manually refine a subset of the data to construct benchmark datasets. Experiments on multiple datasets demonstrate that our method achieves high patch-level accuracy (e.g., 96.34%) and generates consistent region-level correspondences, highlighting its potential for real-world manga applications.
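For readers who want a concrete starting point, the following is a minimal PyTorch sketch of the patch-level stage described above: images are split into patches, a Transformer encoder attends within and across both images, and cosine similarity between patch embeddings yields a correspondence matrix. All hyperparameters, the joint-encoding scheme, and the greedy argmax matching are illustrative assumptions; the paper's actual architecture, training loss, edge-aware clustering, and region matching algorithm are not reproduced here.

```python
# Minimal sketch (assumptions, not the authors' code): patch embedding via a
# strided convolution, a joint Transformer encoder over both images, and a
# cosine-similarity matrix between patch embeddings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchCorrespondence(nn.Module):
    def __init__(self, patch=16, dim=256, heads=8, layers=4):
        super().__init__()
        # line art is single-channel; one conv both patchifies and embeds
        self.embed = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, layers)

    def forward(self, a, b):
        # a, b: (B, 1, H, W) line art images
        ta = self.embed(a).flatten(2).transpose(1, 2)  # (B, Na, dim)
        tb = self.embed(b).flatten(2).transpose(1, 2)  # (B, Nb, dim)
        # concatenating the two patch sequences lets self-attention model
        # similarities both within and across images (positional encodings
        # omitted for brevity)
        z = self.encoder(torch.cat([ta, tb], dim=1))
        za, zb = z[:, :ta.size(1)], z[:, ta.size(1):]
        za, zb = F.normalize(za, dim=-1), F.normalize(zb, dim=-1)
        return za @ zb.transpose(1, 2)  # (B, Na, Nb) patch similarity

model = PatchCorrespondence()
sim = model(torch.rand(1, 1, 256, 256), torch.rand(1, 1, 256, 256))
matches = sim.argmax(dim=-1)  # naive greedy matching; the paper instead
# converts patch predictions to region-level correspondences via
# edge-aware clustering and a region matching algorithm
```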
Related papers
- Dense Feature Interaction Network for Image Inpainting Localization
Inpainting, the process of filling in missing areas in an image, is a common image editing technique. In this paper, we describe a new method for inpainting detection based on a Dense Feature Interaction Network (DeFI-Net). DeFI-Net uses a novel feature pyramid architecture to capture and amplify multi-scale representations across various stages.
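As a point of reference for the feature pyramid mentioned above, here is a generic top-down pyramid fusion sketch in PyTorch. It is standard FPN-style code, not DeFI-Net's dense-interaction design, and the channel sizes and class name are placeholders.

```python
# Generic feature-pyramid sketch (illustrative; not DeFI-Net's architecture):
# lateral 1x1 convs unify channels, coarse maps are upsampled and added to
# finer ones, and 3x3 convs smooth the fused maps.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyPyramid(nn.Module):
    def __init__(self, chans=(64, 128, 256), out=64):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, out, 1) for c in chans)
        self.smooth = nn.ModuleList(nn.Conv2d(out, out, 3, padding=1) for _ in chans)

    def forward(self, feats):
        # feats: backbone maps ordered fine to coarse, e.g. strides 4, 8, 16
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        for i in range(len(laterals) - 2, -1, -1):
            # top-down pathway: upsample the coarser map and fuse by addition
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest")
        return [s(l) for s, l in zip(self.smooth, laterals)]

feats = [torch.rand(1, c, 64 // 2 ** i, 64 // 2 ** i)
         for i, c in enumerate((64, 128, 256))]
outs = TinyPyramid()(feats)  # three fused maps, each with 64 channels
```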
arXiv Detail & Related papers (2024-08-05T02:35:13Z)
- Breaking the Frame: Visual Place Recognition by Overlap Prediction
We propose a novel visual place recognition approach based on overlap prediction, called VOP. VOP identifies co-visible image sections by obtaining patch-level embeddings using a Vision Transformer backbone. Our approach uses a voting mechanism to assess overlap scores for potential database images.
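A hedged sketch of what such patch voting could look like: each query patch votes for the database image containing its most similar patch, and the vote fractions serve as overlap scores. The function name and the assumption that ViT patch embeddings are precomputed are illustrative, not taken from the paper.

```python
# Hypothetical patch-voting sketch (names and setup assumed, not from VOP):
# ranks database images by the fraction of query patches that vote for them.
import numpy as np

def overlap_scores(query_patches, db_patches_list):
    """query_patches: (Nq, D) L2-normalized patch embeddings from a ViT;
    db_patches_list: one (Ni, D) embedding array per database image."""
    votes = np.zeros(len(db_patches_list))
    for q in query_patches:
        # each query patch votes for the image holding its nearest patch
        sims = [float((dp @ q).max()) for dp in db_patches_list]
        votes[int(np.argmax(sims))] += 1
    return votes / len(query_patches)  # overlap score per database image
```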
arXiv Detail & Related papers (2024-06-23T20:00:20Z)
- Locate, Assign, Refine: Taming Customized Promptable Image Inpainting
We introduce the multimodal promptable image inpainting project: a new task, model, and data for taming customized image inpainting. We propose LAR-Gen, a novel approach for image inpainting that enables seamless inpainting of specific regions in images corresponding to the mask prompt. Our LAR-Gen adopts a coarse-to-fine manner to ensure context consistency with the source image, subject identity consistency, local semantic consistency with the text description, and smoothness consistency.
arXiv Detail & Related papers (2024-03-28T16:07:55Z)
- Zero-shot spatial layout conditioning for text-to-image diffusion models
Large-scale text-to-image diffusion models have significantly improved the state of the art in generative image modelling.
We consider image generation from text associated with segments on the image canvas, which combines an intuitive natural language interface with precise spatial control over the generated content.
We propose ZestGuide, a zero-shot segmentation guidance approach that can be plugged into pre-trained text-to-image diffusion models.
arXiv Detail & Related papers (2023-06-23T19:24:48Z)
- Image Matching by Bare Homography
This paper presents Slime, a novel non-deep image matching framework which models the scene as rough local overlapping planes.
Planes are mutually extended by compatible matches and the images are split into fixed tiles, with only the best homographies retained for each pair of tiles.
The paper gives a thorough comparative analysis of the recent state of the art in image matching, represented by end-to-end deep networks and hybrid pipelines.
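The tile-and-homography idea can be illustrated with OpenCV: split the putative matches by source-image tile and keep the best RANSAC homography per tile. The tile count, threshold, and function name are illustrative assumptions; Slime's actual plane extension and match filtering are more involved.

```python
# Illustrative sketch (not Slime's pipeline): fit one RANSAC homography per
# fixed tile of the source image, keeping only tiles with enough matches.
import cv2
import numpy as np

def tile_homographies(pts1, pts2, img_shape, tiles=4, thresh=3.0):
    """pts1, pts2: (N, 2) matched points; img_shape: source image (h, w)."""
    h, w = img_shape[:2]
    tx, ty = w / tiles, h / tiles
    result = {}
    for i in range(tiles):
        for j in range(tiles):
            # select matches whose source point falls inside tile (i, j)
            m = ((pts1[:, 0] >= i * tx) & (pts1[:, 0] < (i + 1) * tx) &
                 (pts1[:, 1] >= j * ty) & (pts1[:, 1] < (j + 1) * ty))
            if m.sum() >= 4:  # a homography needs at least 4 correspondences
                H, inliers = cv2.findHomography(
                    pts1[m], pts2[m], cv2.RANSAC, thresh)
                if H is not None:
                    result[(i, j)] = (H, int(inliers.sum()))
    return result
```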
arXiv Detail & Related papers (2023-05-15T18:35:47Z)
- SpaText: Spatio-Textual Representation for Controllable Image Generation
SpaText is a new method for text-to-image generation using open-vocabulary scene control.
In addition to a global text prompt that describes the entire scene, the user provides a segmentation map.
We show its effectiveness on two state-of-the-art diffusion models: one pixel-based and one latent-based.
arXiv Detail & Related papers (2022-11-25T18:59:10Z)
- SceneComposer: Any-Level Semantic Image Synthesis
We propose a new framework for conditional image synthesis from semantic layouts of any precision levels.
The framework naturally reduces to text-to-image (T2I) at the lowest level with no shape information, and it becomes segmentation-to-image (S2I) at the highest level.
We introduce several novel techniques to address the challenges that come with this new setup.
arXiv Detail & Related papers (2022-11-21T18:59:05Z)
- Smooth image-to-image translations with latent space interpolations
Multi-domain image-to-image (I2I) translations can transform a source image according to the style of a target domain.
We show that our regularization techniques can improve the state-of-the-art I2I translations by a large margin.
arXiv Detail & Related papers (2022-10-03T11:57:30Z)
- Bridging the Visual Gap: Wide-Range Image Blending
We introduce an effective deep-learning model to realize wide-range image blending.
We experimentally demonstrate that our proposed method is able to produce visually appealing results.
arXiv Detail & Related papers (2021-03-28T15:07:45Z)
- Cross-Descriptor Visual Localization and Mapping
Visual localization and mapping is the key technology underlying the majority of Mixed Reality and robotics systems.
We present three novel scenarios for localization and mapping which require the continuous update of feature representations.
Our data-driven approach is agnostic to the feature descriptor type, has low computational requirements, and scales linearly with the number of description algorithms.
arXiv Detail & Related papers (2020-12-02T18:19:51Z)
- Semantically Adaptive Image-to-image Translation for Domain Adaptation of Semantic Segmentation
We address the problem of domain adaptation for semantic segmentation of street scenes.
Many state-of-the-art approaches focus on translating the source image while imposing that the result should be semantically consistent with the input.
We advocate that the image semantics can also be exploited to guide the translation algorithm.
arXiv Detail & Related papers (2020-09-02T16:16:50Z)
- Guidance and Evaluation: Semantic-Aware Image Inpainting for Mixed Scenes
We propose a Semantic Guidance and Evaluation Network (SGE-Net) to update the structural priors and the inpainted image.
It utilizes a semantic segmentation map as guidance at each scale of inpainting, under which location-dependent inferences are re-evaluated.
Experiments on real-world images of mixed scenes demonstrated the superiority of our proposed method over state-of-the-art approaches.
arXiv Detail & Related papers (2020-03-15T17:49:20Z)