MangaDiT: Reference-Guided Line Art Colorization with Hierarchical Attention in Diffusion Transformers
- URL: http://arxiv.org/abs/2508.09709v1
- Date: Wed, 13 Aug 2025 11:02:11 GMT
- Title: MangaDiT: Reference-Guided Line Art Colorization with Hierarchical Attention in Diffusion Transformers
- Authors: Qianru Qiu, Jiafeng Mao, Kento Masui, Xueting Wang
- Abstract summary: We present MangaDiT, a powerful model for reference-guided line art colorization based on Diffusion Transformers (DiT). Our model takes both line art and reference images as conditional inputs and introduces a hierarchical attention mechanism with a dynamic attention weighting strategy. Experiments on two benchmark datasets demonstrate that our method significantly outperforms state-of-the-art approaches.
- Score: 5.312303275762103
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in diffusion models have significantly improved the performance of reference-guided line art colorization. However, existing methods still struggle with region-level color consistency, especially when the reference and target images differ in character pose or motion. Instead of relying on external matching annotations between the reference and target, we propose to discover semantic correspondences implicitly through internal attention mechanisms. In this paper, we present MangaDiT, a powerful model for reference-guided line art colorization based on Diffusion Transformers (DiT). Our model takes both line art and reference images as conditional inputs and introduces a hierarchical attention mechanism with a dynamic attention weighting strategy. This mechanism augments the vanilla attention with an additional context-aware path that leverages pooled spatial features, effectively expanding the model's receptive field and enhancing region-level color alignment. Experiments on two benchmark datasets demonstrate that our method significantly outperforms state-of-the-art approaches, achieving superior performance in both qualitative and quantitative evaluations.
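The abstract describes the hierarchical attention only in words and gives no formulas, so the following is a minimal NumPy sketch of one way such a mechanism could look, not the authors' implementation. It adds a second attention path over average-pooled reference keys and values next to the vanilla path and blends the two with a per-query gate. The function names, the average pooling, the window size, and the sigmoid gate (`gate_w`) are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Scaled dot-product attention: q is (Nq, d), k and v are (Nk, d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def avg_pool_tokens(x, h, w, window):
    """Average-pool an (h*w, d) row-major token grid with a non-overlapping window."""
    assert x.shape[0] == h * w and h % window == 0 and w % window == 0
    d = x.shape[-1]
    grid = x.reshape(h // window, window, w // window, window, d)
    return grid.mean(axis=(1, 3)).reshape(-1, d)


def hierarchical_attention(q, k, v, h, w, window=2, gate_w=None):
    """Blend a fine (vanilla) path with a coarse path over pooled keys/values.

    Assumption: the "dynamic attention weighting" is modeled here as a
    per-query sigmoid gate; the paper may weight the paths differently.
    """
    fine = attention(q, k, v)                    # token-level path
    k_pooled = avg_pool_tokens(k, h, w, window)  # region-level keys
    v_pooled = avg_pool_tokens(v, h, w, window)  # region-level values
    coarse = attention(q, k_pooled, v_pooled)    # context-aware path
    if gate_w is None:
        gate_w = np.zeros(q.shape[-1])           # gate = 0.5 for every query
    gate = 1.0 / (1.0 + np.exp(-(q @ gate_w)))   # shape (Nq,)
    alpha = gate[:, None]                        # broadcast over channels
    return (1.0 - alpha) * fine + alpha * coarse


# Toy usage: 16 target (line art) queries attend to a 4x4 grid of reference tokens.
rng = np.random.default_rng(0)
q = rng.normal(size=(16, 8))
k = rng.normal(size=(16, 8))
v = rng.normal(size=(16, 8))
out = hierarchical_attention(q, k, v, h=4, w=4, window=2)
print(out.shape)  # (16, 8)
```

In this sketch each pooled key summarizes a 2x2 region of the reference, so a query can match a whole region rather than a single token, which is one plausible reading of "expanding the receptive field" for region-level color alignment.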
Related papers
- Neural Scene Designer: Self-Styled Semantic Image Manipulation [67.43125248646653]
We introduce the Neural Scene Designer (NSD), a novel framework that enables photo-realistic manipulation of user-specified scene regions. NSD ensures both semantic alignment with user intent and stylistic consistency with the surrounding environment. To capture fine-grained style representations, we propose the Progressive Self-style Representational Learning (PSRL) module.
arXiv Detail & Related papers (2025-09-01T11:59:03Z)
- ColorizeDiffusion v2: Enhancing Reference-based Sketch Colorization Through Separating Utilities [28.160601838418433]
Reference-based sketch colorization methods have garnered significant attention due to their potential applications in the animation production industry. Most existing methods are trained with image triplets of sketch, reference, and ground truth that are semantically and spatially well-aligned. This mismatch in data distribution between training and inference leads to overfitting, resulting in spatial artifacts and significant degradation in overall colorization quality.
arXiv Detail & Related papers (2025-04-09T13:55:32Z)
- Leveraging Semantic Attribute Binding for Free-Lunch Color Control in Diffusion Models [53.73253164099701]
We introduce ColorWave, a training-free approach that achieves exact RGB-level color control in diffusion models without fine-tuning. We demonstrate that ColorWave establishes a new paradigm for structured, color-consistent diffusion-based image synthesis.
arXiv Detail & Related papers (2025-03-12T21:49:52Z)
- MangaNinja: Line Art Colorization with Precise Reference Following [84.2001766692797]
MangaNinjia specializes in the task of reference-guided line art colorization. We incorporate two thoughtful designs to ensure precise character detail transcription: a patch shuffling module to facilitate correspondence learning between the reference color image and the target line art, and a point-driven control scheme to enable fine-grained color matching.
arXiv Detail & Related papers (2025-01-14T18:59:55Z)
- Consistent Human Image and Video Generation with Spatially Conditioned Diffusion [82.4097906779699]
Consistent human-centric image and video synthesis aims to generate images with new poses while preserving appearance consistency with a given reference image. We frame the task as a spatially-conditioned inpainting problem, where the target image is in-painted to maintain appearance consistency with the reference. This approach enables the reference features to guide the generation of pose-compliant targets within a unified denoising network.
arXiv Detail & Related papers (2024-12-19T05:02:30Z)
- Oscillation Inversion: Understand the structure of Large Flow Model through the Lens of Inversion Method [60.88467353578118]
We show that a fixed-point-inspired iterative approach to invert real-world images does not achieve convergence, instead oscillating between distinct clusters.
We introduce a simple and fast distribution transfer technique that facilitates image enhancement, stroke-based recoloring, as well as visual prompt-guided image editing.
arXiv Detail & Related papers (2024-11-17T17:45:37Z)
- DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing [94.24479528298252]
DragGAN is an interactive point-based image editing framework that achieves impressive editing results with pixel-level precision.
By harnessing large-scale pretrained diffusion models, we greatly enhance the applicability of interactive point-based editing on both real and diffusion-generated images.
We present a challenging benchmark dataset called DragBench to evaluate the performance of interactive point-based image editing methods.
arXiv Detail & Related papers (2023-06-26T06:04:09Z)
- PUGAN: Physical Model-Guided Underwater Image Enhancement Using GAN with Dual-Discriminators [120.06891448820447]
How to obtain clear and visually pleasant images has become a common concern of people.
The task of underwater image enhancement (UIE) has also emerged as the times require.
In this paper, we propose a physical model-guided GAN model for UIE, referred to as PUGAN.
Our PUGAN outperforms state-of-the-art methods in both qualitative and quantitative metrics.
arXiv Detail & Related papers (2023-06-15T07:41:12Z)
- Attention-Aware Anime Line Drawing Colorization [10.924683447616273]
We introduce an attention-based model for anime line drawing colorization, in which a channel-wise and spatial-wise Convolutional Attention module is used.
Our method outperforms other SOTA methods, with more accurate line structure and semantic color information.
arXiv Detail & Related papers (2022-12-21T12:50:31Z)
- Eliminating Gradient Conflict in Reference-based Line-art Colorization [26.46476996150605]
Reference-based line-art colorization is a challenging task in computer vision.
We propose a novel attention mechanism using Stop-Gradient Attention (SGA).
Compared with state-of-the-art modules in line-art colorization, our approach demonstrates significant improvements.
arXiv Detail & Related papers (2022-07-13T10:08:37Z)
- Attention-based Stylisation for Exemplar Image Colourisation [3.491870689686827]
This work reformulates the existing methodology introducing a novel end-to-end colourisation network.
The proposed architecture integrates attention modules at different resolutions that learn how to perform the style transfer task.
Experimental validations demonstrate the efficiency of the proposed methodology, which generates high-quality and visually appealing colourisation.
arXiv Detail & Related papers (2021-05-04T18:56:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.