DACoN: DINO for Anime Paint Bucket Colorization with Any Number of Reference Images
- URL: http://arxiv.org/abs/2509.14685v2
- Date: Wed, 01 Oct 2025 06:59:16 GMT
- Title: DACoN: DINO for Anime Paint Bucket Colorization with Any Number of Reference Images
- Authors: Kazuma Nagata, Naoshi Kaneko
- Abstract summary: DACoN is a framework that leverages foundation models to capture part-level semantics, even in line drawings. Our method fuses low-resolution semantic features from foundation models with high-resolution spatial features from CNNs for fine-grained yet robust feature extraction.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic colorization of line drawings has been widely studied to reduce the labor cost of hand-drawn anime production. Deep learning approaches, including image/video generation and feature-based correspondence, have improved accuracy but struggle with occlusions, pose variations, and viewpoint changes. To address these challenges, we propose DACoN, a framework that leverages foundation models to capture part-level semantics, even in line drawings. Our method fuses low-resolution semantic features from foundation models with high-resolution spatial features from CNNs for fine-grained yet robust feature extraction. In contrast to previous methods that rely on the Multiplex Transformer and support only one or two reference images, DACoN removes this constraint, allowing any number of references. Quantitative and qualitative evaluations demonstrate the benefits of using multiple reference images, achieving superior colorization performance. Our code and model are available at https://github.com/kzmngt/DACoN.
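The two ideas in the abstract, fusing low-resolution semantic features with high-resolution spatial features and matching against any number of references, can be illustrated with a small sketch. This is not the authors' implementation: the nearest-neighbor upsampling, channel concatenation, per-segment average pooling, cosine-similarity matching, and all function names below are assumptions chosen only to keep the example self-contained.

```python
import math

def upsample_nearest(feat, out_h, out_w):
    """Nearest-neighbor upsample of an H x W x C nested-list feature map (assumed scheme)."""
    in_h, in_w = len(feat), len(feat[0])
    return [[feat[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]

def fuse(low_res, high_res):
    """Concatenate upsampled low-res semantic features with high-res features per pixel."""
    h, w = len(high_res), len(high_res[0])
    up = upsample_nearest(low_res, h, w)
    return [[list(up[y][x]) + list(high_res[y][x]) for x in range(w)]
            for y in range(h)]

def pool_segments(fused, seg_ids):
    """Average the fused features over each line-enclosed segment (hypothetical pooling)."""
    sums, counts = {}, {}
    for y, row in enumerate(seg_ids):
        for x, sid in enumerate(row):
            vec = fused[y][x]
            acc = sums.setdefault(sid, [0.0] * len(vec))
            sums[sid] = [a + b for a, b in zip(acc, vec)]
            counts[sid] = counts.get(sid, 0) + 1
    return {sid: [v / counts[sid] for v in s] for sid, s in sums.items()}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def colorize(target_segs, references):
    """Give each target segment the color of its most similar reference segment.

    `references` is a list of (segment_features, segment_colors) pairs of any
    length; segments from all references are pooled into one candidate set.
    """
    pool = [(feat, colors[sid])
            for feats, colors in references
            for sid, feat in feats.items()]
    return {sid: max(pool, key=lambda item: cosine(feat, item[0]))[1]
            for sid, feat in target_segs.items()}

# Toy usage: two target segments, two single-segment reference images.
target = {0: [1.0, 0.0], 1: [0.0, 1.0]}
ref_a = ({0: [0.9, 0.1]}, {0: "skin"})
ref_b = ({0: [0.1, 0.9]}, {0: "hair"})
result = colorize(target, [ref_a, ref_b])
```

Because the candidate pool is a flat list of segments, adding a reference only extends the list. That is one simple way to avoid a fixed limit on the number of reference images, though the paper's actual mechanism may differ.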
Related papers
- Image Referenced Sketch Colorization Based on Animation Creation Workflow [28.281739343084993]
We propose a diffusion-based framework inspired by real-world animation production. Our approach leverages the sketch as the spatial guidance and an RGB image as the color reference, and separately extracts foreground and background from the reference image with masks. This design allows the diffusion model to integrate information from foreground and background independently, preventing interference and eliminating the spatial artifacts.
arXiv Detail & Related papers (2025-02-27T10:04:47Z)
- MangaNinja: Line Art Colorization with Precise Reference Following [84.2001766692797]
MangaNinjia specializes in the task of reference-guided line art colorization. We incorporate two thoughtful designs to ensure precise character detail transcription: a patch shuffling module to facilitate correspondence learning between the reference color image and the target line art, and a point-driven control scheme to enable fine-grained color matching.
arXiv Detail & Related papers (2025-01-14T18:59:55Z)
- Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting [0.1696421797495086]
Current image stitching methods produce noticeable seams in challenging scenarios such as uneven hue and large parallax. We propose the Reference-Driven Inpainting Stitcher (RDIStitcher) to reformulate image fusion and rectangling as a reference-based inpainting model. We present Multimodal Large Language Model (MLLM)-based metrics, offering a new perspective on evaluating stitched image quality.
arXiv Detail & Related papers (2024-11-15T16:05:01Z)
- TexPainter: Generative Mesh Texturing with Multi-view Consistency [20.366302413005734]
In this paper, we propose a novel method to enforce multi-view consistency.
We use an optimization-based color-fusion to enforce consistency and indirectly modify the latent codes by gradient back-propagation.
Our method improves the consistency and overall quality of the generated textures compared to competing state-of-the-art methods.
arXiv Detail & Related papers (2024-05-17T18:41:36Z)
- RefFusion: Reference Adapted Diffusion Models for 3D Scene Inpainting [63.567363455092234]
RefFusion is a novel 3D inpainting method based on a multi-scale personalization of an image inpainting diffusion model to the given reference view.
Our framework achieves state-of-the-art results for object removal while maintaining high controllability.
arXiv Detail & Related papers (2024-04-16T17:50:02Z)
- ColorizeDiffusion: Adjustable Sketch Colorization with Reference Image and Text [5.675944597452309]
We introduce two variations of an image-guided latent diffusion model utilizing different image tokens from the pre-trained CLIP image encoder.
We propose corresponding manipulation methods to adjust their results sequentially using weighted text inputs.
arXiv Detail & Related papers (2024-01-02T22:46:12Z)
- Towards Enhanced Image Inpainting: Mitigating Unwanted Object Insertion and Preserving Color Consistency [78.0488707697235]
We propose a post-processing approach dubbed ASUKA (Aligned Stable inpainting with UnKnown Areas prior) to improve inpainting models. A Masked Auto-Encoder (MAE) provides reconstruction-based priors that mitigate object hallucination. A specialized VAE decoder treats latent-to-image decoding as a local task.
arXiv Detail & Related papers (2023-12-08T05:08:06Z)
- Diverse Inpainting and Editing with GAN Inversion [4.234367850767171]
Recent inversion methods have shown that real images can be inverted into StyleGAN's latent space.
In this paper, we tackle an even more difficult task: inverting erased images into GAN's latent space for realistic inpainting and editing.
arXiv Detail & Related papers (2023-07-27T17:41:36Z)
- Palette: Image-to-Image Diffusion Models [50.268441533631176]
We introduce Palette, a simple and general framework for image-to-image translation using conditional diffusion models.
On four challenging image-to-image translation tasks, Palette outperforms strong GAN and regression baselines.
We report several sample quality scores including FID, Inception Score, Classification Accuracy of a pre-trained ResNet-50, and Perceptual Distance against reference images.
arXiv Detail & Related papers (2021-11-10T17:49:29Z)
- Spatial-Separated Curve Rendering Network for Efficient and High-Resolution Image Harmonization [59.19214040221055]
We propose a novel spatial-separated curve rendering network (S$^2$CRNet) for efficient and high-resolution image harmonization.
The proposed method reduces parameters by more than 90% compared with previous methods.
Our method works smoothly on higher-resolution images in real time and is more than 10$\times$ faster than existing methods.
arXiv Detail & Related papers (2021-09-13T07:20:16Z)
- Blind Face Restoration via Deep Multi-scale Component Dictionaries [75.02640809505277]
We propose a deep face dictionary network (termed as DFDNet) to guide the restoration process of degraded observations.
DFDNet generates deep dictionaries for perceptually significant face components from high-quality images.
Component AdaIN is leveraged to eliminate the style diversity between the input and dictionary features.
arXiv Detail & Related papers (2020-08-02T07:02:07Z)