AnimeDL-2M: Million-Scale AI-Generated Anime Image Detection and Localization in Diffusion Era
- URL: http://arxiv.org/abs/2504.11015v1
- Date: Tue, 15 Apr 2025 09:41:08 GMT
- Title: AnimeDL-2M: Million-Scale AI-Generated Anime Image Detection and Localization in Diffusion Era
- Authors: Chenyang Zhu, Xing Zhang, Yuyang Sun, Ching-Chun Chang, Isao Echizen
- Abstract summary: Misrepresentations of AI-generated images as hand-drawn artwork pose serious threats to the anime community and industry. We propose AnimeDL-2M, the first large-scale benchmark for anime IMDL with comprehensive annotations. We also propose AniXplore, a novel model tailored to the visual characteristics of anime imagery.
- Score: 11.94929097375473
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in image generation, particularly diffusion models, have significantly lowered the barrier for creating sophisticated forgeries, making image manipulation detection and localization (IMDL) increasingly challenging. While prior work in IMDL has focused largely on natural images, the anime domain remains underexplored despite its growing vulnerability to AI-generated forgeries. Misrepresentations of AI-generated images as hand-drawn artwork, copyright violations, and inappropriate content modifications pose serious threats to the anime community and industry. To address this gap, we propose AnimeDL-2M, the first large-scale benchmark for anime IMDL with comprehensive annotations. It comprises over two million images, including real, partially manipulated, and fully AI-generated samples. Experiments indicate that models trained on existing IMDL datasets of natural images perform poorly when applied to anime images, highlighting a clear domain gap between anime and natural images. To better handle IMDL tasks in the anime domain, we further propose AniXplore, a novel model tailored to the visual characteristics of anime imagery. Extensive evaluations demonstrate that AniXplore achieves superior performance compared to existing methods. The dataset and code are available at https://flytweety.github.io/AnimeDL2M/.
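For reference, IMDL benchmarks of this kind are typically scored on two axes: image-level detection (is the image forged at all?) and pixel-level localization against the manipulation mask. Below is a minimal sketch of both metrics, assuming binary masks and per-image forgery scores; the function names are illustrative and this is not the authors' evaluation code:

```python
import numpy as np

def pixel_f1(pred_mask: np.ndarray, gt_mask: np.ndarray, eps: float = 1e-8) -> float:
    """Pixel-level F1 between predicted and ground-truth binary manipulation masks."""
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    precision = tp / (pred.sum() + eps)
    recall = tp / (gt.sum() + eps)
    return float(2 * precision * recall / (precision + recall + eps))

def detection_accuracy(scores, gt_masks, threshold=0.5):
    """Image-level detection: an image counts as forged if its mask is non-empty."""
    preds = [s >= threshold for s in scores]      # model's per-image forgery decision
    labels = [m.any() for m in gt_masks]          # any manipulated pixel => forged
    return float(np.mean([p == l for p, l in zip(preds, labels)]))
```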
Related papers
- Multimodal Generation of Animatable 3D Human Models with AvatarForge [67.31920821192323]
AvatarForge is a framework for generating animatable 3D human avatars from text or image inputs using AI-driven procedural generation.
Our evaluations show that AvatarForge outperforms state-of-the-art methods in both text- and image-to-avatar generation.
arXiv Detail & Related papers (2025-03-11T08:29:18Z)
- NOVA-3D: Non-overlapped Views for 3D Anime Character Reconstruction [14.509202872426942]
NOVA-3D (Non-Overlapped Views for 3D Anime Character Reconstruction) is a new framework for reconstructing anime characters from non-overlapping views.
It implements view-aware feature fusion to learn 3D-consistent features effectively.
Experiments demonstrate superior reconstruction of anime characters with exceptional detail fidelity.
arXiv Detail & Related papers (2024-05-21T05:31:03Z)
- APISR: Anime Production Inspired Real-World Anime Super-Resolution [15.501488335115269]
We argue that video networks and datasets are not necessary for anime SR due to the repeated use of hand-drawn frames.
Instead, we propose an anime image collection pipeline by choosing the least compressed and the most informative frames from the video sources.
We evaluate our method through extensive experiments on the public benchmark, showing our method outperforms state-of-the-art anime dataset-trained approaches.
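One plausible reading of that collection pipeline is ranking candidate frames by how much visual detail they retain. The sketch below uses lossless compression size as a crude proxy for informativeness; this heuristic is an assumption for illustration, not APISR's exact criterion:

```python
import zlib
import numpy as np

def information_score(frame: np.ndarray) -> int:
    """Crude informativeness proxy: byte size of the losslessly compressed frame.
    Flat or heavily compressed frames shrink more under lossless compression."""
    return len(zlib.compress(frame.tobytes(), level=6))

def select_frames(frames: list, k: int) -> list:
    """Indices of the k highest-scoring frames (illustrative heuristic only)."""
    scores = [information_score(f) for f in frames]
    return sorted(np.argsort(scores)[-k:].tolist())
```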
arXiv Detail & Related papers (2024-03-03T19:52:43Z)
- AEROBLADE: Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error [15.46508882889489]
A key enabler for generating high-resolution images with low computational cost has been the development of latent diffusion models (LDMs).
LDMs perform the denoising process in the low-dimensional latent space of a pre-trained autoencoder (AE) instead of the high-dimensional image space.
We propose a novel detection method which exploits an inherent component of LDMs: the AE used to transform images between image and latent space.
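The intuition: an image generated by an LDM survives a round trip through that LDM's autoencoder almost losslessly, while a real image picks up noticeable reconstruction error. A minimal sketch using a Stable Diffusion VAE from diffusers; plain L1 error stands in for the paper's perceptual distance, and the threshold is a placeholder, not a value from the paper:

```python
import torch
from diffusers import AutoencoderKL

# Any VAE shared with the suspected generator works; this checkpoint is illustrative.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()

@torch.no_grad()
def reconstruction_error(image: torch.Tensor) -> float:
    """image: (1, 3, H, W) in [-1, 1]. Lower round-trip error suggests the image
    was produced by an LDM built on this autoencoder."""
    latents = vae.encode(image).latent_dist.mode()
    recon = vae.decode(latents).sample
    return torch.mean(torch.abs(image - recon)).item()  # L1 stand-in for a perceptual metric

def is_ldm_generated(image: torch.Tensor, threshold: float = 0.02) -> bool:
    return reconstruction_error(image) < threshold      # placeholder threshold
```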
arXiv Detail & Related papers (2024-01-31T14:36:49Z)
- From Static to Dynamic: Adapting Landmark-Aware Image Models for Facial Expression Recognition in Videos [88.08209394979178]
Dynamic facial expression recognition (DFER) in the wild is still hindered by data limitations.
We introduce a novel Static-to-Dynamic model (S2D) that leverages existing static facial expression recognition (SFER) knowledge and dynamic information implicitly encoded in extracted facial landmark-aware features.
arXiv Detail & Related papers (2023-12-09T03:16:09Z)
- Scenimefy: Learning to Craft Anime Scene via Semi-Supervised Image-to-Image Translation [75.91455714614966]
We propose Scenimefy, a novel semi-supervised image-to-image translation framework.
Our approach guides the learning with structure-consistent pseudo paired data.
A patch-wise contrastive style loss is introduced to improve stylization and fine details.
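Patch-wise contrastive style losses are usually InfoNCE over patch features: each output patch is pulled toward its corresponding reference patch and pushed away from all others. A generic sketch of that pattern, not Scenimefy's exact formulation:

```python
import torch
import torch.nn.functional as F

def patch_nce_loss(feat_q: torch.Tensor, feat_k: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """feat_q, feat_k: (N, C) features of N corresponding patches from the output
    and the reference; patch i of feat_q matches patch i of feat_k (positive)
    and mismatches every other patch (negatives)."""
    q = F.normalize(feat_q, dim=1)
    k = F.normalize(feat_k, dim=1)
    logits = q @ k.t() / tau                              # (N, N) cosine similarities
    targets = torch.arange(q.size(0), device=q.device)    # diagonal entries are positives
    return F.cross_entropy(logits, targets)
```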
arXiv Detail & Related papers (2023-08-24T17:59:50Z)
- Learning 3D Photography Videos via Self-supervised Diffusion on Single Images [105.81348348510551]
3D photography renders a static image into a video with appealing 3D visual effects.
Existing approaches typically first conduct monocular depth estimation, then render the input frame into subsequent frames at varying viewpoints (a minimal warping sketch follows below).
We present a novel task: out-animation, which extends the space and time of input objects.
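The depth-then-render step above can be made concrete as forward warping: unproject each pixel with its depth, transform it to the new camera, and project it back. A minimal numpy sketch that omits z-buffering and hole filling:

```python
import numpy as np

def reproject(image: np.ndarray, depth: np.ndarray, K: np.ndarray,
              R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Warp image (H, W, 3) to a new viewpoint given per-pixel depth (H, W),
    intrinsics K (3, 3), and relative rotation R / translation t.
    Dis-occluded pixels are left black."""
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys, np.ones_like(xs)], -1).reshape(-1, 3).T.astype(float)
    pts = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)  # unproject to 3D
    pts = R @ pts + t.reshape(3, 1)                        # move to the new camera
    proj = K @ pts                                         # project back to pixels
    uv = np.round(proj[:2] / np.clip(proj[2:], 1e-6, None)).astype(int)
    out = np.zeros_like(image)
    u, v = uv[0], uv[1]
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h) & (pts[2] > 0)
    out[v[ok], u[ok]] = image.reshape(-1, 3)[ok]
    return out
```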
arXiv Detail & Related papers (2023-02-21T16:18:40Z)
- AnimeRun: 2D Animation Visual Correspondence from Open Source 3D Movies [98.65469430034246]
Existing datasets for two-dimensional (2D) cartoons suffer from simple frame composition and monotonous movements.
We present a new 2D animation visual correspondence dataset, AnimeRun, by converting open source 3D movies to full scenes in 2D style.
Our analyses show that the proposed dataset not only more closely resembles real anime in image composition, but also possesses richer and more complex motion patterns than existing datasets.
arXiv Detail & Related papers (2022-11-10T17:26:21Z)
- Simple and Effective Synthesis of Indoor 3D Scenes [78.95697556834536]
We study the problem of synthesizing immersive 3D indoor scenes from one or more images.
Our aim is to generate high-resolution images and videos from novel viewpoints.
We propose an image-to-image GAN that maps directly from reprojections of incomplete point clouds to full high-resolution RGB-D images.
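In such a setup the generator takes an incomplete 4-channel RGB-D reprojection and predicts a completed RGB-D image. The toy encoder-decoder below only illustrates that input/output contract; it is not the authors' architecture, and the adversarial training loop is omitted:

```python
import torch
import torch.nn as nn

class RGBDCompletionGenerator(nn.Module):
    """Maps an incomplete RGB-D reprojection (4 channels, holes zeroed)
    to a completed RGB-D image (4 channels). Illustrative skeleton only."""
    def __init__(self, ch: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, ch, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(ch, ch * 2, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.ConvTranspose2d(ch * 2, ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(ch, 4, 4, stride=2, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, 4, H, W)
        return self.net(x)
```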
arXiv Detail & Related papers (2022-04-06T17:54:46Z)
- Enhancement of Anime Imaging Enlargement using Modified Super-Resolution CNN [0.0]
We propose a convolutional neural network model that extracts salient image features, enlarges the images, and enhances the quality of anime images.
Experimental results indicate that our model enhances image quality at larger image sizes compared with common existing image-enlargement methods and the original SRCNN.
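The SRCNN baseline referenced here is a three-layer CNN applied after bicubic upsampling. A minimal PyTorch rendition of that baseline; the paper's modifications are not reflected:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SRCNN(nn.Module):
    """Classic SRCNN (9-1-5): patch extraction, non-linear mapping, and
    reconstruction, run on a bicubically upscaled input."""
    def __init__(self, scale: int = 2):
        super().__init__()
        self.scale = scale
        self.extract = nn.Conv2d(3, 64, kernel_size=9, padding=4)
        self.map = nn.Conv2d(64, 32, kernel_size=1)
        self.reconstruct = nn.Conv2d(32, 3, kernel_size=5, padding=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Enlarge first with bicubic interpolation, then refine with the CNN.
        x = F.interpolate(x, scale_factor=self.scale, mode="bicubic", align_corners=False)
        x = torch.relu(self.extract(x))
        x = torch.relu(self.map(x))
        return self.reconstruct(x)
```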
arXiv Detail & Related papers (2021-10-05T19:38:50Z)