AnomalyPainter: Vision-Language-Diffusion Synergy for Zero-Shot Realistic and Diverse Industrial Anomaly Synthesis
- URL: http://arxiv.org/abs/2503.07253v2
- Date: Tue, 11 Mar 2025 09:23:10 GMT
- Title: AnomalyPainter: Vision-Language-Diffusion Synergy for Zero-Shot Realistic and Diverse Industrial Anomaly Synthesis
- Authors: Zhangyu Lai, Yilin Lu, Xinyang Li, Jianghang Lin, Yansong Qu, Liujuan Cao, Ming Li, Rongrong Ji
- Abstract summary: AnomalyPainter is a framework that synergizes Vision Language Large Model, Latent Diffusion Model, and texture library Tex-9K. Tex-9K is a professional texture library containing 75 categories and 8,792 texture assets crafted for diverse anomaly synthesis. Extensive experiments show that AnomalyPainter outperforms existing methods in realism, diversity, and generalization.
- Score: 52.081638586098
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While existing anomaly synthesis methods have made remarkable progress, achieving both realism and diversity in synthesis remains a major obstacle. To address this, we propose AnomalyPainter, a zero-shot framework that breaks the diversity-realism trade-off dilemma through synergizing Vision Language Large Model (VLLM), Latent Diffusion Model (LDM), and our newly introduced texture library Tex-9K. Tex-9K is a professional texture library containing 75 categories and 8,792 texture assets crafted for diverse anomaly synthesis. Leveraging VLLM's general knowledge, reasonable anomaly text descriptions are generated for each industrial object and matched with relevant diverse textures from Tex-9K. These textures then guide the LDM via ControlNet to paint on normal images. Furthermore, we introduce Texture-Aware Latent Init to stabilize the natural-image-trained ControlNet for industrial images. Extensive experiments show that AnomalyPainter outperforms existing methods in realism, diversity, and generalization, achieving superior downstream performance.
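The abstract describes a three-stage pipeline: a VLLM proposes anomaly descriptions for an object, each description is matched against textures in Tex-9K, and the matched texture guides an LDM via ControlNet to paint the anomaly onto a normal image. The control flow can be sketched as follows; every function and the toy library here are hypothetical stand-ins (the real system would call a VLLM and a ControlNet-conditioned diffusion model), so this illustrates only the data flow, not the paper's implementation.

```python
import random

# Hypothetical stand-in for Tex-9K: anomaly category -> texture asset ids.
TEX_LIBRARY = {
    "metal_scratch": ["scratch_01", "scratch_02"],
    "liquid_stain": ["stain_01", "stain_02"],
    "fabric_tear": ["tear_01"],
}

def describe_anomalies(object_name):
    """Stand-in for the VLLM step: return plausible anomaly descriptions
    for an industrial object. A real system would prompt a VLLM here."""
    return [f"{object_name} with metal_scratch",
            f"{object_name} with liquid_stain"]

def match_texture(description, library):
    """Match a description to a texture category by keyword lookup,
    a simplified version of the description-to-texture matching step."""
    for category, assets in library.items():
        if category in description:
            return random.choice(assets)
    return None

def paint_anomaly(normal_image, texture_asset):
    """Stand-in for the ControlNet-guided LDM painting step: the texture
    would condition the diffusion model to paint onto the normal image."""
    return f"{normal_image}+{texture_asset}"

def anomaly_painter(object_name, normal_image, library=TEX_LIBRARY):
    """Run the full description -> match -> paint pipeline."""
    results = []
    for description in describe_anomalies(object_name):
        asset = match_texture(description, library)
        if asset is not None:
            results.append(paint_anomaly(normal_image, asset))
    return results
```

Under this sketch, `anomaly_painter("bottle", "bottle.png")` yields one synthesized sample per generated anomaly description.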
Related papers
- LaFiTe: A Generative Latent Field for 3D Native Texturing [72.05710323154288]
Existing native approaches are limited by the absence of a powerful and versatile representation, which severely constrains the fidelity and generality of their generated textures. We introduce LaFiTe, which generates high-quality textures guided by a sparse color representation and UV parameterization.
arXiv Detail & Related papers (2025-12-04T13:33:49Z) - FerretNet: Efficient Synthetic Image Detection via Local Pixel Dependencies [10.597504007889063]
FerretNet is a lightweight neural network with only 1.1M parameters that delivers efficient and robust synthetic image detection. FerretNet, trained exclusively on the 4-class ProGAN dataset, achieves an average accuracy of 97.1% on an open-world benchmark comprising 22 generative models.
arXiv Detail & Related papers (2025-09-25T08:28:32Z) - LEGION: Learning to Ground and Explain for Synthetic Image Detection [49.958951540410816]
We introduce SynthScars, a high-quality and diverse dataset consisting of 12,236 fully synthetic images with human-expert annotations.
It features 4 distinct image content types, 3 categories of artifacts, and fine-grained annotations covering pixel-level segmentation, detailed textual explanations, and artifact category labels.
We propose LEGION, a multimodal large language model (MLLM)-based image forgery analysis framework that integrates artifact detection, segmentation, and explanation.
arXiv Detail & Related papers (2025-03-19T14:37:21Z) - MaterialMVP: Illumination-Invariant Material Generation via Multi-view PBR Diffusion [37.596740171045845]
Physically-based rendering (PBR) has become a cornerstone in modern computer graphics, enabling realistic material representation and lighting interactions in 3D scenes.
We present a novel end-to-end model for generating PBR textures from 3D meshes and image prompts, addressing key challenges in multi-view material synthesis.
arXiv Detail & Related papers (2025-03-13T11:57:30Z) - Texture Image Synthesis Using Spatial GAN Based on Vision Transformers [1.6482333106552793]
We propose ViT-SGAN, a new hybrid model that fuses Vision Transformers (ViTs) with a Spatial Generative Adversarial Network (SGAN) to address the limitations of previous methods. By incorporating specialized texture descriptors such as mean-variance (mu, sigma) and textons into the self-attention mechanism of ViTs, our model achieves superior texture synthesis.
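The ViT-SGAN summary mentions feeding per-patch mean-variance (mu, sigma) descriptors into the transformer's self-attention. A minimal sketch of that idea is to append the patch statistics to each flattened patch token before attention; the function below is an illustrative simplification, not the paper's architecture.

```python
import numpy as np

def patch_mu_sigma_tokens(image, patch=8):
    """Split a grayscale image into non-overlapping patches and append
    each patch's mean and standard deviation to its flattened pixel
    vector — a simplified take on injecting (mu, sigma) texture
    descriptors into ViT token embeddings."""
    h, w = image.shape
    tokens = []
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            p = image[i:i + patch, j:j + patch].ravel()
            # Token = raw patch pixels plus two descriptor channels.
            tokens.append(np.concatenate([p, [p.mean(), p.std()]]))
    return np.stack(tokens)
```

For a 16x16 image with 8x8 patches this yields 4 tokens of length 64 + 2; the two extra channels carry the texture statistics that self-attention can then compare across patches.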
arXiv Detail & Related papers (2025-02-03T21:39:30Z) - FashionR2R: Texture-preserving Rendered-to-Real Image Translation with Diffusion Models [14.596090302381647]
This paper studies photorealism enhancement of rendered images, leveraging generative power from diffusion models on the controlled basis of rendering.
We introduce a novel framework to translate rendered images into their realistic counterparts, which consists of two stages: Domain Knowledge Injection (DKI) and Realistic Image Generation (RIG).
arXiv Detail & Related papers (2024-10-18T12:48:22Z) - SynArtifact: Classifying and Alleviating Artifacts in Synthetic Images via Vision-Language Model [15.616316848126642]
We develop a comprehensive artifact taxonomy and construct a dataset of synthetic images with artifact annotations for fine-tuning a Vision-Language Model (VLM).
The fine-tuned VLM exhibits a superior ability to identify artifacts and outperforms the baseline by 25.66%.
arXiv Detail & Related papers (2024-02-28T05:54:02Z) - CreativeSynth: Cross-Art-Attention for Artistic Image Synthesis with Multimodal Diffusion [73.08710648258985]
Key painting attributes including layout, perspective, shape, and semantics often cannot be conveyed and expressed through style transfer. Large-scale pretrained text-to-image generation models have demonstrated their capability to synthesize a vast amount of high-quality images. Our main novel idea is to integrate multimodal semantic information as a synthesis guide into artworks, rather than transferring style to the real world.
arXiv Detail & Related papers (2024-01-25T10:42:09Z) - TexFusion: Synthesizing 3D Textures with Text-Guided Image Diffusion Models [77.85129451435704]
We present a new method to synthesize textures for 3D assets, using large-scale text-guided image diffusion models.
Specifically, we leverage latent diffusion models, applying a shared denoising model across 2D renders of the 3D object and aggregating the denoising predictions on a latent texture map.
arXiv Detail & Related papers (2023-10-20T19:15:29Z) - Perceptual Artifacts Localization for Image Synthesis Tasks [59.638307505334076]
We introduce a novel dataset comprising 10,168 generated images, each annotated with per-pixel perceptual artifact labels.
A segmentation model, trained on our proposed dataset, effectively localizes artifacts across a range of tasks.
We propose an innovative zoom-in inpainting pipeline that seamlessly rectifies perceptual artifacts in the generated images.
arXiv Detail & Related papers (2023-10-09T10:22:08Z) - SeamlessGAN: Self-Supervised Synthesis of Tileable Texture Maps [3.504542161036043]
We present SeamlessGAN, a method capable of automatically generating tileable texture maps from a single input exemplar.
In contrast to most existing methods, which focus solely on the synthesis problem, our work tackles both synthesis and tileability simultaneously.
arXiv Detail & Related papers (2022-01-13T18:24:26Z) - DIB-R++: Learning to Predict Lighting and Material with a Hybrid Differentiable Renderer [78.91753256634453]
We consider the challenging problem of predicting intrinsic object properties from a single image by exploiting differentiable renderers.
In this work, we propose DIB-R++, a hybrid differentiable renderer that supports these effects by combining rasterization and ray-tracing.
Compared to more advanced physics-based differentiable renderers, DIB-R++ is highly performant due to its compact and expressive model.
arXiv Detail & Related papers (2021-10-30T01:59:39Z) - Aggregated Contextual Transformations for High-Resolution Image Inpainting [57.241749273816374]
We propose an enhanced GAN-based model, named Aggregated COntextual-Transformation GAN (AOT-GAN) for high-resolution image inpainting.
To enhance context reasoning, we construct the generator of AOT-GAN by stacking multiple layers of a proposed AOT block.
For improving texture synthesis, we enhance the discriminator of AOT-GAN by training it with a tailored mask-prediction task.
arXiv Detail & Related papers (2021-04-03T15:50:17Z) - Deep CG2Real: Synthetic-to-Real Translation via Image Disentanglement [78.58603635621591]
Training an unpaired synthetic-to-real translation network in image space is severely under-constrained.
We propose a semi-supervised approach that operates on the disentangled shading and albedo layers of the image.
Our two-stage pipeline first learns to predict accurate shading in a supervised fashion using physically-based renderings as targets.
arXiv Detail & Related papers (2020-03-27T21:45:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.