Related papers: Photovoltaic Defect Image Generator with Boundary Alignment Smoothing Constraint for Domain Shift Mitigation

URL: http://arxiv.org/abs/2505.06117v1
Date: Fri, 09 May 2025 15:16:42 GMT
Title: Photovoltaic Defect Image Generator with Boundary Alignment Smoothing Constraint for Domain Shift Mitigation
Authors: Dongying Li, Binyi Su, Hua Zhang, Yong Li, Haiyong Chen
Abstract summary: We propose PDIG, a Photovoltaic Defect Image Generator based on Stable Diffusion (SD). PDIG leverages the strong priors learned from large-scale datasets to enhance generation quality under limited data. Our approach improves Frechet Inception Distance (FID) by 19.16 points over the second-best method.
Score: 7.166413857036151
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Accurate defect detection of photovoltaic (PV) cells is critical for ensuring quality and efficiency in intelligent PV manufacturing systems. However, the scarcity of rich defect data poses substantial challenges for effective model training. While existing methods have explored generative models to augment datasets, they often suffer from instability, limited diversity, and domain shifts. To address these issues, we propose PDIG, a Photovoltaic Defect Image Generator based on Stable Diffusion (SD). PDIG leverages the strong priors learned from large-scale datasets to enhance generation quality under limited data. Specifically, we introduce a Semantic Concept Embedding (SCE) module that incorporates text-conditioned priors to capture the relational concepts between defect types and their appearances. To further enrich the domain distribution, we design a Lightweight Industrial Style Adaptor (LISA), which injects industrial defect characteristics into the SD model through cross-disentangled attention. At inference, we propose a Text-Image Dual-Space Constraints (TIDSC) module, enforcing the quality of generated images via positional consistency and spatial smoothing alignment. Extensive experiments demonstrate that PDIG achieves superior realism and diversity compared to state-of-the-art methods. Specifically, our approach improves Frechet Inception Distance (FID) by 19.16 points over the second-best method and significantly enhances the performance of downstream defect detection tasks.
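Note on the metric: the abstract reports its headline result in FID, where lower values indicate generated images that are statistically closer to real ones. For reference, FID is commonly defined as below. The symbols are the usual notation, not taken from this page: $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ are the mean and covariance of Inception features of real and generated images, respectively.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Under this definition, an improvement of 19.16 points means the reported FID is 19.16 lower than that of the second-best method.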

Related papers

Solving Inverse Problems with FLAIR [59.02385492199431]
Flow-based latent generative models are able to generate images with remarkable quality, even enabling text-to-image generation. We present FLAIR, a novel training-free variational framework that leverages flow-based generative models as a prior for inverse problems. Results on standard imaging benchmarks demonstrate that FLAIR consistently outperforms existing diffusion- and flow-based methods in terms of reconstruction quality and sample diversity.
arXiv Detail & Related papers (2025-06-03T09:29:47Z)
EIAD: Explainable Industrial Anomaly Detection Via Multi-Modal Large Language Models [23.898938659720503]
Industrial Anomaly Detection (IAD) is critical to ensure product quality during manufacturing. We propose a novel approach that introduces a dedicated multi-modal defect localization module to decouple the dialog functionality from the core feature extraction. We also contribute the first multi-modal industrial anomaly detection training dataset, named Defect Detection Question Answering (DDQA).
arXiv Detail & Related papers (2025-03-18T11:33:29Z)
Synthetic Data is an Elegant GIFT for Continual Vision-Language Models [52.343627275005026]
GIFT is a novel continual fine-tuning approach to overcome catastrophic forgetting in Vision-Language Models. We employ a pre-trained diffusion model to recreate both pre-training and learned downstream task data. Our method consistently outperforms previous state-of-the-art approaches across various settings.
arXiv Detail & Related papers (2025-03-06T09:09:18Z)
Multi-Modality Driven LoRA for Adverse Condition Depth Estimation [61.525312117638116]
We propose Multi-Modality Driven LoRA (MMD-LoRA) for Adverse Condition Depth Estimation. It consists of two core components: Prompt Driven Domain Alignment (PDDA) and Visual-Text Consistent Contrastive Learning (VTCCL). It achieves state-of-the-art performance on the nuScenes and Oxford RobotCar datasets.
arXiv Detail & Related papers (2024-12-28T14:23:58Z)
Bring the Power of Diffusion Model to Defect Detection [0.0]
A denoising diffusion probabilistic model (DDPM) is pre-trained to extract features from the denoising process, which are used to construct a feature repository. The queried latent features are reconstructed and filtered to obtain high-dimensional DDPM features. Experimental results demonstrate that our method achieves competitive results on several industrial datasets.
arXiv Detail & Related papers (2024-08-25T14:28:49Z)
Looking for Tiny Defects via Forward-Backward Feature Transfer [12.442574943138794]
We introduce a novel benchmark that evaluates methods on the original, high-resolution image and ground-truth masks. Our benchmark includes a metric that captures robustness with respect to defect size. Our proposal features the highest robustness to defect size, runs at the fastest speed and yields state-of-the-art segmentation performance.
arXiv Detail & Related papers (2024-07-04T17:59:26Z)
Progressive Alignment with VLM-LLM Feature to Augment Defect Classification for the ASE Dataset [7.1083241462091165]
Traditional defect classification approaches face two barriers: insufficient training data and unstable data quality. We propose a special dataset for defect classification, including rich data descriptions recorded on the images, but the defect features are difficult to learn directly.
arXiv Detail & Related papers (2024-04-08T04:17:27Z)
DiAD: A Diffusion-based Framework for Multi-class Anomaly Detection [55.48770333927732]
We propose a Diffusion-based Anomaly Detection (DiAD) framework for multi-class anomaly detection. It consists of a pixel-space autoencoder, a latent-space Semantic-Guided (SG) network with a connection to the stable diffusion's denoising network, and a feature-space pre-trained feature extractor. Experiments on MVTec-AD and VisA datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-12-11T18:38:28Z)
CINFormer: Transformer network with multi-stage CNN feature injection for surface defect segmentation [73.02218479926469]
We propose a transformer network with multi-stage CNN feature injection for surface defect segmentation. CINFormer presents a simple yet effective feature integration mechanism that injects the multi-level CNN features of the input image into different stages of the transformer network in the encoder. In addition, CINFormer presents a Top-K self-attention module to focus on tokens with more important information about the defects.
arXiv Detail & Related papers (2023-09-22T06:12:02Z)
Controlling Text-to-Image Diffusion by Orthogonal Finetuning [74.21549380288631]
We introduce a principled finetuning method -- Orthogonal Finetuning (OFT) for adapting text-to-image diffusion models to downstream tasks. Unlike existing methods, OFT can provably preserve hyperspherical energy which characterizes the pairwise neuron relationship on the unit hypersphere. We empirically show that our OFT framework outperforms existing methods in generation quality and convergence speed.
arXiv Detail & Related papers (2023-06-12T17:59:23Z)

This list is automatically generated from the titles and abstracts of the papers on this site.