Related papers: Crucial-Diff: A Unified Diffusion Model for Crucial Image and Annotation Synthesis in Data-scarce Scenarios

Crucial-Diff: A Unified Diffusion Model for Crucial Image and Annotation Synthesis in Data-scarce Scenarios

URL: http://arxiv.org/abs/2507.09915v1
Date: Mon, 14 Jul 2025 04:41:38 GMT
Title: Crucial-Diff: A Unified Diffusion Model for Crucial Image and Annotation Synthesis in Data-scarce Scenarios
Authors: Siyue Yao, Mingjie Sun, Eng Gee Lim, Ran Yi, Baojiang Zhong, Moncef Gabbouj,
Abstract summary: Crucial-Diff is a domain-agnostic framework designed to synthesize crucial samples.<n>Our framework generates diverse, high-quality training data, achieving a pixel-level AP of 83.63% and an F1-MAX of 78.12% on MVTec.
Score: 36.938892077684635
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The scarcity of data in various scenarios, such as medical, industry and autonomous driving, leads to model overfitting and dataset imbalance, thus hindering effective detection and segmentation performance. Existing studies employ the generative models to synthesize more training samples to mitigate data scarcity. However, these synthetic samples are repetitive or simplistic and fail to provide "crucial information" that targets the downstream model's weaknesses. Additionally, these methods typically require separate training for different objects, leading to computational inefficiencies. To address these issues, we propose Crucial-Diff, a domain-agnostic framework designed to synthesize crucial samples. Our method integrates two key modules. The Scene Agnostic Feature Extractor (SAFE) utilizes a unified feature extractor to capture target information. The Weakness Aware Sample Miner (WASM) generates hard-to-detect samples using feedback from the detection results of downstream model, which is then fused with the output of SAFE module. Together, our Crucial-Diff framework generates diverse, high-quality training data, achieving a pixel-level AP of 83.63% and an F1-MAX of 78.12% on MVTec. On polyp dataset, Crucial-Diff reaches an mIoU of 81.64% and an mDice of 87.69%. Code will be released after acceptance.

Related papers

Leveraging Text-to-Image Generation for Handling Spurious Correlation [24.940576844328408]
Deep neural networks trained with Empirical Risk Minimization (ERM) perform well when both training and test data come from the same domain.<n>ERM models may rely on spurious correlations that often exist between labels and irrelevant features of images, making predictions unreliable when those features do not exist.<n>We propose a technique to generate training samples with text-to-image (T2I) diffusion models for addressing the spurious correlation problem.
arXiv Detail & Related papers (2025-03-21T15:28:22Z)
Perturb-and-Compare Approach for Detecting Out-of-Distribution Samples in Constrained Access Environments [20.554546406575]
We propose an OOD detection framework, MixDiff, that is applicable even when the model's parameters or its activations are not accessible to the end user. We provide theoretical analysis to illustrate MixDiff's effectiveness in discerning OOD samples that induce overconfident outputs from the model.
arXiv Detail & Related papers (2024-08-19T15:51:31Z)
MDM: Advancing Multi-Domain Distribution Matching for Automatic Modulation Recognition Dataset Synthesis [35.07663680944459]
Deep learning technology has been successfully introduced into Automatic Modulation Recognition (AMR) tasks. The success of deep learning is all attributed to the training on large-scale datasets. In order to solve the problem of large amount of data, some researchers put forward the method of data distillation.
arXiv Detail & Related papers (2024-08-05T14:16:54Z)
From Zero to Hero: Detecting Leaked Data through Synthetic Data Injection and Model Querying [10.919336198760808]
We introduce a novel methodology to detect leaked data that are used to train classification models. textscLDSS involves injecting a small volume of synthetic data--characterized by local shifts in class distribution--into the owner's dataset. This enables the effective identification of models trained on leaked data through model querying alone.
arXiv Detail & Related papers (2023-10-06T10:36:28Z)
Domain Adaptive Synapse Detection with Weak Point Annotations [63.97144211520869]
We present AdaSyn, a framework for domain adaptive synapse detection with weak point annotations. In the WASPSYN challenge at I SBI 2023, our method ranks the 1st place.
arXiv Detail & Related papers (2023-08-31T05:05:53Z)
Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models. In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
arXiv Detail & Related papers (2023-04-04T17:54:32Z)
Domain Generalization via Ensemble Stacking for Face Presentation Attack Detection [4.61143637299349]
Face Presentation Attack Detection (PAD) plays a pivotal role in securing face recognition systems against spoofing attacks. This work proposes a comprehensive solution that combines synthetic data generation and deep ensemble learning. Experimental results on four datasets demonstrate low half total error rates (HTERs) on three benchmark datasets.
arXiv Detail & Related papers (2023-01-05T16:44:36Z)
ScoreMix: A Scalable Augmentation Strategy for Training GANs with Limited Data [93.06336507035486]
Generative Adversarial Networks (GANs) typically suffer from overfitting when limited training data is available. We present ScoreMix, a novel and scalable data augmentation approach for various image synthesis tasks.
arXiv Detail & Related papers (2022-10-27T02:55:15Z)
BeCAPTCHA-Type: Biometric Keystroke Data Generation for Improved Bot Detection [63.447493500066045]
This work proposes a data driven learning model for the synthesis of keystroke biometric data. The proposed method is compared with two statistical approaches based on Universal and User-dependent models. Our experimental framework considers a dataset with 136 million keystroke events from 168 thousand subjects.
arXiv Detail & Related papers (2022-07-27T09:26:15Z)
Contrastive Model Inversion for Data-Free Knowledge Distillation [60.08025054715192]
We propose Contrastive Model Inversion, where the data diversity is explicitly modeled as an optimizable objective. Our main observation is that, under the constraint of the same amount of data, higher data diversity usually indicates stronger instance discrimination. Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that CMI achieves significantly superior performance when the generated data are used for knowledge distillation.
arXiv Detail & Related papers (2021-05-18T15:13:00Z)
Robust Finite Mixture Regression for Heterogeneous Targets [70.19798470463378]
We propose an FMR model that finds sample clusters and jointly models multiple incomplete mixed-type targets simultaneously. We provide non-asymptotic oracle performance bounds for our model under a high-dimensional learning framework. The results show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-10-12T03:27:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.