Taming Generative Synthetic Data for X-ray Prohibited Item Detection
- URL: http://arxiv.org/abs/2511.15299v1
- Date: Wed, 19 Nov 2025 10:07:11 GMT
- Title: Taming Generative Synthetic Data for X-ray Prohibited Item Detection
- Authors: Jialong Sun, Hongguang Zhu, Weizhe Liu, Yunda Sun, Renshuai Tao, Yunchao Wei,
- Abstract summary: Training prohibited item detection models requires a large amount of X-ray security images. X-ray security image synthesis methods composite images to scale up datasets. We propose a one-stage X-ray security image synthesis pipeline (Xsyn) based on text-to-image generation.
- Score: 48.23410488654841
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Training prohibited item detection models requires a large amount of X-ray security images, but collecting and annotating these images is time-consuming and laborious. To address data insufficiency, X-ray security image synthesis methods composite images to scale up datasets. However, previous methods primarily follow a two-stage pipeline, where they implement labor-intensive foreground extraction in the first stage and then composite images in the second stage. Such a pipeline inevitably introduces extra labor cost and is inefficient. In this paper, we propose a one-stage X-ray security image synthesis pipeline (Xsyn) based on text-to-image generation, which incorporates two effective strategies to improve the usability of synthetic images. The Cross-Attention Refinement (CAR) strategy leverages the cross-attention map from the diffusion model to refine the bounding box annotation. The Background Occlusion Modeling (BOM) strategy explicitly models background occlusion in the latent space to enhance imaging complexity. To the best of our knowledge, Xsyn is the first method to achieve high-quality X-ray security image synthesis without extra labor cost. Experiments demonstrate that our method outperforms all previous methods with a 1.2% mAP improvement, and that the synthetic images generated by our method help improve prohibited item detection performance across various X-ray security datasets and detectors. Code is available at https://github.com/pILLOW-1/Xsyn/.
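The abstract says only that CAR uses the diffusion model's cross-attention map to refine the bounding box annotation; it does not give the procedure. The sketch below is one plausible reading and is not the authors' implementation: it assumes the attention map for the item's text token is already available as a 2D array, and it shrinks a coarse box to the tight extent of the high-attention pixels inside it. The function name `refine_box_from_attention`, the `rel_threshold` parameter, and the thresholding rule are all hypothetical.

```python
import numpy as np


def refine_box_from_attention(attn, coarse_box, rel_threshold=0.5):
    """Shrink a coarse box to the high-attention region inside it.

    Hypothetical illustration, not the Xsyn code.
    attn:        2D array (H, W) of cross-attention weights for one text token.
    coarse_box:  (x0, y0, x1, y1) in pixel coordinates, end-exclusive.
    Returns a refined (x0, y0, x1, y1) box, or the coarse box if the
    attention inside it carries no usable signal.
    """
    x0, y0, x1, y1 = coarse_box
    window = attn[y0:y1, x0:x1]
    if window.size == 0:
        return coarse_box

    lo, hi = float(window.min()), float(window.max())
    if hi <= lo:
        # Flat attention: nothing to refine against.
        return coarse_box

    # Min-max normalise inside the window, then keep the strong responses.
    mask = (window - lo) / (hi - lo) >= rel_threshold

    rows = np.flatnonzero(mask.any(axis=1))  # window rows with any strong pixel
    cols = np.flatnonzero(mask.any(axis=0))  # window columns with any strong pixel

    # Map the tight extent back to image coordinates (end-exclusive).
    return (
        x0 + int(cols[0]),
        y0 + int(rows[0]),
        x0 + int(cols[-1]) + 1,
        y0 + int(rows[-1]) + 1,
    )


# Toy example: a 4x4 blob of attention inside a loose 12x12 box.
toy_attn = np.zeros((16, 16))
toy_attn[4:8, 6:10] = 1.0
print(refine_box_from_attention(toy_attn, (2, 2, 14, 14)))  # (6, 4, 10, 8)
```

A relative threshold is used here so the rule does not depend on the absolute scale of the attention weights, which varies across layers and timesteps. A real pipeline would also need to decide which layers and denoising steps to aggregate, which the abstract does not specify.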
Related papers
- Anomaly Detection by Effectively Leveraging Synthetic Images [3.9887243611436873]
Anomaly detection plays a vital role in industrial manufacturing. Due to the scarcity of real defect images, unsupervised approaches that rely solely on normal images have been extensively studied. In this work, we focus on a strategy to effectively leverage synthetic images to maximize the anomaly detection performance.
arXiv Detail & Related papers (2025-12-29T06:06:30Z)
- CO-SPY: Combining Semantic and Pixel Features to Detect Synthetic Images by AI [58.35348718345307]
Current efforts to distinguish between real and AI-generated images may lack generalization. We propose a novel framework, Co-Spy, that first enhances existing semantic features. We also create Co-Spy-Bench, a comprehensive dataset comprising 5 real image datasets and 22 state-of-the-art generative models.
arXiv Detail & Related papers (2025-03-24T01:59:29Z)
- Synthetic Lung X-ray Generation through Cross-Attention and Affinity Transformation [4.956977275061966]
This paper introduces a new method for the automatic generation of accurate semantic masks from synthetic lung X-ray images. It uses cross-attention mapping between text and image to extend text-driven image synthesis to semantic mask generation. The experimental results demonstrate that segmentation models trained on synthetic data generated using the method are comparable to, and in some cases even better than, models trained on real datasets.
arXiv Detail & Related papers (2025-03-10T11:48:26Z)
- DiffDoctor: Diagnosing Image Diffusion Models Before Treating [57.82359018425674]
We propose DiffDoctor, a two-stage pipeline to assist image diffusion models in generating fewer artifacts. We collect a dataset of over 1M flawed synthesized images and set up an efficient human-in-the-loop annotation process. The learned artifact detector is then involved in the second stage to optimize the diffusion model by providing pixel-level feedback.
arXiv Detail & Related papers (2025-01-21T18:56:41Z)
- Augmentation Matters: A Mix-Paste Method for X-Ray Prohibited Item Detection under Noisy Annotations [52.065764858163476]
Automatic X-ray prohibited item detection is vital for public safety. Existing deep learning-based methods all assume that the annotations of training X-ray images are correct. We propose an effective label-aware mixed patch paste augmentation method (Mix-Paste). We show the superiority of our method on X-ray datasets under noisy annotations.
arXiv Detail & Related papers (2025-01-03T09:51:51Z)
- BGM: Background Mixup for X-ray Prohibited Items Detection [75.58709178012502]
Background Mixup (BGM) is a background-based augmentation technique tailored for the X-ray security imaging domain. Unlike conventional methods, BGM is founded on an in-depth analysis of physical properties. BGM mixes background patches across regions on both 1) texture structure and 2) material variation, to benefit models from complicated background cues.
arXiv Detail & Related papers (2024-11-30T12:26:55Z)
- Beyond the Spectrum: Detecting Deepfakes via Re-Synthesis [69.09526348527203]
Deep generative models have led to highly realistic media, known as deepfakes, that are commonly indistinguishable from real to human eyes.
We propose a novel fake detection method that is designed to re-synthesize testing images and extract visual cues for detection.
We demonstrate the improved effectiveness, cross-GAN generalization, and robustness against perturbations of our approach in a variety of detection scenarios.
arXiv Detail & Related papers (2021-05-29T21:22:24Z)
- Synthesis of COVID-19 Chest X-rays using Unpaired Image-to-Image Translation [6.22964000148682]
We build the first-of-its-kind open dataset of synthetic COVID-19 chest X-ray images of high fidelity using an unsupervised domain adaptation approach.
We show considerable performance improvements on COVID-19 detection using various deep learning architectures.
Our publicly available benchmark dataset consists of 21,295 synthetic COVID-19 chest X-ray images.
arXiv Detail & Related papers (2020-10-20T13:37:40Z)
- EMIXER: End-to-end Multimodal X-ray Generation via Self-supervision [39.07263052525579]
We propose an End-to-end MultImodal X-ray genERative model (EMIXER) for jointly synthesizing x-ray images and corresponding free-text reports.
EMIXER is a conditional generative adversarial model that works by 1) generating an image based on a label, 2) encoding the image to a hidden embedding, 3) producing the corresponding text via a hierarchical decoder from the image embedding, and 4) using a joint discriminator to assess both the image and the corresponding text.
We show that EMIXER generated synthetic datasets can augment X-ray image classification, report generation models to achieve 5.
arXiv Detail & Related papers (2020-07-10T20:19:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.