Open-Vocabulary X-ray Prohibited Item Detection via Fine-tuning CLIP
- URL: http://arxiv.org/abs/2406.10961v1
- Date: Sun, 16 Jun 2024 14:42:52 GMT
- Title: Open-Vocabulary X-ray Prohibited Item Detection via Fine-tuning CLIP
- Authors: Shuyang Lin, Tong Jia, Hao Wang, Bowen Ma, Mingyuan Li, Dongyue Chen
- Abstract summary: We introduce the distillation-based open-vocabulary object detection task into the X-ray security inspection domain.
It aims to detect novel prohibited item categories beyond the base categories on which the detector is trained.
We propose an X-ray feature adapter and apply it to CLIP within the OVOD framework to develop the OVXD model.
- Score: 6.934570446284497
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: X-ray prohibited item detection is an essential component of security checks, and the categories of prohibited items are continuously increasing in accordance with the latest laws. Previous works all focus on closed-set scenarios, which can only recognize known categories used for training and often require time-consuming as well as labor-intensive annotations when learning novel categories, resulting in limited real-world applicability. Although the success of vision-language models (e.g. CLIP) provides a new perspective for open-set X-ray prohibited item detection, directly applying CLIP to the X-ray domain leads to a sharp performance drop due to the domain shift between X-ray data and the general data used for pre-training CLIP. To address the aforementioned challenges, in this paper we introduce the distillation-based open-vocabulary object detection (OVOD) task into the X-ray security inspection domain by extending CLIP to learn visual representations in our specific X-ray domain, aiming to detect novel prohibited item categories beyond the base categories on which the detector is trained. Specifically, we propose an X-ray feature adapter and apply it to CLIP within the OVOD framework to develop the OVXD model. The X-ray feature adapter contains three adapter submodules of bottleneck architecture; it is simple yet efficiently integrates new knowledge of the X-ray domain with the original knowledge, further bridging the domain gap and promoting alignment between X-ray images and textual concepts. Extensive experiments conducted on the PIXray and PIDray datasets demonstrate that the proposed method performs favorably against other baseline OVOD methods in detecting novel categories in the X-ray scenario. It outperforms the previous best results by 15.2 AP50 and 1.5 AP50 on PIXray and PIDray, achieving 21.0 AP50 and 27.8 AP50 respectively.
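The abstract only names the adapter's bottleneck architecture; its exact layers are not specified. As a minimal sketch of the generic idea (a bottleneck adapter with a residual connection, so the frozen model's original features are preserved while new domain knowledge is added), here is a hypothetical NumPy illustration; the function name, weight shapes, and activation are assumptions, not the paper's implementation:

```python
import numpy as np

def bottleneck_adapter(x, W_down, W_up):
    """Generic bottleneck adapter sketch (hypothetical, not OVXD's code).

    x:       (batch, dim) features from a frozen backbone such as CLIP
    W_down:  (dim, bottleneck_dim) down-projection, bottleneck_dim << dim
    W_up:    (bottleneck_dim, dim) up-projection back to the feature size
    """
    h = np.maximum(x @ W_down, 0.0)  # down-project, then ReLU non-linearity
    # Residual connection: original (pre-trained) knowledge plus the
    # small learned correction for the new domain.
    return x + h @ W_up
```

With the up-projection initialized near zero, the adapter starts as an identity mapping, so fine-tuning begins from the pre-trained model's behavior.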
Related papers
- BGM: Background Mixup for X-ray Prohibited Items Detection [75.58709178012502]
This paper introduces a novel data augmentation approach tailored for prohibited item detection, leveraging unique characteristics inherent to X-ray imagery.
Our method is motivated by observations of physical properties including: 1) X-ray Transmission Imagery: Unlike reflected light images, transmitted X-ray pixels represent composite information from multiple materials along the imaging path.
We propose a simple yet effective X-ray image augmentation technique, Background Mixup (BGM), for prohibited item detection in security screening contexts.
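The transmission property described above can be made concrete: by Beer-Lambert attenuation, intensities of stacked materials multiply (attenuations add in the log domain), so overlapping items compose rather than occlude. A small NumPy illustration of that physical model (not BGM's actual augmentation code):

```python
import numpy as np

def compose_xray(transmissions):
    """Compose transmission images of stacked material layers.

    Per Beer-Lambert attenuation, the fraction of X-rays passing through
    stacked layers is the product of each layer's transmission, so
    overlapping objects blend multiplicatively instead of occluding.

    transmissions: list of arrays with values in (0, 1], one per layer.
    """
    out = np.ones_like(transmissions[0], dtype=float)
    for t in transmissions:
        out *= t
    return out
```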
arXiv Detail & Related papers (2024-11-30T12:26:55Z)
- Dual-view X-ray Detection: Can AI Detect Prohibited Items from Dual-view X-ray Images like Humans? [78.26435264182763]
We introduce the Large-scale Dual-view X-ray (LDXray), which consists of 353,646 instances across 12 categories.
To emulate human intelligence in dual-view detection, we propose the Auxiliary-view Enhanced Network (AENet).
Experiments on the LDXray dataset demonstrate that the dual-view mechanism significantly enhances detection performance.
arXiv Detail & Related papers (2024-11-27T06:36:20Z)
- Dual-Level Boost Network for Long-Tail Prohibited Items Detection in X-ray Security Inspection [81.11400642272976]
The long-tail distribution of prohibited items in X-ray security inspections poses a significant challenge for detection models.
We propose a Dual-level Boost Network (DBNet) specifically designed to overcome these challenges in X-ray security screening.
Our approach introduces two key innovations: (1) a specific data augmentation strategy employing Poisson blending, inspired by the characteristics of X-ray images, to generate realistic synthetic instances of rare items which can effectively mitigate data imbalance; and (2) a context-aware feature enhancement module that captures the spatial and semantic interactions between objects and their surroundings, enhancing classification accuracy for underrepresented categories.
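The abstract names Poisson blending but not its implementation. As a rough illustration of the underlying gradient-domain idea (inside a mask, the result reproduces the source's gradients while its boundary matches the target), here is a minimal Jacobi-iteration sketch in NumPy; the function and parameters are hypothetical, not DBNet's code:

```python
import numpy as np

def poisson_blend(source, target, mask, iters=500):
    """Minimal gradient-domain (Poisson) blending sketch.

    Solves the discrete Poisson equation by Jacobi iteration: pixels
    inside `mask` take on the source's Laplacian (its gradients) while
    pixels outside the mask stay fixed at the target's values, giving a
    seamless boundary. Assumes the mask does not touch the image border.
    """
    result = target.astype(float).copy()
    src = source.astype(float)
    # Laplacian of the source: the interior gradient field to reproduce.
    lap = (np.roll(src, 1, 0) + np.roll(src, -1, 0)
           + np.roll(src, 1, 1) + np.roll(src, -1, 1) - 4.0 * src)
    for _ in range(iters):
        neighbors = (np.roll(result, 1, 0) + np.roll(result, -1, 0)
                     + np.roll(result, 1, 1) + np.roll(result, -1, 1))
        updated = (neighbors - lap) / 4.0
        result = np.where(mask, updated, result)  # boundary stays fixed
    return result
```

In practice, libraries such as OpenCV provide this operation directly (`cv2.seamlessClone`); the loop above is only meant to show why pasted items acquire the surrounding image's appearance at the seam.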
arXiv Detail & Related papers (2024-11-27T06:13:56Z)
- HF-Fed: Hierarchical based customized Federated Learning Framework for X-Ray Imaging [0.0]
In clinical applications, X-ray technology is vital for noninvasive examinations like mammography, providing essential anatomical information.
X-ray reconstruction is crucial in medical imaging for detailed visual representations of internal structures, aiding diagnosis and treatment without invasive procedures.
Recent advancements in deep learning have shown promise in X-ray reconstruction, but conventional DL methods often require centralized aggregation of large datasets.
We introduce the Hierarchical Framework-based Federated Learning method (HF-Fed) for customized X-ray imaging.
arXiv Detail & Related papers (2024-07-25T05:21:48Z)
- Position-Guided Prompt Learning for Anomaly Detection in Chest X-Rays [46.78926066405227]
Anomaly detection in chest X-rays is a critical task.
Recently, CLIP-based methods, pre-trained on a large number of medical images, have shown impressive performance on zero/few-shot downstream tasks.
We propose a position-guided prompt learning method to adapt the task data to the frozen CLIP-based model.
arXiv Detail & Related papers (2024-05-20T12:11:41Z)
- Breaking with Fixed Set Pathology Recognition through Report-Guided Contrastive Training [23.506879497561712]
We employ a contrastive global-local dual-encoder architecture to learn concepts directly from unstructured medical reports.
We evaluate our approach on the large-scale chest X-Ray datasets MIMIC-CXR, CheXpert, and ChestX-Ray14 for disease classification.
arXiv Detail & Related papers (2022-05-14T21:44:05Z)
- Contrastive Attention for Automatic Chest X-ray Report Generation [124.60087367316531]
In most cases, the normal regions dominate the entire chest X-ray image, and the corresponding descriptions of these normal regions dominate the final report.
We propose the Contrastive Attention (CA) model, which compares the current input image with normal images to distill the contrastive information.
We achieve the state-of-the-art results on the two public datasets.
arXiv Detail & Related papers (2021-06-13T11:20:31Z)
- Occluded Prohibited Items Detection: an X-ray Security Inspection Benchmark and De-occlusion Attention Module [50.75589128518707]
We contribute the first high-quality object detection dataset for security inspection, named OPIXray.
OPIXray focuses on the commonly occurring prohibited item "cutter" and is annotated manually by professional inspectors from an international airport.
We propose the De-occlusion Attention Module (DOAM), a plug-and-play module that can be easily inserted into most popular detectors to improve their performance.
arXiv Detail & Related papers (2020-04-18T16:10:55Z)
- Towards Automatic Threat Detection: A Survey of Advances of Deep Learning within X-ray Security Imaging [0.6091702876917279]
This paper aims to review computerised X-ray security imaging algorithms by taxonomising the field into conventional machine learning and contemporary deep learning applications.
The proposed taxonomy sub-categorises the use of deep learning approaches into supervised, semi-supervised and unsupervised learning.
Based on the current and future trends in deep learning, the paper finally presents a discussion and future directions for X-ray security imagery.
arXiv Detail & Related papers (2020-01-05T19:17:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.