Open-Vocabulary X-ray Prohibited Item Detection via Fine-tuning CLIP
- URL: http://arxiv.org/abs/2406.10961v1
- Date: Sun, 16 Jun 2024 14:42:52 GMT
- Title: Open-Vocabulary X-ray Prohibited Item Detection via Fine-tuning CLIP
- Authors: Shuyang Lin, Tong Jia, Hao Wang, Bowen Ma, Mingyuan Li, Dongyue Chen
- Abstract summary: We introduce the distillation-based open-vocabulary object detection task into the X-ray security inspection domain.
It aims to detect novel prohibited item categories beyond the base categories on which the detector is trained.
We propose an X-ray feature adapter and apply it to CLIP within the OVOD framework to develop the OVXD model.
- Score: 6.934570446284497
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: X-ray prohibited item detection is an essential component of security checks, and the categories of prohibited items are continuously increasing in accordance with the latest laws. Previous works all focus on closed-set scenarios, which can only recognize known categories used for training and often require time-consuming as well as labor-intensive annotations when learning novel categories, resulting in limited real-world applicability. Although the success of vision-language models (e.g. CLIP) provides a new perspective for open-set X-ray prohibited item detection, directly applying CLIP to the X-ray domain leads to a sharp performance drop due to the domain shift between X-ray data and the general data used for pre-training CLIP. To address the aforementioned challenges, in this paper we introduce the distillation-based open-vocabulary object detection (OVOD) task into the X-ray security inspection domain by extending CLIP to learn visual representations in our specific X-ray domain, aiming to detect novel prohibited item categories beyond the base categories on which the detector is trained. Specifically, we propose an X-ray feature adapter and apply it to CLIP within the OVOD framework to develop the OVXD model. The X-ray feature adapter contains three adapter submodules of bottleneck architecture; it is simple yet efficiently integrates new knowledge of the X-ray domain with the original knowledge, further bridging the domain gap and promoting alignment between X-ray images and textual concepts. Extensive experiments conducted on the PIXray and PIDray datasets demonstrate that the proposed method performs favorably against other baseline OVOD methods in detecting novel categories in the X-ray scenario. It outperforms the previous best results by 15.2 AP50 and 1.5 AP50 on PIXray and PIDray, achieving 21.0 AP50 and 27.8 AP50 respectively.
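The abstract only names the adapter's bottleneck architecture; its exact layers are not specified. As a minimal sketch of the generic idea (a bottleneck adapter with a residual connection, so the frozen model's original features are preserved while new domain knowledge is added), here is a hypothetical NumPy illustration; the function name, weight shapes, and activation are assumptions, not the paper's implementation:

```python
import numpy as np

def bottleneck_adapter(x, W_down, W_up):
    """Generic bottleneck adapter sketch (hypothetical, not OVXD's code).

    x:       (batch, dim) features from a frozen backbone such as CLIP
    W_down:  (dim, bottleneck_dim) down-projection, bottleneck_dim << dim
    W_up:    (bottleneck_dim, dim) up-projection back to the feature size
    """
    h = np.maximum(x @ W_down, 0.0)  # down-project, then ReLU non-linearity
    # Residual connection: original (pre-trained) knowledge plus the
    # small learned correction for the new domain.
    return x + h @ W_up
```

With the up-projection initialized near zero, the adapter starts as an identity mapping, so fine-tuning begins from the pre-trained model's behavior.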
Related papers
- BGM: Background Mixup for X-ray Prohibited Items Detection [75.58709178012502]
This paper introduces a novel data augmentation approach tailored for prohibited item detection, leveraging unique characteristics inherent to X-ray imagery.
Our method is motivated by observations of physical properties including: 1) X-ray Transmission Imagery: Unlike reflected light images, transmitted X-ray pixels represent composite information from multiple materials along the imaging path.
We propose a simple yet effective X-ray image augmentation technique, Background Mixup (BGM), for prohibited item detection in security screening contexts.
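The transmission property described above can be made concrete: by Beer-Lambert attenuation, intensities of stacked materials multiply (attenuations add in the log domain), so overlapping items compose rather than occlude. A small NumPy illustration of that physical model (not BGM's actual augmentation code):

```python
import numpy as np

def compose_xray(transmissions):
    """Compose transmission images of stacked material layers.

    Per Beer-Lambert attenuation, the fraction of X-rays passing through
    stacked layers is the product of each layer's transmission, so
    overlapping objects blend multiplicatively instead of occluding.

    transmissions: list of arrays with values in (0, 1], one per layer.
    """
    out = np.ones_like(transmissions[0], dtype=float)
    for t in transmissions:
        out *= t
    return out
```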
arXiv Detail & Related papers (2024-11-30T12:26:55Z)
- Dual-view X-ray Detection: Can AI Detect Prohibited Items from Dual-view X-ray Images like Humans? [78.26435264182763]
We introduce the Large-scale Dual-view X-ray (LDXray), which consists of 353,646 instances across 12 categories.
To emulate human intelligence in dual-view detection, we propose the Auxiliary-view Enhanced Network (AENet).
Experiments on the LDXray dataset demonstrate that the dual-view mechanism significantly enhances detection performance.
arXiv Detail & Related papers (2024-11-27T06:36:20Z)
- Dual-Level Boost Network for Long-Tail Prohibited Items Detection in X-ray Security Inspection [81.11400642272976]
The long-tail distribution of prohibited items in X-ray security inspections poses a significant challenge for detection models.
We propose a Dual-level Boost Network (DBNet) specifically designed to overcome these challenges in X-ray security screening.
Our approach introduces two key innovations: (1) a specific data augmentation strategy employing Poisson blending, inspired by the characteristics of X-ray images, to generate realistic synthetic instances of rare items which can effectively mitigate data imbalance; and (2) a context-aware feature enhancement module that captures the spatial and semantic interactions between objects and their surroundings, enhancing classification accuracy for underrepresented categories.
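The abstract names Poisson blending but not its implementation. As a rough illustration of the underlying gradient-domain idea (inside a mask, the result reproduces the source's gradients while its boundary matches the target), here is a minimal Jacobi-iteration sketch in NumPy; the function and parameters are hypothetical, not DBNet's code:

```python
import numpy as np

def poisson_blend(source, target, mask, iters=500):
    """Minimal gradient-domain (Poisson) blending sketch.

    Solves the discrete Poisson equation by Jacobi iteration: pixels
    inside `mask` take on the source's Laplacian (its gradients) while
    pixels outside the mask stay fixed at the target's values, giving a
    seamless boundary. Assumes the mask does not touch the image border.
    """
    result = target.astype(float).copy()
    src = source.astype(float)
    # Laplacian of the source: the interior gradient field to reproduce.
    lap = (np.roll(src, 1, 0) + np.roll(src, -1, 0)
           + np.roll(src, 1, 1) + np.roll(src, -1, 1) - 4.0 * src)
    for _ in range(iters):
        neighbors = (np.roll(result, 1, 0) + np.roll(result, -1, 0)
                     + np.roll(result, 1, 1) + np.roll(result, -1, 1))
        updated = (neighbors - lap) / 4.0
        result = np.where(mask, updated, result)  # boundary stays fixed
    return result
```

In practice, libraries such as OpenCV provide this operation directly (`cv2.seamlessClone`); the loop above is only meant to show why pasted items acquire the surrounding image's appearance at the seam.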
arXiv Detail & Related papers (2024-11-27T06:13:56Z)
- HF-Fed: Hierarchical based customized Federated Learning Framework for X-Ray Imaging [0.0]
In clinical applications, X-ray technology is vital for noninvasive examinations like mammography, providing essential anatomical information.
X-ray reconstruction is crucial in medical imaging for detailed visual representations of internal structures, aiding diagnosis and treatment without invasive procedures.
Recent advancements in deep learning have shown promise in X-ray reconstruction, but conventional DL methods often require centralized aggregation of large datasets.
We introduce the Hierarchical Framework-based Federated Learning method (HF-Fed) for customized X-ray imaging.
arXiv Detail & Related papers (2024-07-25T05:21:48Z)
- Position-Guided Prompt Learning for Anomaly Detection in Chest X-Rays [46.78926066405227]
Anomaly detection in chest X-rays is a critical task.
Recently, CLIP-based methods, pre-trained on a large number of medical images, have shown impressive performance on zero/few-shot downstream tasks.
We propose a position-guided prompt learning method to adapt the task data to the frozen CLIP-based model.
arXiv Detail & Related papers (2024-05-20T12:11:41Z)
- Breaking with Fixed Set Pathology Recognition through Report-Guided Contrastive Training [23.506879497561712]
We employ a contrastive global-local dual-encoder architecture to learn concepts directly from unstructured medical reports.
We evaluate our approach on the large-scale chest X-Ray datasets MIMIC-CXR, CheXpert, and ChestX-Ray14 for disease classification.
arXiv Detail & Related papers (2022-05-14T21:44:05Z)
- Contrastive Attention for Automatic Chest X-ray Report Generation [124.60087367316531]
In most cases, the normal regions dominate the entire chest X-ray image, and the corresponding descriptions of these normal regions dominate the final report.
We propose the Contrastive Attention (CA) model, which compares the current input image with normal images to distill the contrastive information.
We achieve the state-of-the-art results on the two public datasets.
arXiv Detail & Related papers (2021-06-13T11:20:31Z)
- Occluded Prohibited Items Detection: an X-ray Security Inspection Benchmark and De-occlusion Attention Module [50.75589128518707]
We contribute the first high-quality object detection dataset for security inspection, named OPIXray.
OPIXray focuses on the commonly occurring prohibited item "cutter" and is annotated manually by professional inspectors from an international airport.
We propose the De-occlusion Attention Module (DOAM), a plug-and-play module that can be easily inserted into most popular detectors to improve their performance.
arXiv Detail & Related papers (2020-04-18T16:10:55Z)
- Towards Automatic Threat Detection: A Survey of Advances of Deep Learning within X-ray Security Imaging [0.6091702876917279]
This paper aims to review computerised X-ray security imaging algorithms by taxonomising the field into conventional machine learning and contemporary deep learning applications.
The proposed taxonomy sub-categorises the use of deep learning approaches into supervised, semi-supervised and unsupervised learning.
Based on the current and future trends in deep learning, the paper finally presents a discussion and future directions for X-ray security imagery.
arXiv Detail & Related papers (2020-01-05T19:17:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.