Open-Vocabulary X-ray Prohibited Item Detection via Fine-tuning CLIP
- URL: http://arxiv.org/abs/2406.10961v1
- Date: Sun, 16 Jun 2024 14:42:52 GMT
- Title: Open-Vocabulary X-ray Prohibited Item Detection via Fine-tuning CLIP
- Authors: Shuyang Lin, Tong Jia, Hao Wang, Bowen Ma, Mingyuan Li, Dongyue Chen,
- Abstract summary: We introduce the distillation-based open-vocabulary object detection (OVOD) task into the X-ray security inspection domain.
It aims to detect novel prohibited item categories beyond base categories on which the detector is trained.
We propose an X-ray feature adapter and apply it to CLIP within the OVOD framework to develop the OVXD model.
- Score: 6.934570446284497
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: X-ray prohibited item detection is an essential component of security checks, and the categories of prohibited items are continuously expanding in accordance with the latest laws. Previous works all focus on closed-set scenarios, which can only recognize the known categories used for training and often require time-consuming and labor-intensive annotation when learning novel categories, resulting in limited real-world applicability. Although the success of vision-language models (e.g., CLIP) provides a new perspective for open-set X-ray prohibited item detection, directly applying CLIP to the X-ray domain leads to a sharp performance drop due to the domain shift between X-ray data and the general data used for pre-training CLIP. To address these challenges, we introduce the distillation-based open-vocabulary object detection (OVOD) task into the X-ray security inspection domain by extending CLIP to learn visual representations in the X-ray domain, aiming to detect novel prohibited item categories beyond the base categories on which the detector is trained. Specifically, we propose an X-ray feature adapter and apply it to CLIP within the OVOD framework to develop the OVXD model. The X-ray feature adapter contains three adapter submodules with a bottleneck architecture; it is simple but can efficiently integrate new X-ray domain knowledge with the original knowledge, further bridging the domain gap and promoting alignment between X-ray images and textual concepts. Extensive experiments on the PIXray and PIDray datasets demonstrate that the proposed method performs favorably against other baseline OVOD methods in detecting novel categories in the X-ray scenario. It outperforms the previous best results by 15.2 AP50 and 1.5 AP50 on PIXray and PIDray, achieving 21.0 AP50 and 27.8 AP50, respectively.
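The abstract describes the adapter only at a high level, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes a single bottleneck adapter (down-projection, ReLU, up-projection) whose output is mixed with the original feature through a hypothetical ratio `alpha`, followed by open-vocabulary classification via cosine similarity against text embeddings. The class names, dimensions, random weights, and the mixing rule are all illustrative assumptions; the abstract does not say where the three submodules sit inside CLIP or how they are combined.

```python
import math
import random


def linear(x, weight, bias):
    """Compute y = W x + b for one vector x; weight is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def init_linear(n_in, n_out, rng, scale=0.02):
    """Small random weights and zero bias (illustrative initialisation only)."""
    weight = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weight, [0.0] * n_out


class BottleneckAdapter:
    """Hypothetical bottleneck adapter: dim -> hidden -> dim, then a residual mix.

    alpha controls how much adapted (new-domain) signal is blended with the
    original feature. This mixing rule is an assumption, not taken from the paper.
    """

    def __init__(self, dim, hidden, alpha=0.5, seed=0):
        rng = random.Random(seed)
        self.down = init_linear(dim, hidden, rng)  # compress to the bottleneck
        self.up = init_linear(hidden, dim, rng)    # expand back to feature size
        self.alpha = alpha

    def __call__(self, x):
        h = [max(0.0, v) for v in linear(x, *self.down)]  # ReLU in the bottleneck
        adapted = linear(h, *self.up)
        return [self.alpha * a + (1.0 - self.alpha) * v for a, v in zip(adapted, x)]


def cosine(a, b):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b + 1e-12)


def classify(region_feature, text_embeddings):
    """Pick the category whose text embedding is closest to the region feature."""
    scores = {name: cosine(region_feature, emb) for name, emb in text_embeddings.items()}
    return max(scores, key=scores.get), scores


# Toy usage with made-up 4-dimensional features and hypothetical category names.
adapter = BottleneckAdapter(dim=4, hidden=2, alpha=0.5)
region = adapter([0.9, 0.1, 0.0, 0.2])
label, scores = classify(region, {"knife": [1.0, 0.0, 0.0, 0.0], "gun": [0.0, 1.0, 0.0, 0.0]})
```

Because the category set is just a dictionary of text embeddings, a novel category can be added at inference time by supplying one more embedding, which is the property that open-vocabulary detection relies on. In a real system the features would come from CLIP's image and text encoders rather than hand-written vectors.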
Related papers
- HF-Fed: Hierarchical based customized Federated Learning Framework for X-Ray Imaging [0.0]
In clinical applications, X-ray technology is vital for noninvasive examinations like mammography, providing essential anatomical information.
X-ray reconstruction is crucial in medical imaging for detailed visual representations of internal structures, aiding diagnosis and treatment without invasive procedures.
Recent advancements in deep learning have shown promise in X-ray reconstruction, but conventional DL methods often require centralized aggregation of large datasets.
We introduce the Hierarchical Framework-based Federated Learning method (HF-Fed) for customized X-ray imaging.
arXiv Detail & Related papers (2024-07-25T05:21:48Z)
- Position-Guided Prompt Learning for Anomaly Detection in Chest X-Rays [46.78926066405227]
Anomaly detection in chest X-rays is a critical task.
Recently, CLIP-based methods, pre-trained on a large number of medical images, have shown impressive performance on zero/few-shot downstream tasks.
We propose a position-guided prompt learning method to adapt the task data to the frozen CLIP-based model.
arXiv Detail & Related papers (2024-05-20T12:11:41Z)
- X-Adv: Physical Adversarial Object Attacks against X-ray Prohibited Item Detection [113.10386151761682]
Adversarial attacks targeting texture-free X-ray images are underexplored.
In this paper, we take the first step toward the study of adversarial attacks targeted at X-ray prohibited item detection.
We propose X-Adv to generate physically printable metals that act as an adversarial agent capable of deceiving X-ray detectors.
arXiv Detail & Related papers (2023-02-19T06:31:17Z)
- Breaking with Fixed Set Pathology Recognition through Report-Guided Contrastive Training [23.506879497561712]
We employ a contrastive global-local dual-encoder architecture to learn concepts directly from unstructured medical reports.
We evaluate our approach on the large-scale chest X-Ray datasets MIMIC-CXR, CheXpert, and ChestX-Ray14 for disease classification.
arXiv Detail & Related papers (2022-05-14T21:44:05Z)
- Contrastive Attention for Automatic Chest X-ray Report Generation [124.60087367316531]
In most cases, the normal regions dominate the entire chest X-ray image, and the corresponding descriptions of these normal regions dominate the final report.
We propose the Contrastive Attention (CA) model, which compares the current input image with normal images to distill the contrastive information.
We achieve state-of-the-art results on two public datasets.
arXiv Detail & Related papers (2021-06-13T11:20:31Z)
- Cross-Modal Contrastive Learning for Abnormality Classification and Localization in Chest X-rays with Radiomics using a Feedback Loop [63.81818077092879]
We propose an end-to-end semi-supervised cross-modal contrastive learning framework for medical images.
We first apply an image encoder to classify the chest X-rays and to generate the image features.
The radiomic features are then passed through another dedicated encoder to act as the positive sample for the image features generated from the same chest X-ray.
arXiv Detail & Related papers (2021-04-11T09:16:29Z)
- Learning Invariant Feature Representation to Improve Generalization across Chest X-ray Datasets [55.06983249986729]
We show that a deep learning model performing well when tested on the same dataset as training data starts to perform poorly when it is tested on a dataset from a different source.
By employing an adversarial training strategy, we show that a network can be forced to learn a source-invariant representation.
arXiv Detail & Related papers (2020-08-04T07:41:15Z)
- Occluded Prohibited Items Detection: an X-ray Security Inspection Benchmark and De-occlusion Attention Module [50.75589128518707]
We contribute the first high-quality object detection dataset for security inspection, named OPIXray.
OPIXray focuses on the widely occurring prohibited item "cutter", annotated manually by professional inspectors from an international airport.
We propose the De-occlusion Attention Module (DOAM), a plug-and-play module that can be easily inserted into most popular detectors to improve their performance.
arXiv Detail & Related papers (2020-04-18T16:10:55Z)
- Towards Automatic Threat Detection: A Survey of Advances of Deep Learning within X-ray Security Imaging [0.6091702876917279]
This paper aims to review computerised X-ray security imaging algorithms by taxonomising the field into conventional machine learning and contemporary deep learning applications.
The proposed taxonomy sub-categorises the use of deep learning approaches into supervised, semi-supervised and unsupervised learning.
Based on the current and future trends in deep learning, the paper finally presents a discussion and future directions for X-ray security imagery.
arXiv Detail & Related papers (2020-01-05T19:17:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.