STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection
- URL: http://arxiv.org/abs/2504.02823v1
- Date: Thu, 03 Apr 2025 17:59:12 GMT
- Title: STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection
- Authors: Divya Velayudhan, Abdelfatah Ahmed, Mohamad Alansari, Neha Gour, Abderaouf Behouch, Taimur Hassan, Syed Talal Wasim, Nabil Maalej, Muzammal Naseer, Juergen Gall, Mohammed Bennamoun, Ernesto Damiani, Naoufel Werghi,
- Abstract summary: We introduce STCray, the first multimodal X-ray baggage security dataset, comprising 46,642 image-caption paired scans across 21 threat categories.<n> STCray is meticulously developed with our specialized protocol that ensures domain-aware, coherent captions.<n>This allows us to train a domain-aware visual AI assistant named STING-BEE that supports a range of vision-language tasks.
- Score: 43.69783848100359
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Advancements in Computer-Aided Screening (CAS) systems are essential for improving the detection of security threats in X-ray baggage scans. However, current datasets are limited in representing real-world, sophisticated threats and concealment tactics, and existing approaches are constrained by a closed-set paradigm with predefined labels. To address these challenges, we introduce STCray, the first multimodal X-ray baggage security dataset, comprising 46,642 image-caption paired scans across 21 threat categories, generated using an X-ray scanner for airport security. STCray is meticulously developed with our specialized protocol that ensures domain-aware, coherent captions, that lead to the multi-modal instruction following data in X-ray baggage security. This allows us to train a domain-aware visual AI assistant named STING-BEE that supports a range of vision-language tasks, including scene comprehension, referring threat localization, visual grounding, and visual question answering (VQA), establishing novel baselines for multi-modal learning in X-ray baggage security. Further, STING-BEE shows state-of-the-art generalization in cross-domain settings. Code, data, and models are available at https://divs1159.github.io/STING-BEE/.
Related papers
- Superpowering Open-Vocabulary Object Detectors for X-ray Vision [53.07098133237041]
Open-vocabulary object detection (OvOD) is set to revolutionize security screening by enabling systems to recognize any item in X-ray scans.<n>We propose RAXO, a framework that repurposes off-the-shelf RGB OvOD detectors for robust X-ray detection.<n> RAXO builds high-quality X-ray class descriptors using a dual-source retrieval strategy.
arXiv Detail & Related papers (2025-03-21T11:54:16Z) - Enhancing Prohibited Item Detection through X-ray-Specific Augmentation and Contextual Feature Integration [81.11400642272976]
X-ray prohibited item detection faces challenges due to the long-tail distribution and unique characteristics of X-ray imaging.<n>Traditional data augmentation strategies, such as copy-paste and mixup, are ineffective at improving the detection of rare items.<n>We propose the X-ray Imaging-driven Detection Network (XIDNet) to address these challenges.
arXiv Detail & Related papers (2024-11-27T06:13:56Z) - X-Adv: Physical Adversarial Object Attacks against X-ray Prohibited Item
Detection [113.10386151761682]
Adversarial attacks targeting texture-free X-ray images are underexplored.
In this paper, we take the first step toward the study of adversarial attacks targeted at X-ray prohibited item detection.
We propose X-Adv to generate physically printable metals that act as an adversarial agent capable of deceiving X-ray detectors.
arXiv Detail & Related papers (2023-02-19T06:31:17Z) - Temporal Fusion Based Mutli-scale Semantic Segmentation for Detecting
Concealed Baggage Threats [12.895636885728852]
No framework exists that utilizes temporal baggage X-ray imagery to effectively screen highly concealed objects.
We present a novel temporal fusion driven multi-scale residual fashioned encoder-decoder that takes series of consecutive scans as input.
The proposed framework outperforms its competitors on the GDXray dataset on various metrics.
arXiv Detail & Related papers (2021-11-04T06:19:52Z) - Unsupervised Anomaly Instance Segmentation for Baggage Threat
Recognition [39.40595024569702]
This paper presents a novel unsupervised anomaly instance segmentation framework that recognizes baggage threats, in X-ray scans, as anomalies without requiring any ground truth labels.
Thanks to its stylization capacity, the framework is trained only once, and at the inference stage, it detects and extracts contraband items regardless of their scanner specifications.
A thorough evaluation of the proposed system on four public baggage X-ray datasets, without any re-training, demonstrates that it achieves competitive performance.
arXiv Detail & Related papers (2021-07-15T13:56:55Z) - Over-sampling De-occlusion Attention Network for Prohibited Items
Detection in Noisy X-ray Images [35.35752470993847]
Security inspection is X-ray scanning for personal belongings in suitcases.
Traditional CNN-based models trained through common image recognition datasets fail to achieve satisfactory performance in this scenario.
We propose an over-sampling de-occlusion attention network (DOAM-O), which consists of a novel de-occlusion attention module and a new over-sampling training strategy.
arXiv Detail & Related papers (2021-03-01T07:17:37Z) - Trainable Structure Tensors for Autonomous Baggage Threat Detection
Under Extreme Occlusion [45.39173572825739]
This paper presents a novel instance segmentation framework that utilizes trainable structure tensors to highlight the contours of the occluded and cluttered contraband items.
It is the only framework that has been validated on combined grayscale and colored scans obtained from four different types of X-ray scanners.
arXiv Detail & Related papers (2020-09-28T09:12:10Z) - Occluded Prohibited Items Detection: an X-ray Security Inspection
Benchmark and De-occlusion Attention Module [50.75589128518707]
We contribute the first high-quality object detection dataset for security inspection, named OPIXray.
OPIXray focused on the widely-occurred prohibited item "cutter", annotated manually by professional inspectors from the international airport.
We propose the De-occlusion Attention Module (DOAM), a plug-and-play module that can be easily inserted into and thus promote most popular detectors.
arXiv Detail & Related papers (2020-04-18T16:10:55Z) - Cascaded Structure Tensor Framework for Robust Identification of Heavily
Occluded Baggage Items from X-ray Scans [45.39173572825739]
This paper presents a cascaded structure tensor framework that can automatically extract and recognize suspicious items in heavily occluded and cluttered baggage.
The proposed framework has been rigorously evaluated using a total of 1,067,381 X-ray scans from publicly available GDXray and SIXray datasets.
arXiv Detail & Related papers (2020-04-14T20:00:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.