Related papers: Illicit object detection in X-ray images using Vision Transformers

URL: http://arxiv.org/abs/2403.19043v2
Date: Mon, 29 Apr 2024 13:08:36 GMT
Title: Illicit object detection in X-ray images using Vision Transformers
Authors: Jorgen Cani, Ioannis Mademlis, Adamantia Anna Rebolledo Chrysochoou, Georgios Th. Papadopoulos
Abstract summary: Illicit object detection is a critical task performed at various high-security locations. This study utilizes both Transformer and hybrid backbones, such as SWIN and NextViT, and detectors, such as DINO and RT-DETR.
Score: 6.728794938150435
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Illicit object detection is a critical task performed at various high-security locations, including airports, train stations, subways, and ports. The continuous and tedious work of examining thousands of X-ray images per hour can be mentally taxing. Thus, Deep Neural Networks (DNNs) can be used to automate the X-ray image analysis process, improve efficiency and alleviate the security officers' inspection burden. The neural architectures typically utilized in relevant literature are Convolutional Neural Networks (CNNs), with Vision Transformers (ViTs) rarely employed. In order to address this gap, this paper conducts a comprehensive evaluation of relevant ViT architectures on illicit item detection in X-ray images. This study utilizes both Transformer and hybrid backbones, such as SWIN and NextViT, and detectors, such as DINO and RT-DETR. The results demonstrate the remarkable accuracy of the DINO Transformer detector in the low-data regime, the impressive real-time performance of YOLOv8, and the effectiveness of the hybrid NextViT backbone.

Related papers

X-ray illicit object detection using hybrid CNN-transformer neural network architectures [9.33554429903529]
In X-ray security imaging the literature has been dominated by the use of CNN-based methods. Various hybrid CNN-transformer architectures are evaluated against a common CNN object detection baseline, namely YOLOv8. The resulting architectures are comparatively evaluated on three challenging public X-ray inspection datasets.
arXiv Detail & Related papers (2025-05-01T14:40:38Z)
A Comparative Study of CNN, ResNet, and Vision Transformers for Multi-Classification of Chest Diseases [0.0]
Vision Transformers (ViT) are powerful tools due to their scalability and ability to process large amounts of data. We fine-tuned two variants of ViT models, one pre-trained on ImageNet and another trained from scratch, using the NIH Chest X-ray dataset. Our study evaluates the performance of these models in the multi-label classification of 14 distinct diseases.
arXiv Detail & Related papers (2024-05-31T23:56:42Z)
Investigating the Robustness and Properties of Detection Transformers (DETR) Toward Difficult Images [1.5727605363545245]
Transformer-based object detectors (DETR) have shown significant performance across machine vision tasks. The critical issue to be addressed is how this model architecture can handle different image nuisances. We studied this issue by measuring the performance of DETR with different experiments and benchmarking the network.
arXiv Detail & Related papers (2023-10-12T23:38:52Z)
Visual inspection for illicit items in X-ray images using Deep Learning [7.350725076596881]
Automated detection of contraband items in X-ray images can significantly increase public safety. Modern computer vision algorithms relying on Deep Neural Networks (DNNs) have proven capable of undertaking this task.
arXiv Detail & Related papers (2023-10-05T16:35:27Z)
AiAReSeg: Catheter Detection and Segmentation in Interventional Ultrasound using Transformers [75.20925220246689]
Endovascular surgeries are performed using the gold standard of Fluoroscopy, which uses ionising radiation to visualise catheters and vasculature. This work proposes a solution using an adaptation of a state-of-the-art machine learning transformer architecture to detect and segment catheters in axial interventional Ultrasound image sequences.
arXiv Detail & Related papers (2023-09-25T19:34:12Z)
Self-Supervised Masked Convolutional Transformer Block for Anomaly Detection [122.4894940892536]
We present a novel self-supervised masked convolutional transformer block (SSMCTB) that comprises the reconstruction-based functionality at a core architectural level. In this work, we extend our previous self-supervised predictive convolutional attentive block (SSPCAB) with a 3D masked convolutional layer, a transformer for channel-wise attention, as well as a novel self-supervised objective based on Huber loss.
arXiv Detail & Related papers (2022-09-25T04:56:10Z)
Data-Efficient Vision Transformers for Multi-Label Disease Classification on Chest Radiographs [55.78588835407174]
Vision Transformers (ViTs) have not been applied to this task despite their high classification performance on generic images. ViTs do not rely on convolutions but on patch-based self-attention, and in contrast to CNNs, no prior knowledge of local connectivity is present. Our results show that while the performance between ViTs and CNNs is on par with a small benefit for ViTs, DeiTs outperform the former if a reasonably large data set is available for training.
arXiv Detail & Related papers (2022-08-17T09:07:45Z)
Vision Transformer with Convolutions Architecture Search [72.70461709267497]
We propose an architecture search method, Vision Transformer with Convolutions Architecture Search (VTCAS). The high-performance backbone network searched by VTCAS introduces the desirable features of convolutional neural networks into the Transformer architecture. It enhances the robustness of the neural network for object recognition, especially in low-illumination indoor scenes.
arXiv Detail & Related papers (2022-03-20T02:59:51Z)
Simulation-Driven Training of Vision Transformers Enabling Metal Segmentation in X-Ray Images [6.416928579907334]
This study proposes to generate simulated X-ray images based on CT data sets combined with computer aided design (CAD) implants. The metal segmentation in CBCT projections serves as a prerequisite for metal artifact avoidance and reduction algorithms. Our study indicates that the CAD model-based data generation has high flexibility and could be a way to overcome the problem of shortage in clinical data sampling and labelling.
arXiv Detail & Related papers (2022-03-17T09:58:58Z)
On the impact of using X-ray energy response imagery for object detection via Convolutional Neural Networks [17.639472693362926]
We study the impact of variant X-ray imagery, i.e. X-ray energy response (high, low) and effective-z compared to geometries. We evaluate CNN architectures to explore the transferability of models trained with such 'raw' variant imagery.
arXiv Detail & Related papers (2021-08-27T21:28:28Z)
Vision Transformers are Robust Learners [65.91359312429147]
We study the robustness of the Vision Transformer (ViT) against common corruptions and perturbations, distribution shifts, and natural adversarial examples. We present analyses that provide both quantitative and qualitative indications to explain why ViTs are indeed more robust learners.
arXiv Detail & Related papers (2021-05-17T02:39:22Z)
Visual Saliency Transformer [127.33678448761599]
We develop a novel unified model based on a pure transformer, Visual Saliency Transformer (VST), for both RGB and RGB-D salient object detection (SOD). It takes image patches as inputs and leverages the transformer to propagate global contexts among image patches. Experimental results show that our model outperforms existing state-of-the-art results on both RGB and RGB-D SOD benchmark datasets.
arXiv Detail & Related papers (2021-04-25T08:24:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.