X-ray illicit object detection using hybrid CNN-transformer neural network architectures
- URL: http://arxiv.org/abs/2505.00564v1
- Date: Thu, 01 May 2025 14:40:38 GMT
- Title: X-ray illicit object detection using hybrid CNN-transformer neural network architectures
- Authors: Jorgen Cani, Christos Diou, Spyridon Evangelatos, Panagiotis Radoglou-Grammatikis, Vasileios Argyriou, Panagiotis Sarigiannidis, Iraklis Varlamis, Georgios Th. Papadopoulos
- Abstract summary: In X-ray security imaging the literature has been dominated by the use of CNN-based methods. Various hybrid CNN-transformer architectures are evaluated against a common CNN object detection baseline, namely YOLOv8. The resulting architectures are comparatively evaluated on three challenging public X-ray inspection datasets.
- Score: 9.33554429903529
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the field of X-ray security applications, even the smallest details can significantly impact outcomes. Objects that are heavily occluded or intentionally concealed pose a great challenge for detection, whether by human observation or through advanced technological applications. While certain Deep Learning (DL) architectures demonstrate strong performance in processing local information, such as Convolutional Neural Networks (CNNs), others excel in handling distant information, e.g., transformers. In X-ray security imaging the literature has been dominated by the use of CNN-based methods, while the integration of the two aforementioned leading architectures has not been sufficiently explored. In this paper, various hybrid CNN-transformer architectures are evaluated against a common CNN object detection baseline, namely YOLOv8. In particular, a CNN (HGNetV2) and a hybrid CNN-transformer (Next-ViT-S) backbone are combined with different CNN/transformer detection heads (YOLOv8 and RT-DETR). The resulting architectures are comparatively evaluated on three challenging public X-ray inspection datasets, namely EDS, HiXray, and PIDray. Interestingly, while the YOLOv8 detector with its default backbone (CSP-DarkNet53) is generally shown to be advantageous on the HiXray and PIDray datasets, when a domain distribution shift is incorporated in the X-ray images (as happens in the EDS datasets), hybrid CNN-transformer architectures exhibit increased robustness. Detailed comparative evaluation results, including object-level detection performance and object-size error analysis, demonstrate the strengths and weaknesses of each architectural combination and suggest guidelines for future research. The source code and network weights of the models employed in this study are available at https://github.com/jgenc/xray-comparative-evaluation.
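The architectural combinations named in the abstract can be laid out as a small evaluation grid. The sketch below is illustrative only: the backbone, head, and dataset names come from the abstract, but the assumption that every listed backbone is paired with every listed head, and the helper names `build_configurations` and `build_runs`, are hypothetical and not taken from the authors' code.

```python
from itertools import product

# Names as given in the abstract. Pairing every backbone with every head
# is an assumption made for illustration, not a statement about the paper.
BACKBONES = ["HGNetV2", "Next-ViT-S"]
HEADS = ["YOLOv8", "RT-DETR"]
DATASETS = ["EDS", "HiXray", "PIDray"]

# Baseline from the abstract: YOLOv8 with its default backbone.
BASELINE = ("CSP-DarkNet53", "YOLOv8")


def build_configurations():
    """Return (backbone, head) pairs: the baseline plus the assumed cross product."""
    configs = [BASELINE]
    configs.extend(product(BACKBONES, HEADS))
    return configs


def build_runs():
    """Return (backbone, head, dataset) triples, one per assumed evaluation run."""
    return [(b, h, d) for (b, h), d in product(build_configurations(), DATASETS)]


if __name__ == "__main__":
    for backbone, head, dataset in build_runs():
        print(f"{backbone:>14} + {head:<8} on {dataset}")
```

Under these assumptions the grid has five detector configurations evaluated on three datasets. The actual set of evaluated combinations should be checked against the paper and its repository.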
Related papers
- Real-Time Anomaly Detection in Video Streams [0.0]
This thesis is part of a CIFRE agreement between the company Othello and the LIASD laboratory. The objective is to develop an artificial intelligence system that can detect real-time dangers in a video stream.
arXiv Detail & Related papers (2024-11-29T14:24:33Z)
- Illicit object detection in X-ray images using Vision Transformers [6.728794938150435]
Illicit object detection is a critical task performed at various high-security locations.
This study utilizes both Transformer and hybrid backbones, such as SWIN and NextViT, and detectors, such as DINO and RT-DETR.
arXiv Detail & Related papers (2024-03-27T22:36:02Z)
- Point-aware Interaction and CNN-induced Refinement Network for RGB-D Salient Object Detection [95.84616822805664]
We introduce a CNN-assisted Transformer architecture and propose a novel RGB-D SOD network with Point-aware Interaction and CNN-induced Refinement. To alleviate the block effect and detail destruction problems that the Transformer naturally introduces, we design a CNN-induced refinement (CNNR) unit for content refinement and supplementation.
arXiv Detail & Related papers (2023-08-17T11:57:49Z)
- DLSIA: Deep Learning for Scientific Image Analysis [45.81637398863868]
DLSIA is a Python-based machine learning library that empowers scientists and researchers across diverse scientific domains with a range of customizable convolutional neural network (CNN) architectures.
DLSIA features easy-to-use architectures such as autoencoders, tunable U-Nets, and parameter-lean mixed-scale dense networks (MSDNets).
arXiv Detail & Related papers (2023-08-02T21:32:41Z)
- A heterogeneous group CNN for image super-resolution [127.2132400582117]
Convolutional neural networks (CNNs) have obtained remarkable performance via deep architectures.
We present a heterogeneous group SR CNN (HGSRCNN) via leveraging structure information of different types to obtain a high-quality image.
arXiv Detail & Related papers (2022-09-26T04:14:59Z)
- Self-Supervised Masked Convolutional Transformer Block for Anomaly Detection [122.4894940892536]
We present a novel self-supervised masked convolutional transformer block (SSMCTB) that comprises the reconstruction-based functionality at a core architectural level.
In this work, we extend our previous self-supervised predictive convolutional attentive block (SSPCAB) with a 3D masked convolutional layer, a transformer for channel-wise attention, as well as a novel self-supervised objective based on Huber loss.
arXiv Detail & Related papers (2022-09-25T04:56:10Z)
- Vision Transformer with Convolutions Architecture Search [72.70461709267497]
We propose an architecture search method, Vision Transformer with Convolutions Architecture Search (VTCAS).
The high-performance backbone network searched by VTCAS introduces the desirable features of convolutional neural networks into the Transformer architecture.
It enhances the robustness of the neural network for object recognition, especially in the low illumination indoor scene.
arXiv Detail & Related papers (2022-03-20T02:59:51Z)
- A Novel Hand Gesture Detection and Recognition system based on ensemble-based Convolutional Neural Network [3.5665681694253903]
Detection of the hand region has become a challenging task in the computer vision and pattern recognition communities.
Deep learning algorithms such as convolutional neural network (CNN) architectures have become a very popular choice for classification tasks.
In this paper, an ensemble of CNN-based approaches is presented to overcome problems such as high variance during prediction, overfitting, and prediction errors.
arXiv Detail & Related papers (2022-02-25T06:46:58Z)
- Classification of diffraction patterns using a convolutional neural network in single particle imaging experiments performed at X-ray free-electron lasers [53.65540150901678]
Single particle imaging (SPI) at X-ray free electron lasers (XFELs) is particularly well suited to determine the 3D structure of particles in their native environment.
For a successful reconstruction, diffraction patterns originating from a single hit must be isolated from a large number of acquired patterns.
We propose to formulate this task as an image classification problem and solve it using convolutional neural network (CNN) architectures.
arXiv Detail & Related papers (2021-12-16T17:03:14Z)
- Neural Architecture Dilation for Adversarial Robustness [56.18555072877193]
A shortcoming of convolutional neural networks is that they are vulnerable to adversarial attacks.
This paper aims to improve the adversarial robustness of the backbone CNNs that have a satisfactory accuracy.
Under a minimal computational overhead, a dilation architecture is expected to be friendly with the standard performance of the backbone CNN.
arXiv Detail & Related papers (2021-08-16T03:58:00Z)
- Receptive Field Regularization Techniques for Audio Classification and Tagging with Deep Convolutional Neural Networks [7.9495796547433395]
We show that tuning the Receptive Field (RF) of CNNs is crucial to their generalization.
We propose several systematic approaches to control the RF of CNNs and systematically test the resulting architectures.
arXiv Detail & Related papers (2021-05-26T08:36:29Z)
- Inferring Convolutional Neural Networks' accuracies from their architectural characterizations [0.0]
We study the relationships between a CNN's architecture and its performance.
We show that the attributes can be predictive of the networks' performance in two specific computer vision-based physics problems.
We use machine learning models to predict whether a network can perform better than a certain threshold accuracy before training.
arXiv Detail & Related papers (2020-01-07T16:41:58Z)