Operationalizing Convolutional Neural Network Architectures for
Prohibited Object Detection in X-Ray Imagery
- URL: http://arxiv.org/abs/2110.04906v1
- Date: Sun, 10 Oct 2021 21:20:04 GMT
- Title: Operationalizing Convolutional Neural Network Architectures for
Prohibited Object Detection in X-Ray Imagery
- Authors: Thomas W. Webb, Neelanjan Bhowmik, Yona Falinie A. Gaus, Toby P.
Breckon
- Abstract summary: We explore the viability of two recent end-to-end object detection CNN architectures, Cascade R-CNN and FreeAnchor, for prohibited item detection.
With fewer parameters and less training time, FreeAnchor achieves the highest detection inference speed of ~13 fps (3.9 ms per image).
The CNN models display substantial resilience to the lossy compression, resulting in only a 1.1% decrease in mAP at the JPEG compression level of 50.
- Score: 15.694880385913534
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in deep Convolutional Neural Networks (CNN) have brought
insight into the automation of X-ray security screening for aviation security
and beyond. Here, we explore the viability of two recent end-to-end object
detection CNN architectures, Cascade R-CNN and FreeAnchor, for prohibited item
detection by balancing processing time and the impact of image data compression
from an operational viewpoint. Overall, we achieve maximal detection
performance using a FreeAnchor architecture with a ResNet50 backbone, obtaining
mean Average Precision (mAP) of 87.7 and 85.8 on the OPIXray and SIXray
benchmark datasets, showing superior performance over prior work on both. With
fewer parameters and less training time, FreeAnchor achieves the highest
detection inference speed of ~13 fps (3.9 ms per image). Furthermore, we
evaluate the impact of lossy image compression upon detector performance. The
CNN models display substantial resilience to the lossy compression, resulting
in only a 1.1% decrease in mAP at the JPEG compression level of 50.
Additionally, a thorough evaluation of data augmentation techniques is
provided, including adaptations of the MixUp and CutMix strategies as well as other
standard transformations, further improving the detection accuracy.
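
As a rough illustration of the compression experiment described in the abstract, the sketch below (not taken from the paper) round-trips an X-ray scan through lossy JPEG at quality 50, the same level at which the authors report only a 1.1% mAP drop, and measures the pixel-level error this introduces before the image reaches a detector. It assumes Pillow and NumPy; the file path and the detector-scoring note are placeholders, and the paper's actual detectors, datasets, and mAP evaluation code are not reproduced here.

```python
# Minimal sketch, assuming Pillow and NumPy are available.
# 'baggage.png' is a stand-in path for an X-ray scan from OPIXray/SIXray.
import io

import numpy as np
from PIL import Image


def jpeg_recompress(image: Image.Image, quality: int = 50) -> Image.Image:
    """Round-trip an image through lossy JPEG at the given quality factor."""
    buffer = io.BytesIO()
    image.convert("RGB").save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    return Image.open(buffer).copy()


def pixel_error(original: Image.Image, compressed: Image.Image) -> float:
    """Mean absolute pixel difference introduced by the compression."""
    a = np.asarray(original.convert("RGB"), dtype=np.float32)
    b = np.asarray(compressed.convert("RGB"), dtype=np.float32)
    return float(np.mean(np.abs(a - b)))


if __name__ == "__main__":
    original = Image.open("baggage.png")  # hypothetical input scan
    degraded = jpeg_recompress(original, quality=50)
    print(f"Mean |pixel delta| at JPEG quality 50: "
          f"{pixel_error(original, degraded):.2f}")
    # In the study itself, detections on the degraded image would be scored
    # against ground-truth boxes (e.g. COCO-style mAP) and compared with the
    # detections obtained on the uncompressed original.
```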
Related papers
- Semantic Ensemble Loss and Latent Refinement for High-Fidelity Neural Image
  Compression [58.618625678054826] (arXiv, 2024-01-25)
  This study presents an enhanced neural compression method designed for optimal visual fidelity.
  We have trained our model with a sophisticated semantic ensemble loss, integrating Charbonnier loss, perceptual loss, style loss, and a non-binary adversarial loss.
  Our empirical findings demonstrate that this approach significantly improves the statistical fidelity of neural image compression.
- DGNet: Dynamic Gradient-Guided Network for Water-Related Optics Image
  Enhancement [77.0360085530701] (arXiv, 2023-12-12)
  Underwater image enhancement (UIE) is a challenging task due to the complex degradation caused by underwater environments.
  Previous methods often idealize the degradation process and neglect the impact of medium noise and object motion on the distribution of image features.
  Our approach utilizes predicted images to dynamically update pseudo-labels, adding a dynamic gradient to optimize the network's gradient space.
- Learning Heavily-Degraded Prior for Underwater Object Detection
  [59.5084433933765] (arXiv, 2023-08-24)
  This paper seeks transferable prior knowledge from detector-friendly images.
  It is based on the statistical observation that the heavily degraded regions of detector-friendly (DFUI) and underwater images have evident feature distribution gaps.
  Our method, with higher speed and fewer parameters, still performs better than transformer-based detectors.
- Attention-based Feature Compression for CNN Inference Offloading in Edge
  Computing [93.67044879636093] (arXiv, 2022-11-24)
  This paper studies the computational offloading of CNN inference in device-edge co-inference systems.
  We propose a novel autoencoder-based CNN architecture (AECNN) for effective feature extraction at the end-device.
  Experiments show that AECNN can compress the intermediate data by more than 256x with only about 4% accuracy loss.
- From Environmental Sound Representation to Robustness of 2D CNN Models
  Against Adversarial Attacks [82.21746840893658] (arXiv, 2022-04-14)
  This paper investigates the impact of different standard environmental sound representations (spectrograms) on the recognition performance and adversarial attack robustness of a victim residual convolutional neural network.
  We show that while the ResNet-18 model trained on DWT spectrograms achieves a high recognition accuracy, attacking this model is relatively more costly for the adversary.
- EResFD: Rediscovery of the Effectiveness of Standard Convolution for
  Lightweight Face Detection [13.357235715178584] (arXiv, 2022-04-04)
  We re-examine the effectiveness of the standard convolutional block as a lightweight backbone architecture for face detection.
  We show that heavily channel-pruned standard convolution layers can achieve better accuracy and inference speed.
  Our proposed detector, EResFD, obtains 80.4% mAP on the WIDER FACE Hard subset while taking only 37.7 ms for VGA image inference on CPU.
- NeighCNN: A CNN based SAR Speckle Reduction using Feature preserving Loss
  Function [1.7188280334580193] (arXiv, 2021-08-26)
  NeighCNN is a deep learning-based speckle reduction algorithm that handles multiplicative noise.
  Various synthetic as well as real SAR images are used for testing the NeighCNN architecture.
- Efficient CNN-LSTM based Image Captioning using Neural Network Compression
  [0.0] (arXiv, 2020-12-17)
  We present an unconventional end-to-end compression pipeline for a CNN-LSTM based image captioning model.
  We then examine the effects of different compression architectures on the model and design a compression architecture that achieves a 73.1% reduction in model size.
- Boosting High-Level Vision with Joint Compression Artifacts Reduction and
  Super-Resolution [10.960291115491504] (arXiv, 2020-10-18)
  We generate an artifact-free high-resolution image from a low-resolution one compressed with an arbitrary quality factor.
  A context-aware joint CAR and SR neural network (CAJNN) integrates both local and non-local features to solve CAR and SR in one stage.
  A deep reconstruction network is adopted to predict high-quality and high-resolution images.
- On the Impact of Lossy Image and Video Compression on the Performance of
  Deep Convolutional Neural Network Architectures [17.349420462716886] (arXiv, 2020-07-28)
  This study investigates the impact of commonplace image and video compression techniques on the performance of deep learning architectures.
  We examine the impact on performance across five discrete tasks: human pose estimation, semantic segmentation, object detection, action recognition, and monocular depth estimation.
  Results show a non-linear and non-uniform relationship between network performance and the level of lossy compression applied.
- Perceptually Optimizing Deep Image Compression [53.705543593594285] (arXiv, 2020-07-03)
  Mean squared error (MSE) and $\ell_p$ norms have largely dominated the measurement of loss in neural networks.
  We propose a different proxy approach to optimize image analysis networks against quantitative perceptual models.
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.