MMCL: Boosting Deformable DETR-Based Detectors with Multi-Class Min-Margin Contrastive Learning for Superior Prohibited Item Detection
- URL: http://arxiv.org/abs/2406.03176v1
- Date: Wed, 5 Jun 2024 12:07:58 GMT
- Title: MMCL: Boosting Deformable DETR-Based Detectors with Multi-Class Min-Margin Contrastive Learning for Superior Prohibited Item Detection
- Authors: Mingyuan Li, Tong Jia, Hui Lu, Bowen Ma, Hao Wang, Dongyue Chen,
- Abstract summary: Prohibited Item detection in X-ray images is one of the most effective security inspection methods.
overlapping unique phenomena in X-ray images lead to the coupling of foreground and background features.
We propose a Multi-Class Min-Margin Contrastive Learning (MMCL) method to clarify the category semantic information of content queries.
- Score: 8.23801404004195
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prohibited Item detection in X-ray images is one of the most effective security inspection methods.However, differing from natural light images, the unique overlapping phenomena in X-ray images lead to the coupling of foreground and background features, thereby lowering the accuracy of general object detectors.Therefore, we propose a Multi-Class Min-Margin Contrastive Learning (MMCL) method that, by clarifying the category semantic information of content queries under the deformable DETR architecture, aids the model in extracting specific category foreground information from coupled features.Specifically, after grouping content queries by the number of categories, we employ the Multi-Class Inter-Class Exclusion (MIE) loss to push apart content queries from different groups. Concurrently, the Intra-Class Min-Margin Clustering (IMC) loss is utilized to attract content queries within the same group, while ensuring the preservation of necessary disparity. As training, the inherent Hungarian matching of the model progressively strengthens the alignment between each group of queries and the semantic features of their corresponding category of objects. This evolving coherence ensures a deep-seated grasp of category characteristics, consequently bolstering the anti-overlapping detection capabilities of models.MMCL is versatile and can be easily plugged into any deformable DETR-based model with dozens of lines of code. Extensive experiments on the PIXray and OPIXray datasets demonstrate that MMCL significantly enhances the performance of various state-of-the-art models without increasing complexity. The code has been released at https://github.com/anonymity0403/MMCL.
Related papers
- CSPCL: Category Semantic Prior Contrastive Learning for Deformable DETR-Based Prohibited Item Detectors [8.23801404004195]
Prohibited item detection based on X-ray images is one of the most effective security inspection methods.
foreground-background feature coupling makes general detectors designed for natural images perform poorly.
We propose a Category Semantic Prior Contrastive Learning mechanism to align the class prototypes perceived by the classifier with the content queries.
arXiv Detail & Related papers (2025-01-28T03:04:22Z) - Category-Adaptive Cross-Modal Semantic Refinement and Transfer for Open-Vocabulary Multi-Label Recognition [59.203152078315235]
We propose a novel category-adaptive cross-modal semantic refinement and transfer (C$2$SRT) framework to explore the semantic correlation.
The proposed framework consists of two complementary modules, i.e., intra-category semantic refinement (ISR) module and inter-category semantic transfer (IST) module.
Experiments on OV-MLR benchmarks clearly demonstrate that the proposed C$2$SRT framework outperforms current state-of-the-art algorithms.
arXiv Detail & Related papers (2024-12-09T04:00:18Z) - Multi-scale Feature Enhancement in Multi-task Learning for Medical Image Analysis [1.6916040234975798]
Traditional deep learning methods in medical imaging often focus solely on segmentation or classification.
We propose a simple yet effective UNet-based MTL model, where features extracted by the encoder are used to predict classification labels, while the decoder produces the segmentation mask.
Experimental results across multiple medical datasets confirm the superior performance of our model in both segmentation and classification tasks.
arXiv Detail & Related papers (2024-11-30T04:20:05Z) - Collaborative Feature-Logits Contrastive Learning for Open-Set Semi-Supervised Object Detection [75.02249869573994]
In open-set scenarios, the unlabeled dataset contains both in-distribution (ID) classes and out-of-distribution (OOD) classes.
Applying semi-supervised detectors in such settings can lead to misclassifying OOD class as ID classes.
We propose a simple yet effective method, termed Collaborative Feature-Logits Detector (CFL-Detector)
arXiv Detail & Related papers (2024-11-20T02:57:35Z) - An Information Compensation Framework for Zero-Shot Skeleton-based Action Recognition [49.45660055499103]
Zero-shot human skeleton-based action recognition aims to construct a model that can recognize actions outside the categories seen during training.
Previous research has focused on aligning sequences' visual and semantic spatial distributions.
We introduce a new loss function sampling method to obtain a tight and robust representation.
arXiv Detail & Related papers (2024-06-02T06:53:01Z) - Toward Multi-class Anomaly Detection: Exploring Class-aware Unified Model against Inter-class Interference [67.36605226797887]
We introduce a Multi-class Implicit Neural representation Transformer for unified Anomaly Detection (MINT-AD)
By learning the multi-class distributions, the model generates class-aware query embeddings for the transformer decoder.
MINT-AD can project category and position information into a feature embedding space, further supervised by classification and prior probability loss functions.
arXiv Detail & Related papers (2024-03-21T08:08:31Z) - RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition [78.97487780589574]
Multimodal Large Language Models (MLLMs) excel at classifying fine-grained categories.
This paper introduces a Retrieving And Ranking augmented method for MLLMs.
Our proposed approach not only addresses the inherent limitations in fine-grained recognition but also preserves the model's comprehensive knowledge base.
arXiv Detail & Related papers (2024-03-20T17:59:55Z) - Semantic-Aware Dual Contrastive Learning for Multi-label Image
Classification [8.387933969327852]
We propose a novel semantic-aware dual contrastive learning framework that incorporates sample-to-sample contrastive learning.
Specifically, we leverage semantic-aware representation learning to extract category-related local discriminative features.
Our proposed method is effective and outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2023-07-19T01:57:31Z) - Multiplex-detection Based Multiple Instance Learning Network for Whole
Slide Image Classification [2.61155594652503]
Multiple instance learning (MIL) is a powerful approach to classify whole slide images (WSIs) for diagnostic pathology.
We propose a novel multiplex-detection-based multiple instance learning (MDMIL) to tackle the issues above.
Specifically, MDMIL is constructed by the internal query generation module (IQGM) and the multiplex detection module (MDM)
arXiv Detail & Related papers (2022-08-06T14:36:48Z) - Semantic Representation and Dependency Learning for Multi-Label Image
Recognition [76.52120002993728]
We propose a novel and effective semantic representation and dependency learning (SRDL) framework to learn category-specific semantic representation for each category.
Specifically, we design a category-specific attentional regions (CAR) module to generate channel/spatial-wise attention matrices to guide model.
We also design an object erasing (OE) module to implicitly learn semantic dependency among categories by erasing semantic-aware regions.
arXiv Detail & Related papers (2022-04-08T00:55:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.