Related papers: MMCL: Boosting Deformable DETR-Based Detectors with Multi-Class Min-Margin Contrastive Learning for Superior Prohibited Item Detection

URL: http://arxiv.org/abs/2406.03176v1
Date: Wed, 5 Jun 2024 12:07:58 GMT
Title: MMCL: Boosting Deformable DETR-Based Detectors with Multi-Class Min-Margin Contrastive Learning for Superior Prohibited Item Detection
Authors: Mingyuan Li, Tong Jia, Hui Lu, Bowen Ma, Hao Wang, Dongyue Chen
Abstract summary: Prohibited item detection in X-ray images is one of the most effective security inspection methods. Overlapping phenomena unique to X-ray images lead to the coupling of foreground and background features. We propose a Multi-Class Min-Margin Contrastive Learning (MMCL) method to clarify the category semantic information of content queries.
Score: 8.23801404004195
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Prohibited item detection in X-ray images is one of the most effective security inspection methods. However, unlike natural-light images, X-ray images exhibit unique overlapping phenomena that couple foreground and background features, thereby lowering the accuracy of general object detectors. Therefore, we propose a Multi-Class Min-Margin Contrastive Learning (MMCL) method that, by clarifying the category semantic information of content queries under the deformable DETR architecture, helps the model extract category-specific foreground information from coupled features. Specifically, after grouping content queries by the number of categories, we employ the Multi-Class Inter-Class Exclusion (MIE) loss to push apart content queries from different groups. Concurrently, the Intra-Class Min-Margin Clustering (IMC) loss attracts content queries within the same group while preserving the necessary disparity between them. As training progresses, the inherent Hungarian matching of the model progressively strengthens the alignment between each group of queries and the semantic features of its corresponding object category. This evolving coherence gives the model a firm grasp of category characteristics, consequently bolstering its anti-overlapping detection capability. MMCL is versatile and can be plugged into any deformable DETR-based model with dozens of lines of code. Extensive experiments on the PIXray and OPIXray datasets demonstrate that MMCL significantly enhances the performance of various state-of-the-art models without increasing complexity. The code has been released at https://github.com/anonymity0403/MMCL.
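Illustrative sketch: the abstract describes two loss terms, one that pushes apart content queries from different category groups and one that pulls together queries within a group while keeping some disparity. The plain-Python example below is only a rough illustration of that idea, not the authors' implementation. The cosine-similarity hinge forms, the contiguous equal-size grouping, the `margin` value, and the names `cosine`, `group_of`, and `contrastive_terms` are all assumptions; the abstract does not give the actual formulas.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def group_of(index, num_queries, num_classes):
    """Assumed grouping: split the queries into contiguous, equal-size groups."""
    return index * num_classes // num_queries


def mean(values):
    """Average of a list, or 0.0 when the list is empty."""
    return sum(values) / len(values) if values else 0.0


def contrastive_terms(queries, num_classes, margin=0.2):
    """Return (inter_group_loss, intra_group_loss) for a list of query vectors.

    Assumed forms (hypothetical, for illustration only):
      inter-group: penalize any positive similarity across groups.
      intra-group: attract pairs only until similarity reaches 1 - margin,
                   so queries in one group are not forced to be identical.
    """
    n = len(queries)
    inter, intra = [], []
    for i in range(n):
        for j in range(i + 1, n):
            sim = cosine(queries[i], queries[j])
            if group_of(i, n, num_classes) == group_of(j, n, num_classes):
                intra.append(max(0.0, (1.0 - margin) - sim))
            else:
                inter.append(max(0.0, sim))
    return mean(inter), mean(intra)


# Four 2-D queries, two classes: queries 0-1 form group 0, queries 2-3 group 1.
well_separated = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
mixed_up = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
print(contrastive_terms(well_separated, num_classes=2))
print(contrastive_terms(mixed_up, num_classes=2))
```

In this toy setup both terms vanish when each group occupies its own direction, and both are positive when the groups are interleaved. In a real detector the queries would be the decoder's learnable content embeddings, and these terms would be added to the detection loss with weights that this sketch does not attempt to specify.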

Related papers

StackCLIP: Clustering-Driven Stacked Prompt in Zero-Shot Industrial Anomaly Detection [5.390045840354081]
We propose a method that transforms category names through multicategory name stacking to create stacked prompts. The Clustering-Driven Stacked Prompts (CSP) module constructs generic prompts by stacking semantically analogous categories. The Ensemble Feature Alignment (EFA) module trains knowledge-specific linear layers tailored for each stack cluster.
arXiv Detail & Related papers (2025-06-30T07:29:10Z)
Benchmarking Unified Face Attack Detection via Hierarchical Prompt Tuning [58.16354555208417]
PAD and FFD are proposed to protect face data from physical media-based Presentation Attacks and digital editing-based DeepFakes, respectively. The lack of a Unified Face Attack Detection model to simultaneously handle attacks in these two categories is mainly attributed to two factors. We present a novel Visual-Language Model-based Hierarchical Prompt Tuning Framework that adaptively explores multiple classification criteria from different semantic spaces.
arXiv Detail & Related papers (2025-05-19T16:35:45Z)
Partially Supervised Unpaired Multi-Modal Learning for Label-Efficient Medical Image Segmentation [53.723234136550055]
We term the new learning paradigm Partially Supervised Unpaired Multi-Modal Learning (PSUMML). We propose a novel Decomposed partial class adaptation with snapshot Ensembled Self-Training (DEST) framework for it. Our framework consists of a compact segmentation network with modality-specific normalization layers for learning with partially labeled unpaired multi-modal data.
arXiv Detail & Related papers (2025-03-07T07:22:42Z)
CSPCL: Category Semantic Prior Contrastive Learning for Deformable DETR-Based Prohibited Item Detectors [8.23801404004195]
Prohibited item detection based on X-ray images is one of the most effective security inspection methods. Foreground-background feature coupling makes general detectors designed for natural images perform poorly. We propose a Category Semantic Prior Contrastive Learning mechanism to align the class prototypes perceived by the classifier with the content queries.
arXiv Detail & Related papers (2025-01-28T03:04:22Z)
Category-Adaptive Cross-Modal Semantic Refinement and Transfer for Open-Vocabulary Multi-Label Recognition [59.203152078315235]
We propose a novel category-adaptive cross-modal semantic refinement and transfer (C$^2$SRT) framework to explore the semantic correlation. The proposed framework consists of two complementary modules, i.e., intra-category semantic refinement (ISR) module and inter-category semantic transfer (IST) module. Experiments on OV-MLR benchmarks clearly demonstrate that the proposed C$^2$SRT framework outperforms current state-of-the-art algorithms.
arXiv Detail & Related papers (2024-12-09T04:00:18Z)
Multi-scale Feature Enhancement in Multi-task Learning for Medical Image Analysis [1.6916040234975798]
Traditional deep learning methods in medical imaging often focus solely on segmentation or classification. We propose a simple yet effective UNet-based MTL model, where features extracted by the encoder are used to predict classification labels, while the decoder produces the segmentation mask. Experimental results across multiple medical datasets confirm the superior performance of our model in both segmentation and classification tasks.
arXiv Detail & Related papers (2024-11-30T04:20:05Z)
Collaborative Feature-Logits Contrastive Learning for Open-Set Semi-Supervised Object Detection [75.02249869573994]
In open-set scenarios, the unlabeled dataset contains both in-distribution (ID) classes and out-of-distribution (OOD) classes. Applying semi-supervised detectors in such settings can lead to misclassifying OOD classes as ID classes. We propose a simple yet effective method, termed Collaborative Feature-Logits Detector (CFL-Detector).
arXiv Detail & Related papers (2024-11-20T02:57:35Z)
An Information Compensation Framework for Zero-Shot Skeleton-based Action Recognition [49.45660055499103]
Zero-shot human skeleton-based action recognition aims to construct a model that can recognize actions outside the categories seen during training. Previous research has focused on aligning sequences' visual and semantic spatial distributions. We introduce a new loss function sampling method to obtain a tight and robust representation.
arXiv Detail & Related papers (2024-06-02T06:53:01Z)
Toward Multi-class Anomaly Detection: Exploring Class-aware Unified Model against Inter-class Interference [67.36605226797887]
We introduce a Multi-class Implicit Neural representation Transformer for unified Anomaly Detection (MINT-AD). By learning the multi-class distributions, the model generates class-aware query embeddings for the transformer decoder. MINT-AD can project category and position information into a feature embedding space, further supervised by classification and prior probability loss functions.
arXiv Detail & Related papers (2024-03-21T08:08:31Z)
RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition [78.97487780589574]
Multimodal Large Language Models (MLLMs) excel at classifying fine-grained categories. This paper introduces a Retrieving And Ranking augmented method for MLLMs. Our proposed approach not only addresses the inherent limitations in fine-grained recognition but also preserves the model's comprehensive knowledge base.
arXiv Detail & Related papers (2024-03-20T17:59:55Z)
Semantic-Aware Dual Contrastive Learning for Multi-label Image Classification [8.387933969327852]
We propose a novel semantic-aware dual contrastive learning framework that incorporates sample-to-sample contrastive learning. Specifically, we leverage semantic-aware representation learning to extract category-related local discriminative features. Our proposed method is effective and outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2023-07-19T01:57:31Z)
MHCCL: Masked Hierarchical Cluster-Wise Contrastive Learning for Multivariate Time Series [20.008535430484475]
Masked Hierarchical Cluster-wise Contrastive Learning model is presented. It exploits semantic information obtained from the hierarchical structure consisting of multiple latent partitions for time series. It is shown to be superior to state-of-the-art approaches for unsupervised time series representation learning.
arXiv Detail & Related papers (2022-12-02T12:42:53Z)
Multiplex-detection Based Multiple Instance Learning Network for Whole Slide Image Classification [2.61155594652503]
Multiple instance learning (MIL) is a powerful approach to classify whole slide images (WSIs) for diagnostic pathology. We propose a novel multiplex-detection-based multiple instance learning (MDMIL) to tackle the issues above. Specifically, MDMIL is constructed by the internal query generation module (IQGM) and the multiplex detection module (MDM).
arXiv Detail & Related papers (2022-08-06T14:36:48Z)
Semantic Representation and Dependency Learning for Multi-Label Image Recognition [76.52120002993728]
We propose a novel and effective semantic representation and dependency learning (SRDL) framework to learn category-specific semantic representation for each category. Specifically, we design a category-specific attentional regions (CAR) module to generate channel/spatial-wise attention matrices to guide the model. We also design an object erasing (OE) module to implicitly learn semantic dependency among categories by erasing semantic-aware regions.
arXiv Detail & Related papers (2022-04-08T00:55:15Z)
Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models. Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings. We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)

This list is automatically generated from the titles and abstracts of the papers on this site.