MSConv: Multiplicative and Subtractive Convolution for Face Recognition
- URL: http://arxiv.org/abs/2503.06187v1
- Date: Sat, 08 Mar 2025 12:18:29 GMT
- Title: MSConv: Multiplicative and Subtractive Convolution for Face Recognition
- Authors: Si Zhou, Yain-Whar Si, Xiaochen Yuan, Xiaofan Li, Xiaoxiang Liu, Xinyuan Zhang, Cong Lin, Xueyuan Gong,
- Abstract summary: We propose an efficient convolution module called MSConv (Multiplicative and Subtractive Convolution). Specifically, we employ multi-scale mixed convolution to capture both local and broader contextual information from face images. Experimental results demonstrate that by integrating both salient and differential features, MSConv outperforms models that only focus on salient features.
- Score: 7.230136103375249
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In neural networks, there are various methods of feature fusion. Different strategies can significantly affect the effectiveness of feature representation, consequently influencing the model's ability to extract representative and discriminative features. In the field of face recognition, traditional feature fusion methods include feature concatenation and feature addition. Recently, various attention mechanism-based fusion strategies have emerged. However, we found that these methods primarily focus on the important features in the image, referred to as salient features in this paper, while neglecting another equally important set of features for image recognition tasks, which we term differential features. This may cause the model to overlook critical local differences when dealing with complex facial samples. Therefore, in this paper, we propose an efficient convolution module called MSConv (Multiplicative and Subtractive Convolution), designed to balance the model's learning of salient and differential features. Specifically, we employ multi-scale mixed convolution to capture both local and broader contextual information from face images, and then utilize a Multiplication Operation (MO) and a Subtraction Operation (SO) to extract salient and differential features, respectively. Experimental results demonstrate that by integrating both salient and differential features, MSConv outperforms models that only focus on salient features.
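The fusion scheme the abstract describes can be sketched as follows. This is a minimal illustrative sketch, not the paper's architecture: the fixed averaging kernels stand in for the learned multi-scale mixed convolutions, and the function names are invented. Only the core idea is preserved: two branches with different receptive fields, combined element-wise by multiplication (salient features) and subtraction (differential features).

```python
import numpy as np

def conv2d_same(x, k):
    """Naive single-channel 'same' 2D cross-correlation with zero padding."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def msconv_sketch(x):
    # Two branches with different receptive fields (multi-scale mix):
    # a 3x3 "local" kernel and a 5x5 "broader context" kernel.
    # In the paper these would be learned convolutions.
    local = conv2d_same(x, np.full((3, 3), 1 / 9))
    context = conv2d_same(x, np.full((5, 5), 1 / 25))
    salient = local * context       # Multiplication Operation (MO)
    differential = local - context  # Subtraction Operation (SO)
    # A learned fusion (e.g. a 1x1 convolution over the concatenated
    # maps) would combine these; here we simply return both.
    return salient, differential

x = np.arange(16, dtype=float).reshape(4, 4)
salient, differential = msconv_sketch(x)
```

The multiplication amplifies locations where both scales respond strongly, while the subtraction highlights where the two scales disagree, which is the "local difference" signal the abstract argues attention-based fusion tends to miss.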
Related papers
- Prototype-Driven Multi-Feature Generation for Visible-Infrared Person Re-identification [11.664820595258988]
Primary challenges in visible-infrared person re-identification arise from the differences between visible (vis) and infrared (ir) images.
Existing methods often rely on horizontal partitioning to align part-level features, which can introduce inaccuracies.
We propose a novel Prototype-Driven Multi-feature generation framework (PDM) aimed at mitigating cross-modal discrepancies.
arXiv Detail & Related papers (2024-09-09T14:12:23Z) - Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment [20.902935570581207]
We introduce a Multimodal Alignment and Reconstruction Network (MARNet) to enhance the model's resistance to visual noise.
MARNet includes a cross-modal diffusion reconstruction module for smoothly and stably blending information across different domains.
Experiments conducted on two benchmark datasets, Vireo-Food172 and Ingredient-101, demonstrate that MARNet effectively improves the quality of image information extracted by the model.
arXiv Detail & Related papers (2024-07-26T16:30:18Z) - Retain, Blend, and Exchange: A Quality-aware Spatial-Stereo Fusion Approach for Event Stream Recognition [57.74076383449153]
We propose a novel dual-stream framework for event stream-based pattern recognition via differentiated fusion, termed EFV++.
It models two common event representations simultaneously, i.e., event images and event voxels.
We achieve new state-of-the-art performance on the Bullying10k dataset, i.e., 90.51%, which exceeds the second place by +2.21%.
arXiv Detail & Related papers (2024-06-27T02:32:46Z) - High-Discriminative Attribute Feature Learning for Generalized Zero-Shot Learning [54.86882315023791]
We propose an innovative approach called High-Discriminative Attribute Feature Learning for Generalized Zero-Shot Learning (HDAFL)
HDAFL utilizes multiple convolutional kernels to automatically learn discriminative regions highly correlated with attributes in images.
We also introduce a Transformer-based attribute discrimination encoder to enhance the discriminative capability among attributes.
arXiv Detail & Related papers (2024-04-07T13:17:47Z) - Learning Diversified Feature Representations for Facial Expression Recognition in the Wild [97.14064057840089]
We propose a mechanism to diversify the features extracted by CNN layers of state-of-the-art facial expression recognition architectures.
Experimental results on three well-known facial expression recognition in-the-wild datasets, AffectNet, FER+, and RAF-DB, show the effectiveness of our method.
arXiv Detail & Related papers (2022-10-17T19:25:28Z) - Deep Collaborative Multi-Modal Learning for Unsupervised Kinship Estimation [53.62256887837659]
Kinship verification is a long-standing research challenge in computer vision.
We propose a novel deep collaborative multi-modal learning (DCML) to integrate the underlying information presented in facial properties.
Our DCML method is always superior to some state-of-the-art kinship verification methods.
arXiv Detail & Related papers (2021-09-07T01:34:51Z) - Exploring Modality-shared Appearance Features and Modality-invariant Relation Features for Cross-modality Person Re-Identification [72.95858515157603]
Cross-modality person re-identification works rely on discriminative modality-shared features.
Despite some initial success, such modality-shared appearance features cannot capture enough modality-invariant information.
A novel cross-modality quadruplet loss is proposed to further reduce the cross-modality variations.
arXiv Detail & Related papers (2021-04-23T11:14:07Z) - Feature Decomposition and Reconstruction Learning for Effective Facial Expression Recognition [80.17419621762866]
We propose a novel Feature Decomposition and Reconstruction Learning (FDRL) method for effective facial expression recognition.
FDRL consists of two crucial networks: a Feature Decomposition Network (FDN) and a Feature Reconstruction Network (FRN).
arXiv Detail & Related papers (2021-04-12T02:22:45Z) - HAMIL: Hierarchical Aggregation-Based Multi-Instance Learning for Microscopy Image Classification [4.566276053984716]
Multi-instance learning is common for computer vision tasks, especially in biomedical image processing.
In this study, we propose a hierarchical aggregation network for multi-instance learning, called HAMIL.
The hierarchical aggregation protocol enables feature fusion in a defined order, and the simple convolutional aggregation units lead to an efficient and flexible architecture.
arXiv Detail & Related papers (2021-03-17T16:34:08Z) - Image super-resolution reconstruction based on attention mechanism and feature fusion [3.42658286826597]
A network structure based on attention mechanism and multi-scale feature fusion is proposed.
Experimental results show that the proposed method can achieve better performance over other representative super-resolution reconstruction algorithms.
arXiv Detail & Related papers (2020-04-08T11:20:10Z)