Mamba-CNN: A Hybrid Architecture for Efficient and Accurate Facial Beauty Prediction
- URL: http://arxiv.org/abs/2509.01431v1
- Date: Mon, 01 Sep 2025 12:42:04 GMT
- Title: Mamba-CNN: A Hybrid Architecture for Efficient and Accurate Facial Beauty Prediction
- Authors: Djamel Eddine Boukhari,
- Abstract summary: We propose Mamba-CNN, a novel and efficient hybrid architecture. Mamba-CNN integrates a lightweight, Mamba-inspired State Space Model (SSM) gating mechanism into a hierarchical convolutional backbone. Our findings validate the synergistic potential of combining CNNs with selective SSMs and present a powerful new architectural paradigm for nuanced visual understanding tasks.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The computational assessment of facial attractiveness, a challenging subjective regression task, is dominated by architectures with a critical trade-off: Convolutional Neural Networks (CNNs) offer efficiency but have limited receptive fields, while Vision Transformers (ViTs) model global context at a quadratic computational cost. To address this, we propose Mamba-CNN, a novel and efficient hybrid architecture. Mamba-CNN integrates a lightweight, Mamba-inspired State Space Model (SSM) gating mechanism into a hierarchical convolutional backbone. This core innovation allows the network to dynamically modulate feature maps and selectively emphasize salient facial features and their long-range spatial relationships, mirroring human holistic perception while maintaining computational efficiency. We conducted extensive experiments on the widely-used SCUT-FBP5500 benchmark, where our model sets a new state-of-the-art. Mamba-CNN achieves a Pearson Correlation (PC) of 0.9187, a Mean Absolute Error (MAE) of 0.2022, and a Root Mean Square Error (RMSE) of 0.2610. Our findings validate the synergistic potential of combining CNNs with selective SSMs and present a powerful new architectural paradigm for nuanced visual understanding tasks.
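The three metrics reported in the abstract (PC, MAE, RMSE) are standard regression measures. As a minimal sketch, assuming predictions and ground-truth scores are plain Python lists of equal length, they can be computed as follows. The function names and the sample values are illustrative assumptions, not taken from the paper or its code.

```python
import math


def pearson_correlation(pred, target):
    """Pearson Correlation (PC): linear agreement between two score lists."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in target)
    return cov / math.sqrt(var_p * var_t)


def mean_absolute_error(pred, target):
    """Mean Absolute Error (MAE): average absolute deviation."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def root_mean_square_error(pred, target):
    """Root Mean Square Error (RMSE): square root of the mean squared deviation."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


# Made-up values for illustration only (not results from the paper).
predicted = [3.1, 2.4, 4.0, 1.9]
ground_truth = [3.0, 2.5, 4.2, 2.0]

print(pearson_correlation(predicted, ground_truth))
print(mean_absolute_error(predicted, ground_truth))
print(root_mean_square_error(predicted, ground_truth))
```

Higher PC is better, while lower MAE and RMSE are better, which is why a model can lead on one metric and trail on another.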
Related papers
- VM-BeautyNet: A Synergistic Ensemble of Vision Transformer and Mamba for Facial Beauty Prediction [0.0]
This paper introduces a novel, heterogeneous ensemble architecture, VM-BeautyNet, that fuses the complementary strengths of a Vision Transformer and a Mamba-based Vision model. Our proposed VM-BeautyNet achieves state-of-the-art performance, with a Pearson Correlation (PC) of 0.9212, a Mean Absolute Error (MAE) of 0.2085, and a Root Mean Square Error (RMSE) of 0.2698.
arXiv Detail & Related papers (2025-10-17T21:10:46Z)
- SynergyNet: Fusing Generative Priors and State-Space Models for Facial Beauty Prediction [0.0]
This paper introduces the Mamba-Diffusion Network (MD-Net), a novel dual-stream architecture for predicting facial beauty. MD-Net sets a new state-of-the-art, achieving a Pearson Correlation of 0.9235 and demonstrating the significant potential of hybrid architectures.
arXiv Detail & Related papers (2025-09-21T17:36:42Z)
- An Efficient and Mixed Heterogeneous Model for Image Restoration [71.85124734060665]
Current mainstream approaches are based on three architectural paradigms: CNNs, Transformers, and Mambas. We propose RestorMixer, an efficient and general-purpose IR model based on mixed-architecture fusion.
arXiv Detail & Related papers (2025-04-15T08:19:12Z)
- ALWNN Empowered Automatic Modulation Classification: Conquering Complexity and Scarce Sample Conditions [24.59462798452397]
This paper proposes an automatic modulation classification model based on the Adaptive Lightweight Wavelet Neural Network (ALWNN) and the few-shot framework (MALWNN). The ALWNN model, by integrating the adaptive wavelet neural network and depth separable convolution, reduces the number of model parameters and computational complexity. Experiments with MALWNN show its superior performance in few-shot learning scenarios compared to other algorithms.
arXiv Detail & Related papers (2025-03-24T06:14:33Z)
- Scalable Mechanistic Neural Networks for Differential Equations and Machine Learning [52.28945097811129]
We propose an enhanced neural network framework designed for scientific machine learning applications involving long temporal sequences. We reduce the computational time and space complexities from cubic and quadratic with respect to the sequence length, respectively, to linear. Extensive experiments demonstrate that S-MNN matches the original MNN in precision while substantially reducing computational resources.
arXiv Detail & Related papers (2024-10-08T14:27:28Z)
- Cross-Scan Mamba with Masked Training for Robust Spectral Imaging [51.557804095896174]
We propose the Cross-Scanning Mamba, named CS-Mamba, that employs a Spatial-Spectral SSM for global-local balanced context encoding. Experiment results show that our CS-Mamba achieves state-of-the-art performance and the masked training method can better reconstruct smooth features to improve the visual quality.
arXiv Detail & Related papers (2024-08-01T15:14:10Z)
- VMRNN: Integrating Vision Mamba and LSTM for Efficient and Accurate Spatiotemporal Forecasting [11.058879849373572]
Combining ViTs or CNNs with RNNs for spatiotemporal forecasting has yielded unparalleled results in predicting temporal and spatial dynamics.
Recent Mamba-based architectures have been met with enthusiasm for their exceptional long-sequence modeling capabilities.
We propose the VMRNN cell, a recurrent unit that integrates the strengths of Vision Mamba blocks with LSTM.
arXiv Detail & Related papers (2024-03-25T08:26:42Z)
- Efficient Deep Spiking Multi-Layer Perceptrons with Multiplication-Free Inference [13.924924047051782]
Deep convolution architectures for Spiking Neural Networks (SNNs) have significantly enhanced image classification performance and reduced computational burdens.
This research explores a new pathway, drawing inspiration from the progress made in Multi-Layer Perceptrons (MLPs).
We propose an innovative spiking architecture that uses batch normalization to retain MFI compatibility.
We establish an efficient multi-stage spiking network that effectively blends global receptive fields with local feature extraction.
arXiv Detail & Related papers (2023-06-21T16:52:20Z)
- Spikformer: When Spiking Neural Network Meets Transformer [102.91330530210037]
We consider two biologically plausible structures, the Spiking Neural Network (SNN) and the self-attention mechanism.
We propose a novel Spiking Self Attention (SSA) as well as a powerful framework, named Spiking Transformer (Spikformer).
arXiv Detail & Related papers (2022-09-29T14:16:49Z)
- A Battle of Network Structures: An Empirical Study of CNN, Transformer, and MLP [121.35904748477421]
Convolutional neural networks (CNN) are the dominant deep neural network (DNN) architecture for computer vision.
Transformer and multi-layer perceptron (MLP)-based models, such as Vision Transformer and MLP-Mixer, started to lead new trends.
In this paper, we conduct empirical studies on these DNN structures and try to understand their respective pros and cons.
arXiv Detail & Related papers (2021-08-30T06:09:02Z)
- Neural Architecture Dilation for Adversarial Robustness [56.18555072877193]
A shortcoming of convolutional neural networks is that they are vulnerable to adversarial attacks.
This paper aims to improve the adversarial robustness of the backbone CNNs that have a satisfactory accuracy.
With minimal computational overhead, the dilation architecture is expected to preserve the standard performance of the backbone CNN.
arXiv Detail & Related papers (2021-08-16T03:58:00Z)