MIA-Mind: A Multidimensional Interactive Attention Mechanism Based on MindSpore
- URL: http://arxiv.org/abs/2504.19080v1
- Date: Sun, 27 Apr 2025 02:27:50 GMT
- Title: MIA-Mind: A Multidimensional Interactive Attention Mechanism Based on MindSpore
- Authors: Zhenkai Qin, Jiaquan Liang, Qiao Fang
- Abstract summary: We propose MIA-Mind, a lightweight and modular Multidimensional Interactive Attention Mechanism. MIA-Mind jointly models spatial and channel features through a unified cross-attentive fusion strategy. Experiments are conducted on three representative datasets.
- Score: 0.0
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Attention mechanisms have significantly advanced deep learning by enhancing feature representation through selective focus. However, existing approaches often model channel importance and spatial saliency independently, overlooking their inherent interdependence and limiting their effectiveness. To address this limitation, we propose MIA-Mind, a lightweight and modular Multidimensional Interactive Attention Mechanism, built upon the MindSpore framework. MIA-Mind jointly models spatial and channel features through a unified cross-attentive fusion strategy, enabling fine-grained feature recalibration with minimal computational overhead. Extensive experiments are conducted on three representative datasets: on CIFAR-10, MIA-Mind achieves an accuracy of 82.9%; on ISBI2012, it achieves an accuracy of 78.7%; and on CIC-IDS2017, it achieves an accuracy of 91.9%. These results validate the versatility, lightweight design, and generalization ability of MIA-Mind across heterogeneous tasks. Future work will explore the extension of MIA-Mind to large-scale datasets, the development of adaptive attention fusion strategies, and distributed deployment to further enhance scalability and robustness.
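As a concrete illustration of the idea, below is a minimal PyTorch sketch of joint channel-spatial recalibration with a multiplicative fusion of the two attention maps. The paper's implementation is in MindSpore, and the reduction ratio, kernel size, and exact fusion rule here are assumptions rather than the authors' design.

```python
import torch
import torch.nn as nn

class InteractiveAttention(nn.Module):
    """Joint channel-spatial attention with a cross-attentive fusion step.

    Illustrative sketch, not the authors' MindSpore code: the channel branch
    follows the usual squeeze-and-excitation pattern, the spatial branch the
    usual pooled-conv pattern, and the two maps recalibrate features jointly
    rather than sequentially.
    """

    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        # Channel branch: global average pooling + bottleneck MLP.
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
        )
        # Spatial branch: avg/max pooling over channels + a conv.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        c_att = torch.sigmoid(self.channel_mlp(x))            # (B, C, 1, 1)
        pooled = torch.cat([x.mean(1, keepdim=True),
                            x.amax(1, keepdim=True)], dim=1)  # (B, 2, H, W)
        s_att = torch.sigmoid(self.spatial_conv(pooled))      # (B, 1, H, W)
        # Joint recalibration: both maps broadcast into one (B, C, H, W)
        # attention tensor instead of being applied one after the other.
        return x * (c_att * s_att)


x = torch.randn(2, 64, 32, 32)
print(InteractiveAttention(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```

The key difference from sequential designs such as CBAM is that the channel and spatial maps are combined into one joint map before recalibrating the features, which is the interdependence the abstract emphasizes.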
Related papers
- MSCA-Net:Multi-Scale Context Aggregation Network for Infrared Small Target Detection [0.0]
This paper proposes a novel network architecture named MSCA-Net, which integrates three key components. MSEDA employs a multi-scale feature fusion attention mechanism to adaptively aggregate information across different scales. PCBAM captures the correlation between global and local features through a correlation matrix-based strategy.
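The PCBAM description suggests a correlation-matrix interaction between global and local features; a rough sketch of that pattern follows, where the shapes and softmax normalization are assumptions rather than the paper's exact design.

```python
import torch

# Correlate global and local feature maps via an HW x HW correlation matrix,
# then re-weight the local features by it (illustrative approximation).

def correlate(global_feat: torch.Tensor, local_feat: torch.Tensor) -> torch.Tensor:
    """global_feat, local_feat: (B, C, H, W) -> re-weighted local features."""
    B, C, H, W = local_feat.shape
    g = global_feat.flatten(2)                                       # (B, C, HW)
    l = local_feat.flatten(2)                                        # (B, C, HW)
    corr = torch.softmax(g.transpose(1, 2) @ l / C ** 0.5, dim=-1)   # (B, HW, HW)
    return (l @ corr.transpose(1, 2)).view(B, C, H, W)

out = correlate(torch.randn(2, 32, 16, 16), torch.randn(2, 32, 16, 16))
print(out.shape)  # torch.Size([2, 32, 16, 16])
```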
arXiv Detail & Related papers (2025-03-21T14:42:31Z)
- LoRA-Ensemble: Efficient Uncertainty Modelling for Self-attention Networks [52.46420522934253]
We introduce LoRA-Ensemble, a parameter-efficient deep ensemble method for self-attention networks.
By employing a single pre-trained self-attention network with weights shared across all members, we train member-specific low-rank matrices for the attention projections.
Our method exhibits superior calibration compared to explicit ensembles and achieves similar or better accuracy across various prediction tasks and datasets.
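A minimal sketch of the idea, assuming a single linear attention projection: the backbone weight is frozen and shared, while each ensemble member owns only a low-rank update.

```python
import torch
import torch.nn as nn

class LoRAEnsembleLinear(nn.Module):
    """Ensemble over one attention projection (assumed layout).

    One frozen weight W is shared by all members; member m adds its own
    low-rank update B_m @ A_m, so the ensemble costs O(members * rank * d)
    extra parameters instead of `members` full copies of W.
    """

    def __init__(self, d: int, members: int = 4, rank: int = 4):
        super().__init__()
        self.shared = nn.Linear(d, d, bias=False)
        self.shared.weight.requires_grad_(False)       # backbone stays frozen
        self.A = nn.Parameter(torch.randn(members, rank, d) * 0.01)
        self.B = nn.Parameter(torch.zeros(members, d, rank))

    def forward(self, x: torch.Tensor, member: int) -> torch.Tensor:
        delta = self.B[member] @ self.A[member]        # (d, d) low-rank update
        return self.shared(x) + x @ delta.T

layer = LoRAEnsembleLinear(d=32, members=4)
x = torch.randn(8, 32)
preds = torch.stack([layer(x, m) for m in range(4)])   # one pass per member
print(preds.shape)  # torch.Size([4, 8, 32])
```

Disagreement across the stacked member predictions is what provides the uncertainty estimate.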
arXiv Detail & Related papers (2024-05-23T11:10:32Z)
- Revisiting Multi-modal Emotion Learning with Broad State Space Models and Probability-guidance Fusion [14.14051929942914]
We argue that long-range contextual semantic information should be extracted in the feature disentanglement stage and the inter-modal semantic information consistency should be maximized in the feature fusion stage.
Inspired by recent State Space Models (SSMs), we propose Broad Mamba, which does not rely on self-attention for sequence modeling.
We show that the proposed method can overcome the computational and memory limitations of Transformer when modeling long-distance contexts.
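For intuition, a bare-bones diagonal state-space recurrence shows why such models scale linearly in sequence length; the selective, input-dependent parameterization that characterizes Mamba is omitted here.

```python
import numpy as np

# Minimal diagonal state-space recurrence: h_t = a * h_{t-1} + b * x_t,
# y_t = c * h_t. The cost is O(T) in sequence length, unlike the O(T^2)
# cost of self-attention.

def ssm_scan(x, a, b, c):
    h = np.zeros_like(a)
    ys = []
    for x_t in x:                    # sequential scan over the sequence
        h = a * h + b * x_t
        ys.append(c * h)
    return np.stack(ys)

T, d = 1000, 16
x = np.random.randn(T, 1)
a = np.full(d, 0.9)                  # decay sets the effective context length
y = ssm_scan(x, a, b=np.ones(d), c=np.ones(d))
print(y.shape)  # (1000, 16)
```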
arXiv Detail & Related papers (2024-04-27T10:22:03Z)
- AMMUNet: Multi-Scale Attention Map Merging for Remote Sensing Image Segmentation [4.618389486337933]
We propose AMMUNet, a UNet-based framework that employs multi-scale attention map merging.
The proposed attention map merging mechanism (AMMM) effectively combines multi-scale attention maps into a unified representation using a fixed mask template.
We show that our approach achieves remarkable mean intersection over union (mIoU) scores of 75.48% on the Vaihingen dataset and an exceptional 77.90% on the Potsdam dataset.
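One plausible reading of the merging step is sketched below; the fixed mask template is approximated by fixed per-scale weights, which is an assumption rather than the paper's exact rule.

```python
import torch
import torch.nn.functional as F

# Merge attention maps from several scales into one map at the finest scale.

def merge_attention_maps(maps, template):
    """maps: list of (B, 1, h_i, w_i) attention maps at different scales.
    template: fixed (num_scales,) weights acting as the merge mask."""
    target = maps[0].shape[-2:]
    resized = [F.interpolate(m, size=target, mode="bilinear",
                             align_corners=False) for m in maps]
    stacked = torch.stack(resized)                       # (S, B, 1, H, W)
    merged = (template.view(-1, 1, 1, 1, 1) * stacked).sum(0)
    return merged.sigmoid()

maps = [torch.randn(2, 1, 64, 64), torch.randn(2, 1, 32, 32),
        torch.randn(2, 1, 16, 16)]
out = merge_attention_maps(maps, template=torch.tensor([0.5, 0.3, 0.2]))
print(out.shape)  # torch.Size([2, 1, 64, 64])
```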
arXiv Detail & Related papers (2024-04-20T15:23:15Z)
- Intuition-aware Mixture-of-Rank-1-Experts for Parameter Efficient Finetuning [50.73666458313015]
Large Language Models (LLMs) have demonstrated significant potential in performing multiple tasks in multimedia applications.
Mixture-of-Experts (MoE) has emerged as a promising solution, its sparse architecture enabling effective task decoupling.
Intuition-MoR1E achieves superior efficiency and a 2.15% overall accuracy improvement across 14 public datasets.
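A sketch of the rank-1-expert idea under simplifying assumptions: each expert is an outer product u_e v_e^T, and a plain learned softmax router stands in for the paper's intuition-aware routing.

```python
import torch
import torch.nn as nn

class MixtureOfRank1Experts(nn.Module):
    """Rank-1 experts for parameter-efficient finetuning (illustrative)."""

    def __init__(self, d: int, experts: int = 8):
        super().__init__()
        self.u = nn.Parameter(torch.randn(experts, d) * 0.01)
        self.v = nn.Parameter(torch.randn(experts, d) * 0.01)
        self.router = nn.Linear(d, experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gates = self.router(x).softmax(-1)            # (B, E) mixing weights
        # Expert e maps x to (x . v_e) * u_e, i.e. a rank-1 linear map;
        # the einsum sums the gated expert outputs.
        expert_out = torch.einsum("be,bd,ed,ek->bk", gates, x, self.v, self.u)
        return x + expert_out

layer = MixtureOfRank1Experts(d=64, experts=8)
print(layer(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```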
arXiv Detail & Related papers (2024-04-13T12:14:58Z)
- InfoRM: Mitigating Reward Hacking in RLHF via Information-Theoretic Reward Modeling [66.3072381478251]
Reward hacking, also termed reward overoptimization, remains a critical challenge.
We propose a framework for reward modeling, namely InfoRM, by introducing a variational information bottleneck objective.
We show that InfoRM's overoptimization detection mechanism is not only effective but also robust across a broad range of datasets.
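A minimal sketch of an information-bottleneck reward head in this spirit; the layer sizes, beta weight, and preference loss are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IBRewardHead(nn.Module):
    """Information-bottleneck reward head (sketch; sizes and beta assumed).

    The hidden state is compressed into a stochastic latent z, and a KL
    term to a standard-normal prior discards reward-irrelevant detail,
    the signal InfoRM monitors to detect overoptimization.
    """

    def __init__(self, d: int, z_dim: int = 16):
        super().__init__()
        self.mu = nn.Linear(d, z_dim)
        self.logvar = nn.Linear(d, z_dim)
        self.reward = nn.Linear(z_dim, 1)

    def forward(self, h: torch.Tensor):
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        kl = 0.5 * (mu.pow(2) + logvar.exp() - 1 - logvar).sum(-1).mean()
        return self.reward(z).squeeze(-1), kl

head = IBRewardHead(d=128)
r_good, kl_g = head(torch.randn(4, 128))   # chosen responses' hidden states
r_bad, kl_b = head(torch.randn(4, 128))    # rejected responses' hidden states
# Pairwise preference loss plus the bottleneck penalty (beta=0.1 assumed).
loss = -F.logsigmoid(r_good - r_bad).mean() + 0.1 * (kl_g + kl_b)
```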
arXiv Detail & Related papers (2024-02-14T17:49:07Z)
- Interpreting and Improving Attention From the Perspective of Large Kernel Convolution [51.06461246235176]
We introduce Large Kernel Convolutional Attention (LKCA), a novel formulation that reinterprets attention operations as a single large-kernel convolution. LKCA achieves competitive performance across various visual tasks, particularly in data-constrained settings.
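A common way to make this reading concrete is a depthwise convolution with a very large kernel acting as the global token mixer; the kernel size and gating product below are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class LargeKernelAttention(nn.Module):
    """Attention-like spatial mixing via one large depthwise convolution."""

    def __init__(self, channels: int, kernel_size: int = 21):
        super().__init__()
        self.mix = nn.Conv2d(channels, channels, kernel_size,
                             padding=kernel_size // 2, groups=channels)
        self.proj = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.proj(self.mix(x))   # conv output gates the input

x = torch.randn(2, 32, 56, 56)
print(LargeKernelAttention(32)(x).shape)  # torch.Size([2, 32, 56, 56])
```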
arXiv Detail & Related papers (2024-01-11T08:40:35Z)
- Spatial Attention-based Distribution Integration Network for Human Pose Estimation [0.8052382324386398]
We present the Spatial Attention-based Distribution Integration Network (SADI-NET) to improve the accuracy of localization.
Our network consists of three efficient modules: the receptive fortified module (RFM), the spatial fusion module (SFM), and the distribution learning module (DLM).
Our model obtained a remarkable 92.10% accuracy on the MPII test dataset, demonstrating significant improvements over existing models and establishing state-of-the-art performance.
arXiv Detail & Related papers (2023-11-09T12:43:01Z)
- Tool-Augmented Reward Modeling [58.381678612409]
We propose a tool-augmented preference modeling approach, named Themis, to address limitations by empowering RMs with access to external environments.
Our study delves into the integration of external tools into RMs, enabling them to interact with diverse external sources.
In human evaluations, RLHF trained with Themis attains an average win rate of 32% when compared to baselines.
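The overall shape of such a pipeline can be sketched as follows, with invented tool names and a stand-in scorer for illustration; the paper's actual tool set and prompting differ.

```python
from typing import Callable

# Before scoring a response, the reward model gathers observations from
# external tools and conditions its score on them (illustrative skeleton).

def score_with_tools(question: str, answer: str,
                     tools: dict[str, Callable[[str], str]],
                     base_scorer: Callable[[str], float]) -> float:
    evidence = [f"[{name}] {tool(question)}" for name, tool in tools.items()]
    context = "\n".join([question, answer, *evidence])
    return base_scorer(context)   # reward conditioned on tool observations

reward = score_with_tools(
    "What year did X happen?", "It happened in 1998.",
    tools={"search": lambda q: "stub search result"},       # hypothetical tool
    base_scorer=lambda ctx: float(len(ctx) % 7) / 7.0,      # stand-in scorer
)
```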
arXiv Detail & Related papers (2023-10-02T09:47:40Z)
- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
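The general pattern of a learnable query aggregating global cues via cross-attention can be sketched as follows; the head count and dimensions are illustrative approximations of the IMQ idea.

```python
import torch
import torch.nn as nn

class ImplicitQueryPooling(nn.Module):
    """A learnable query cross-attends over one modality's tokens to pool
    a global summary (sketch of the aggregation pattern, not the paper)."""

    def __init__(self, d: int, heads: int = 4):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, d) * 0.02)
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        q = self.query.expand(tokens.size(0), -1, -1)
        pooled, _ = self.attn(q, tokens, tokens)   # (B, 1, d) global summary
        return pooled.squeeze(1)

tokens = torch.randn(2, 50, 64)                    # e.g. image patch tokens
print(ImplicitQueryPooling(64)(tokens).shape)      # torch.Size([2, 64])
```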
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
- Understanding Self-attention Mechanism via Dynamical System Perspective [58.024376086269015]
Self-attention mechanism (SAM) is widely used in various fields of artificial intelligence.
We show that the intrinsic stiffness phenomenon (SP) observed in high-precision solutions of ordinary differential equations (ODEs) also widely exists in high-performance neural networks (NNs).
We show that SAM acts as a stiffness-aware step-size adaptor, enhancing the model's representational ability by measuring intrinsic SP.
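A toy numerical example of the underlying analogy: a residual update x + h*f(x) is one explicit Euler step, and a stiff system punishes a uniform step size that an adaptively chosen one avoids.

```python
import torch

# For the stiff system x' = -5x, explicit Euler is stable only for h < 0.4,
# so a uniform large step diverges while a smaller, adaptively chosen step
# (the role the paper attributes to SAM) stays stable.

def euler_step(x, f, h):
    return x + h * f(x)

f = lambda x: -5.0 * x
x = torch.ones(4)
print(euler_step(x, f, h=0.5))   # tensor of -1.5: overshoots
print(euler_step(x, f, h=0.1))   # tensor of 0.5: stable contraction
```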
arXiv Detail & Related papers (2023-08-19T08:17:41Z)
- Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning [114.36124979578896]
We design a dynamic mechanism using offline reinforcement learning algorithms.
Our algorithm is based on the pessimism principle and only requires a mild assumption on the coverage of the offline data set.
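The pessimism principle itself fits in a few lines: act on a lower confidence bound of the estimated value, so that actions poorly covered by the offline data are penalized. The count-based bonus below is a standard illustrative choice, not the paper's construction.

```python
import numpy as np

def pessimistic_action(q_hat, counts, beta=1.0):
    """Pick the action maximizing a lower confidence bound on value."""
    lcb = q_hat - beta / np.sqrt(np.maximum(counts, 1))  # uncertainty penalty
    return int(np.argmax(lcb))

q_hat = np.array([1.0, 1.2])     # action 1 looks better ...
counts = np.array([500, 3])      # ... but is barely covered by the data
print(pessimistic_action(q_hat, counts))  # 0: pessimism prefers coverage
```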
arXiv Detail & Related papers (2022-05-05T05:44:26Z)
- A New Entity Extraction Method Based on Machine Reading Comprehension [4.92025078254413]
This paper presents an effective MRC-based entity extraction model, MRC-I2DP.
It uses the proposed gated attention-attracting mechanism to adjust the representation of each part of the text pair.
It also uses the proposed 2D probability coding module, the TALU function, and a mask mechanism to strengthen the detection of all possible target spans.
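A sketch of the general 2D span-coding idea, with the paper's TALU activation and gating replaced by plain dot-product scoring (an assumption):

```python
import torch

# score[i, j] is the probability that the span from token i to token j is
# a target entity; the upper-triangular mask enforces j >= i.

def span_scores(h: torch.Tensor) -> torch.Tensor:
    """h: (T, d) token representations -> (T, T) span probability matrix."""
    T, d = h.shape
    logits = h @ h.T / d ** 0.5                               # start-end scores
    mask = torch.triu(torch.ones(T, T, dtype=torch.bool))     # valid spans only
    return torch.sigmoid(logits).masked_fill(~mask, 0.0)

scores = span_scores(torch.randn(12, 64))
print(scores.shape)  # torch.Size([12, 12])
```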
arXiv Detail & Related papers (2021-08-14T02:11:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.