Decoding Visual Neural Representations by Multimodal with Dynamic Balancing
- URL: http://arxiv.org/abs/2509.03433v1
- Date: Wed, 03 Sep 2025 16:03:59 GMT
- Title: Decoding Visual Neural Representations by Multimodal with Dynamic Balancing
- Authors: Kaili sun, Xingyu Miao, Bing Zhai, Haoran Duan, Yang Long,
- Abstract summary: We propose an innovative framework that integrates EEG, image, and text data, aiming to decode visual neural representations from low signal-to-noise ratio EEG signals.<n>We introduce text modality to enhance the semantic correspondence between EEG signals and visual content.<n>Our method surpasses previous state-of-the-art methods in both Top-1 and Top-5 accuracy metrics, improving by 2.0% and 4.7% respectively.
- Score: 8.355081324607537
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we propose an innovative framework that integrates EEG, image, and text data, aiming to decode visual neural representations from low signal-to-noise ratio EEG signals. Specifically, we introduce text modality to enhance the semantic correspondence between EEG signals and visual content. With the explicit semantic labels provided by text, image and EEG features of the same category can be more closely aligned with the corresponding text representations in a shared multimodal space. To fully utilize pre-trained visual and textual representations, we propose an adapter module that alleviates the instability of high-dimensional representation while facilitating the alignment and fusion of cross-modal features. Additionally, to alleviate the imbalance in multimodal feature contributions introduced by the textual representations, we propose a Modal Consistency Dynamic Balance (MCDB) strategy that dynamically adjusts the contribution weights of each modality. We further propose a stochastic perturbation regularization (SPR) term to enhance the generalization ability of semantic perturbation-based models by introducing dynamic Gaussian noise in the modality optimization process. The evaluation results on the ThingsEEG dataset show that our method surpasses previous state-of-the-art methods in both Top-1 and Top-5 accuracy metrics, improving by 2.0\% and 4.7\% respectively.
Related papers
- Vision-Enhanced Large Language Models for High-Resolution Image Synthesis and Multimodal Data Interpretation [0.0]
This research introduces a transformative framework for integrating Vision-Enhanced Large Language Models (LLMs) with advanced transformer-based architectures.<n>The proposed model incorporates a rectified flow mechanism that connects noise and data with linear paths, enabling efficient and high-quality generation.<n>The framework achieves unparalleled fidelity in synthesized images and coherent multimodal representations.
arXiv Detail & Related papers (2025-12-14T08:28:50Z) - Improving Multimodal Sentiment Analysis via Modality Optimization and Dynamic Primary Modality Selection [54.10252086842123]
Multimodal Sentiment Analysis (MSA) aims to predict sentiment from language, acoustic, and visual data in videos.<n>This paper proposes a modality optimization and dynamic primary modality selection framework (MODS)<n>Experiments on four benchmark datasets demonstrate that MODS outperforms state-of-the-art methods.
arXiv Detail & Related papers (2025-11-09T11:13:32Z) - IAR2: Improving Autoregressive Visual Generation with Semantic-Detail Associated Token Prediction [77.06211178777939]
IAR2 is an advanced autoregressive framework that enables a hierarchical semantic-detail synthesis process.<n>We show that IAR2 sets a new state-of-the-art for autoregressive image generation, achieving a FID of 1.50 on ImageNet.
arXiv Detail & Related papers (2025-10-08T12:08:21Z) - EGRA:Toward Enhanced Behavior Graphs and Representation Alignment for Multimodal Recommendation [50.848374648774374]
MultiModal Recommendation (MMR) systems have emerged as a promising solution for improving recommendation quality by leveraging rich item-side modality information.<n>We propose EGRA, which incorporates into the behavior graph an item-item graph built from representations generated by a pretrained MMR model.<n>It also introduces a novel bi-level dynamic alignment weighting mechanism to improve modality-behavior representation alignment.
arXiv Detail & Related papers (2025-08-22T07:47:54Z) - Semantic Item Graph Enhancement for Multimodal Recommendation [49.66272783945571]
Multimodal recommendation systems have attracted increasing attention for their improved performance by leveraging items' multimodal information.<n>Prior methods often build modality-specific item-item semantic graphs from raw modality features.<n>These semantic graphs suffer from semantic deficiencies, including insufficient modeling of collaborative signals among items.
arXiv Detail & Related papers (2025-08-08T09:20:50Z) - Modeling and Performance Analysis for Semantic Communications Based on Empirical Results [53.805458017074294]
We propose an Alpha-Beta-Gamma (ABG) formula to model the relationship between the end-to-end measurement and SNR.<n>For image reconstruction tasks, the proposed ABG formula can well fit the commonly used DL networks, such as SCUNet, and Vision Transformer.<n>To the best of our knowledge, this is the first theoretical expression between end-to-end performance metrics and SNR for semantic communications.
arXiv Detail & Related papers (2025-04-29T06:07:50Z) - DualKanbaFormer: An Efficient Selective Sparse Framework for Multimodal Aspect-based Sentiment Analysis [0.6187939267100836]
We introduce DualKanbaFormer, a novel framework that leverages parallel Textual and Visual KanbaFormer modules for robust multimodal analysis.<n>Our approach incorporates Aspect-Driven Sparse Attention (ADSA) to balance coarse-grained aggregation and fine-grained selection for aspect-focused precision.<n>We replace traditional feed-forward networks and normalization with Kolmogorov-Arnold Networks (KANs) and Dynamic Tanh (DyT) to enhance non-linear expressivity and inference stability.
arXiv Detail & Related papers (2024-08-27T19:33:15Z) - Harmonizing Visual Text Comprehension and Generation [31.605599298507293]
We present TextHarmony, a unified and versatile multimodal generative model proficient in comprehending and generating visual text.
We propose Slide-LoRA, which aggregates modality-specific and modality-agnostic LoRA experts, partially decoupling the multimodal generation space.
Comprehensive experiments across various benchmarks demonstrate the effectiveness of the proposed approach.
arXiv Detail & Related papers (2024-07-23T10:11:56Z) - Unleashing Network Potentials for Semantic Scene Completion [50.95486458217653]
This paper proposes a novel SSC framework - Adrial Modality Modulation Network (AMMNet)
AMMNet introduces two core modules: a cross-modal modulation enabling the interdependence of gradient flows between modalities, and a customized adversarial training scheme leveraging dynamic gradient competition.
Extensive experimental results demonstrate that AMMNet outperforms state-of-the-art SSC methods by a large margin.
arXiv Detail & Related papers (2024-03-12T11:48:49Z) - Dynamic Visual Semantic Sub-Embeddings and Fast Re-Ranking [0.5242869847419834]
We propose a Dynamic Visual Semantic Sub-Embeddings framework (DVSE) to reduce the information entropy.
To encourage the generated candidate embeddings to capture various semantic variations, we construct a mixed distribution.
We compare the performance with existing set-based method using four image feature encoders and two text feature encoders on three benchmark datasets.
arXiv Detail & Related papers (2023-09-15T04:39:11Z) - Semantic Image Synthesis via Diffusion Models [174.24523061460704]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.<n>Recent work on semantic image synthesis mainly follows the de facto GAN-based approaches.<n>We propose a novel framework based on DDPM for semantic image synthesis.
arXiv Detail & Related papers (2022-06-30T18:31:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.