Revisiting Your Memory: Reconstruction of Affect-Contextualized Memory via EEG-guided Audiovisual Generation
- URL: http://arxiv.org/abs/2412.05296v1
- Date: Sun, 24 Nov 2024 16:04:03 GMT
- Title: Revisiting Your Memory: Reconstruction of Affect-Contextualized Memory via EEG-guided Audiovisual Generation
- Authors: Joonwoo Kwon, Heehwan Wang, Jinwoo Lee, Sooyoung Kim, Shinjae Yoo, Yuewei Lin, Jiook Cha,
- Abstract summary: We introduce RecallAffectiveMemory, a novel task designed to reconstruct autobiographical memories through audio-visual generation guided by affect extracted from electroencephalogram (EEG) signals.<n>We present the EEG-AffectiveMemory dataset, which encompasses textual descriptions, visuals, music, and EEG recordings collected during memory recall from nine participants.
- Score: 7.67506894657724
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we introduce RecallAffectiveMemory, a novel task designed to reconstruct autobiographical memories through audio-visual generation guided by affect extracted from electroencephalogram (EEG) signals. To support this pioneering task, we present the EEG-AffectiveMemory dataset, which encompasses textual descriptions, visuals, music, and EEG recordings collected during memory recall from nine participants. Furthermore, we propose RYM (Recall Your Memory), a three-stage framework for generating synchronized audio-visual contents while maintaining dynamic personal memory affect trajectories. Experimental results indicate that our method can faithfully reconstruct affect-contextualized audio-visual memory across all subjects, both qualitatively and quantitatively, with participants reporting strong affective concordance between their recalled memories and the generated content. Our approaches advance affect decoding research and its practical applications in personalized media creation via neural-based affect comprehension.
Related papers
- The AI Hippocampus: How Far are We From Human Memory? [77.04745635827278]
Implicit memory refers to the knowledge embedded within the internal parameters of pre-trained transformers.<n>Explicit memory involves external storage and retrieval components designed to augment model outputs with dynamic, queryable knowledge representations.<n>Agentic memory introduces persistent, temporally extended memory structures within autonomous agents.
arXiv Detail & Related papers (2026-01-14T03:24:08Z) - fMRI2GES: Co-speech Gesture Reconstruction from fMRI Signal with Dual Brain Decoding Alignment [47.45203641583922]
We introduce a novel approach, textbffMRI2GES, that allows training of fMRI-to-gesture reconstruction networks on unpaired data.<n>We show that our proposed method can reconstruct expressive gestures directly from fMRI recordings.
arXiv Detail & Related papers (2025-12-01T02:09:44Z) - A Penny for Your Thoughts: Decoding Speech from Inexpensive Brain Signals [1.621606615628714]
We explore whether neural networks can decode brain activity into speech by mapping EEG recordings to audio representations.<n>Using EEG data recorded as subjects listened to natural speech, we train a model with a contrastive CLIP loss to align EEG-derived embeddings with embeddings from a pre-trained transformer-based speech model.
arXiv Detail & Related papers (2025-10-28T06:02:41Z) - GENESIS: A Generative Model of Episodic-Semantic Interaction [0.40286876168661084]
We introduce the Generative Episodic-Semantic Integration System (GENESIS)<n>GENESIS formalizes memory as the interaction between two limited-capacity generative systems.<n>It provides a principled account of memory as an active, constructive, and resource-bounded process.
arXiv Detail & Related papers (2025-10-17T17:11:13Z) - Latent Structured Hopfield Network for Semantic Association and Retrieval [52.634915010996835]
Episodic memory enables humans to recall past experiences by associating semantic elements such as objects, locations, and time into coherent event representations.<n>We propose the Latent Structured Hopfield Network (LSHN), a framework that integrates continuous Hopfield attractor dynamics into an autoencoder architecture.<n>Unlike traditional Hopfield networks, our model is trained end-to-end with gradient descent, achieving scalable and robust memory retrieval.
arXiv Detail & Related papers (2025-06-02T04:24:36Z) - Learning Interpretable Representations Leads to Semantically Faithful EEG-to-Text Generation [52.51005875755718]
We focus on EEG-to-text decoding and address its hallucination issue through the lens of posterior collapse.<n>Acknowledging the underlying mismatch in information capacity between EEG and text, we reframe the decoding task as semantic summarization of core meanings.<n>Experiments on the public ZuCo dataset demonstrate that GLIM consistently generates fluent, EEG-grounded sentences.
arXiv Detail & Related papers (2025-05-21T05:29:55Z) - On Memory Construction and Retrieval for Personalized Conversational Agents [69.46887405020186]
We propose SeCom, a method that constructs the memory bank at segment level by introducing a conversation segmentation model.
Experimental results show that SeCom exhibits a significant performance advantage over baselines on long-term conversation benchmarks LOCOMO and Long-MT-Bench+.
arXiv Detail & Related papers (2025-02-08T14:28:36Z) - Decoding the Echoes of Vision from fMRI: Memory Disentangling for Past Semantic Information [26.837527605133]
This study investigates the capacity of working memory to retain past information under continuous visual stimuli.
We propose a new task Memory Disentangling, which aims to extract and decode past information from fMRI signals.
arXiv Detail & Related papers (2024-09-30T15:51:06Z) - Reverse the auditory processing pathway: Coarse-to-fine audio reconstruction from fMRI [20.432212333539628]
We introduce a novel coarse-to-fine audio reconstruction method based on functional Magnetic Resonance Imaging (fMRI) data.
We validate our method on three public fMRI datasets-Brain2Sound, Brain2Music, and Brain2Speech.
By employing semantic prompts during decoding, we enhance the quality of reconstructed audio when semantic features are suboptimal.
arXiv Detail & Related papers (2024-05-29T03:16:14Z) - Exploring neural oscillations during speech perception via surrogate gradient spiking neural networks [59.38765771221084]
We present a physiologically inspired speech recognition architecture compatible and scalable with deep learning frameworks.
We show end-to-end gradient descent training leads to the emergence of neural oscillations in the central spiking neural network.
Our findings highlight the crucial inhibitory role of feedback mechanisms, such as spike frequency adaptation and recurrent connections, in regulating and synchronising neural activity to improve recognition performance.
arXiv Detail & Related papers (2024-04-22T09:40:07Z) - In-Memory Learning: A Declarative Learning Framework for Large Language
Models [56.62616975119192]
We propose a novel learning framework that allows agents to align with their environment without relying on human-labeled data.
This entire process transpires within the memory components and is implemented through natural language.
We demonstrate the effectiveness of our framework and provide insights into this problem.
arXiv Detail & Related papers (2024-03-05T08:25:11Z) - BrainVis: Exploring the Bridge between Brain and Visual Signals via Image Reconstruction [7.512223286737468]
Analyzing and reconstructing visual stimuli from brain signals effectively advances the understanding of human visual system.
However, the EEG signals are complex and contain significant noise.
This leads to substantial limitations in existing works of visual stimuli reconstruction from EEG.
We propose a novel approach called BrainVis to address these challenges.
arXiv Detail & Related papers (2023-12-22T17:49:11Z) - Cooperative Dual Attention for Audio-Visual Speech Enhancement with
Facial Cues [80.53407593586411]
We focus on leveraging facial cues beyond the lip region for robust Audio-Visual Speech Enhancement (AVSE)
We propose a Dual Attention Cooperative Framework, DualAVSE, to ignore speech-unrelated information, capture speech-related information with facial cues, and dynamically integrate it with the audio signal for AVSE.
arXiv Detail & Related papers (2023-11-24T04:30:31Z) - Saliency-Guided Hidden Associative Replay for Continual Learning [13.551181595881326]
Continual Learning is a burgeoning domain in next-generation AI, focusing on training neural networks over a sequence of tasks akin to human learning.
This paper presents the Saliency Guided Hidden Associative Replay for Continual Learning.
This novel framework synergizes associative memory with replay-based strategies. SHARC primarily archives salient data segments via sparse memory encoding.
arXiv Detail & Related papers (2023-10-06T15:54:12Z) - Memory in Plain Sight: Surveying the Uncanny Resemblances of Associative Memories and Diffusion Models [65.08133391009838]
generative process of Diffusion Models (DMs) has recently set state-of-the-art on many AI generation benchmarks.
We introduce a novel perspective to describe DMs using the mathematical language of memory retrieval from the field of energy-based Associative Memories (AMs)
We present a growing body of evidence that records DMs exhibiting empirical behavior we would expect from AMs, and conclude by discussing research opportunities that are revealed by understanding DMs as a form of energy-based memory.
arXiv Detail & Related papers (2023-09-28T17:57:09Z) - Seeing through the Brain: Image Reconstruction of Visual Perception from
Human Brain Signals [27.92796103924193]
We propose a comprehensive pipeline, named NeuroImagen, for reconstructing visual stimuli images from EEG signals.
We incorporate a novel multi-level perceptual information decoding to draw multi-grained outputs from the given EEG data.
arXiv Detail & Related papers (2023-07-27T12:54:16Z) - Improving Image Recognition by Retrieving from Web-Scale Image-Text Data [68.63453336523318]
We introduce an attention-based memory module, which learns the importance of each retrieved example from the memory.
Compared to existing approaches, our method removes the influence of the irrelevant retrieved examples, and retains those that are beneficial to the input query.
We show that it achieves state-of-the-art accuracies in ImageNet-LT, Places-LT and Webvision datasets.
arXiv Detail & Related papers (2023-04-11T12:12:05Z) - Joint fMRI Decoding and Encoding with Latent Embedding Alignment [77.66508125297754]
We introduce a unified framework that addresses both fMRI decoding and encoding.
Our model concurrently recovers visual stimuli from fMRI signals and predicts brain activity from images within a unified framework.
arXiv Detail & Related papers (2023-03-26T14:14:58Z) - Mind Reader: Reconstructing complex images from brain activities [16.78619734818198]
We focus on reconstructing the complex image stimuli from fMRI (functional magnetic resonance imaging) signals.
Unlike previous works that reconstruct images with single objects or simple shapes, our work aims to reconstruct image stimuli rich in semantics.
We find that incorporating an additional text modality is beneficial for the reconstruction problem compared to directly translating brain signals to images.
arXiv Detail & Related papers (2022-09-30T06:32:46Z) - Audio-visual multi-channel speech separation, dereverberation and
recognition [70.34433820322323]
This paper proposes an audio-visual multi-channel speech separation, dereverberation and recognition approach.
The advantage of the additional visual modality over using audio only is demonstrated on two neural dereverberation approaches.
Experiments conducted on the LRS2 dataset suggest that the proposed audio-visual multi-channel speech separation, dereverberation and recognition system outperforms the baseline.
arXiv Detail & Related papers (2022-04-05T04:16:03Z) - EEGminer: Discovering Interpretable Features of Brain Activity with
Learnable Filters [72.19032452642728]
We propose a novel differentiable EEG decoding pipeline consisting of learnable filters and a pre-determined feature extraction module.
We demonstrate the utility of our model towards emotion recognition from EEG signals on the SEED dataset and on a new EEG dataset of unprecedented size.
The discovered features align with previous neuroscience studies and offer new insights, such as marked differences in the functional connectivity profile between left and right temporal areas during music listening.
arXiv Detail & Related papers (2021-10-19T14:22:04Z) - Memory Controlled Sequential Self Attention for Sound Recognition [20.019643319467153]
We propose to use a memory controlled sequential self attention mechanism on top of a convolutional recurrent neural network (CRNN) model for polyphonic sound event detection (SED)
We show that our memory controlled self attention model achieves an event based F -score of 33.92% on the URBAN-SED dataset, outperforming the F -score of 20.10% reported by the model without self attention.
arXiv Detail & Related papers (2020-05-13T22:29:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.