Coupled Transformer Autoencoder for Disentangling Multi-Region Neural Latent Dynamics
- URL: http://arxiv.org/abs/2510.20068v1
- Date: Wed, 22 Oct 2025 22:47:15 GMT
- Title: Coupled Transformer Autoencoder for Disentangling Multi-Region Neural Latent Dynamics
- Authors: Ram Dyuthi Sristi, Sowmya Manojna Narasimha, Jingya Huang, Alice Despatin, Simon Musall, Vikash Gilja, Gal Mishne
- Abstract summary: Simultaneous recordings from thousands of neurons across multiple brain areas reveal rich mixtures of activity that are shared between regions and dynamics that are unique to each region. We introduce the Coupled Transformer Autoencoder (CTAE) - a sequence model that addresses both (i) non-stationary, non-linear dynamics and (ii) separation of shared versus region-specific structure in a single framework. CTAE employs transformer encoders and decoders to capture long-range neural dynamics and explicitly partitions each region's latent space into shared and private subspaces.
- Score: 8.294287754474894
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Simultaneous recordings from thousands of neurons across multiple brain areas reveal rich mixtures of activity that are shared between regions and dynamics that are unique to each region. Existing alignment or multi-view methods neglect temporal structure, whereas dynamical latent variable models capture temporal dependencies but are usually restricted to a single area, assume linear read-outs, or conflate shared and private signals. We introduce the Coupled Transformer Autoencoder (CTAE) - a sequence model that addresses both (i) non-stationary, non-linear dynamics and (ii) separation of shared versus region-specific structure in a single framework. CTAE employs transformer encoders and decoders to capture long-range neural dynamics and explicitly partitions each region's latent space into orthogonal shared and private subspaces. We demonstrate the effectiveness of CTAE on two high-density electrophysiology datasets with simultaneous recordings from multiple regions, one from motor cortical areas and the other from sensory areas. CTAE extracts meaningful representations that better decode behavioral variables compared to existing approaches.
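The abstract's core idea, partitioning each region's latent space into orthogonal shared and private subspaces, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the cross-covariance penalty, and the dimensions below are illustrative assumptions only.

```python
def split_latents(z, d_shared):
    """Partition T latent vectors (each of length D) into shared and private parts."""
    shared = [v[:d_shared] for v in z]
    private = [v[d_shared:] for v in z]
    return shared, private

def orthogonality_penalty(shared, private):
    """Sum of squared empirical cross-covariances between shared and private dims.

    Zero when the two subspaces are empirically uncorrelated; a training
    objective could add a term like this to the reconstruction loss to
    encourage the shared/private separation.
    """
    T = len(shared)
    ds, dp = len(shared[0]), len(private[0])
    mean_s = [sum(v[i] for v in shared) / T for i in range(ds)]
    mean_p = [sum(w[j] for w in private) / T for j in range(dp)]
    penalty = 0.0
    for i in range(ds):
        for j in range(dp):
            c = sum((v[i] - mean_s[i]) * (w[j] - mean_p[j])
                    for v, w in zip(shared, private)) / T
            penalty += c * c
    return penalty
```

In the paper the split is learned by transformer encoders per region; the sketch only shows the bookkeeping of the partition and one plausible way to penalize leakage between the two subspaces.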
Related papers
- Functional embeddings enable Aggregation of multi-area SEEG recordings over subjects and sessions [0.11083289076967894]
We propose a representation-learning framework that learns a subject-agnostic functional identity for each electrode from multi-region local field potentials. We evaluate this framework on a 20-subject dataset spanning basal ganglia-thalamic regions collected during flexible rest/movement recording sessions.
arXiv Detail & Related papers (2025-10-31T01:23:05Z) - Disentangling Shared and Private Neural Dynamics with SPIRE: A Latent Modeling Framework for Deep Brain Stimulation [0.1259953341639576]
SPIRE is a deep multi-encoder autoencoder that factorizes recordings into shared and private latent subspaces. It robustly recovers cross-regional structure and reveals how external stimulation reorganizes it. It is applied to intracranial deep brain stimulation (DBS) recordings.
arXiv Detail & Related papers (2025-10-28T22:45:52Z) - Watch Where You Move: Region-aware Dynamic Aggregation and Excitation for Gait Recognition [55.52723195212868]
GaitRDAE is a framework that automatically searches for motion regions, assigns adaptive temporal scales and applies corresponding attention. Experimental results show that GaitRDAE achieves state-of-the-art performance on several benchmark datasets.
arXiv Detail & Related papers (2025-10-18T15:36:08Z) - Complementary and Contrastive Learning for Audio-Visual Segmentation [74.11434759171199]
We present the Complementary and Contrastive Transformer (CCFormer), a novel framework adept at processing both local and global information. Our method sets new state-of-the-art benchmarks across the S4, MS3 and AVSS datasets.
arXiv Detail & Related papers (2025-10-11T06:36:59Z) - DynaMind: Reconstructing Dynamic Visual Scenes from EEG by Aligning Temporal Dynamics and Multimodal Semantics to Guided Diffusion [10.936858717759156]
We introduce DynaMind, a novel framework that reconstructs video by jointly modeling neural dynamics and semantic features. On the SEED-DV dataset, DynaMind sets a new state-of-the-art (SOTA), boosting reconstructed video accuracies by 12.5 and 10.3 percentage points. This marks a critical advancement, bridging the gap between neural dynamics and high-fidelity visual semantics.
arXiv Detail & Related papers (2025-09-01T06:52:08Z) - FreSca: Scaling in Frequency Space Enhances Diffusion Models [55.75504192166779]
This paper explores frequency-based control within latent diffusion models. We introduce FreSca, a novel framework that decomposes noise difference into low- and high-frequency components. FreSca operates without any model retraining or architectural change, offering model- and task-agnostic control.
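A hedged, time-domain analogue of FreSca's idea can sketch the mechanics: the paper operates in frequency space via Fourier transforms, whereas this simplified version substitutes a moving-average low-pass split, and all names and scale values are illustrative assumptions.

```python
def split_low_high(x, k=3):
    """Crude low-pass via a centered moving average of width k; high = residual.

    By construction low[i] + high[i] == x[i], so scaling both components
    by 1.0 reconstructs the input exactly.
    """
    n = len(x)
    low = []
    for i in range(n):
        lo, hi = max(0, i - k // 2), min(n, i + k // 2 + 1)
        low.append(sum(x[lo:hi]) / (hi - lo))
    high = [xi - li for xi, li in zip(x, low)]
    return low, high

def frequency_scale(x, s_low=1.0, s_high=1.5, k=3):
    """Independently rescale the low- and high-frequency parts, then recombine."""
    low, high = split_low_high(x, k)
    return [s_low * l + s_high * h for l, h in zip(low, high)]
```

The design point this mirrors is that the decomposition is exact and the scaling is applied post hoc, which is why FreSca needs no retraining or architectural change.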
arXiv Detail & Related papers (2025-04-02T22:03:11Z) - Learning Locally Interacting Discrete Dynamical Systems: Towards Data-Efficient and Scalable Prediction [16.972017028598597]
Local dynamical systems exhibit complex global dynamics arising from local, relatively simple interactions between dynamic elements.
We present Attentive Recurrent Cellular Automata (AR-NCA), to effectively discover unknown local state transition rules.
AR-NCA exhibits the superior generalizability across various system configurations.
arXiv Detail & Related papers (2024-04-09T17:00:43Z) - Multi-scale Dynamic and Hierarchical Relationship Modeling for Facial Action Units Recognition [38.62221940006509]
Human facial action units (AUs) are mutually related in a hierarchical manner.
AUs located in the same/close facial regions show stronger relationships than those of different facial regions.
This paper proposes a novel multi-scale AU model for occurrence recognition.
arXiv Detail & Related papers (2024-04-09T16:45:34Z) - A Decoupled Spatio-Temporal Framework for Skeleton-based Action Segmentation [89.86345494602642]
Existing methods are limited by weak temporal-modeling capability.
We propose a Decoupled Spatio-Temporal Framework (DeST) to address the issues.
DeST significantly outperforms current state-of-the-art methods with less computational complexity.
arXiv Detail & Related papers (2023-12-10T09:11:39Z) - Learning to Fuse Monocular and Multi-view Cues for Multi-frame Depth Estimation in Dynamic Scenes [51.20150148066458]
We propose a novel method to learn to fuse the multi-view and monocular cues encoded as volumes without needing hand-crafted masks.
Experiments on real-world datasets demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2023-04-18T13:55:24Z) - Implicit Neural Spatial Filtering for Multichannel Source Separation in the Waveform Domain [131.74762114632404]
The model is trained end-to-end and performs spatial processing implicitly.
We evaluate the proposed model on a real-world dataset and show that the model matches the performance of an oracle beamformer.
arXiv Detail & Related papers (2022-06-30T17:13:01Z) - Coarse-to-Fine Video Denoising with Dual-Stage Spatial-Channel Transformer [29.03463312813923]
Video denoising aims to recover high-quality frames from the noisy video.
Most existing approaches adopt convolutional neural networks (CNNs) to separate the noise from the original visual content.
We propose a Dual-stage Spatial-Channel Transformer (DSCT) for coarse-to-fine video denoising.
arXiv Detail & Related papers (2022-04-30T09:01:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.