Deep Models, Shallow Alignment: Uncovering the Granularity Mismatch in Neural Decoding
- URL: http://arxiv.org/abs/2601.21948v1
- Date: Thu, 29 Jan 2026 16:30:32 GMT
- Title: Deep Models, Shallow Alignment: Uncovering the Granularity Mismatch in Neural Decoding
- Authors: Yang Du, Siyuan Dai, Yonghao Song, Paul M. Thompson, Haoteng Tang, Liang Zhan
- Abstract summary: We propose a novel contrastive learning strategy that aligns neural signals with intermediate representations of visual encoders rather than their final outputs. Our approach effectively unlocks the scaling law in neural visual decoding, enabling decoding performance to scale predictably with the capacity of pre-trained vision backbones.
- Score: 8.822848795081693
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural visual decoding is a central problem in brain-computer interface research, aiming to reconstruct human visual perception and to elucidate the structure of neural representations. However, existing approaches overlook a fundamental granularity mismatch between human and machine vision: deep vision models emphasize semantic invariance by suppressing local texture information, whereas neural signals preserve an intricate mixture of low-level visual attributes and high-level semantic content. To address this mismatch, we propose Shallow Alignment, a novel contrastive learning strategy that aligns neural signals with intermediate representations of visual encoders rather than their final outputs, thereby striking a better balance between low-level texture details and high-level semantic features. Extensive experiments across multiple benchmarks demonstrate that Shallow Alignment significantly outperforms standard final-layer alignment, with performance gains ranging from 22% to 58% across diverse vision backbones. Notably, our approach effectively unlocks the scaling law in neural visual decoding, enabling decoding performance to scale predictably with the capacity of pre-trained vision backbones. We further conduct systematic empirical analyses to shed light on the mechanisms underlying the observed performance gains.
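The core idea, contrastively aligning neural embeddings with an intermediate-layer representation of a vision encoder rather than its final output, can be sketched minimally. The sketch below is illustrative only: the symmetric InfoNCE loss is a standard choice for this kind of contrastive alignment, but the paper's exact loss, layer-selection rule, and encoders are not specified here, so all names and parameters are assumptions.

```python
import numpy as np

def info_nce(neural_emb, visual_emb, temperature=0.07):
    """Symmetric InfoNCE loss between paired neural and visual embeddings.

    In a Shallow Alignment setup, `visual_emb` would come from an
    intermediate block of a pre-trained vision backbone (e.g. ViT block k,
    pooled over tokens) rather than from the final output layer; the choice
    of temperature and pooling here is a hypothetical illustration.
    """
    # L2-normalize both sets of embeddings so the dot product is cosine similarity
    n = neural_emb / np.linalg.norm(neural_emb, axis=1, keepdims=True)
    v = visual_emb / np.linalg.norm(visual_emb, axis=1, keepdims=True)
    logits = n @ v.T / temperature          # (B, B) similarity matrix
    idx = np.arange(len(logits))            # matched pairs sit on the diagonal

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)                  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()       # cross-entropy against the diagonal

    # average the neural->visual and visual->neural directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

In practice the intermediate features could be captured with a forward hook on the chosen encoder block, and the loss minimized over a neural-signal encoder while the vision backbone stays frozen; the point of the sketch is only that the contrastive target is an intermediate representation, not the final one.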
Related papers
- Learning Brain Representation with Hierarchical Visual Embeddings [30.701493890961284]
We propose a brain-image alignment strategy that leverages pre-trained visual encoders with distinct inductive biases to capture hierarchical and multi-scale visual representations. Our method achieves a favorable balance between retrieval accuracy and reconstruction fidelity.
arXiv Detail & Related papers (2026-02-07T11:14:03Z) - Simple Models, Rich Representations: Visual Decoding from Primate Intracortical Neural Signals [0.0]
We address the problem of decoding visual information from high-density intracortical recordings in primates. We develop a modular generative decoding pipeline that combines low-resolution latent reconstruction with semantically conditioned diffusion. This framework provides principles for brain-computer interfaces and semantic neural decoding.
arXiv Detail & Related papers (2026-01-16T09:10:31Z) - Uncovering Semantic Selectivity of Latent Groups in Higher Visual Cortex with Mutual Information-Guided Diffusion [19.983291706164923]
We present MIG-Vis, a method to visualize and validate the visual-semantic attributes encoded in neural latent subspaces. We validate MIG-Vis on multi-session neural spiking datasets from the inferior temporal (IT) cortex of two macaques.
arXiv Detail & Related papers (2025-10-02T16:33:40Z) - The Geometry of Cortical Computation: Manifold Disentanglement and Predictive Dynamics in VCNet [0.0]
This paper introduces Visual Cortex Network (VCNet), a novel neural network architecture. VCNet is framed as a geometric framework that emulates key biological mechanisms. We show that VCNet achieves state-of-the-art accuracy of 92.1% on Spots-10 and 74.4% on the light field dataset.
arXiv Detail & Related papers (2025-08-05T01:52:42Z) - TokenUnify: Scaling Up Autoregressive Pretraining for Neuron Segmentation [65.65530016765615]
We propose a hierarchical predictive coding framework that captures multi-scale dependencies through three complementary learning objectives. TokenUnify integrates random token prediction, next-token prediction, and next-all token prediction to create a comprehensive representational space. We also introduce a large-scale EM dataset with 1.2 billion annotated voxels, offering ideal long-sequence visual data with spatial continuity.
arXiv Detail & Related papers (2024-05-27T05:45:51Z) - Exploring neural oscillations during speech perception via surrogate gradient spiking neural networks [59.38765771221084]
We present a physiologically inspired speech recognition architecture compatible and scalable with deep learning frameworks.
We show end-to-end gradient descent training leads to the emergence of neural oscillations in the central spiking neural network.
Our findings highlight the crucial inhibitory role of feedback mechanisms, such as spike frequency adaptation and recurrent connections, in regulating and synchronising neural activity to improve recognition performance.
arXiv Detail & Related papers (2024-04-22T09:40:07Z) - Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z) - Spiking Neural Networks for Frame-based and Event-based Single Object Localization [26.51843464087218]
Spiking neural networks have shown much promise as an energy-efficient alternative to artificial neural networks.
We propose a spiking neural network approach for single object localization trained using surrogate gradient descent.
We compare our method with similar artificial neural networks and show that our model has competitive/better performance in accuracy, against various corruptions, and has lower energy consumption.
arXiv Detail & Related papers (2022-06-13T22:22:32Z) - Behind the Machine's Gaze: Biologically Constrained Neural Networks Exhibit Human-like Visual Attention [40.878963450471026]
We propose the Neural Visual Attention (NeVA) algorithm to generate visual scanpaths in a top-down manner.
We show that the proposed method outperforms state-of-the-art unsupervised human attention models in terms of similarity to human scanpaths.
arXiv Detail & Related papers (2022-04-19T18:57:47Z) - FuNNscope: Visual microscope for interactively exploring the loss landscape of fully connected neural networks [77.34726150561087]
We show how to explore high-dimensional landscape characteristics of neural networks.
We generalize observations on small neural networks to more complex systems.
An interactive dashboard opens up a number of possible applications.
arXiv Detail & Related papers (2022-04-09T16:41:53Z) - Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z) - Towards Analysis-friendly Face Representation with Scalable Feature and Texture Compression [113.30411004622508]
We show that a universal and collaborative visual information representation can be achieved in a hierarchical way.
Based on the strong generative capability of deep neural networks, the gap between the base feature layer and enhancement layer is further filled with the feature level texture reconstruction.
To improve the efficiency of the proposed framework, the base layer neural network is trained in a multi-task manner.
arXiv Detail & Related papers (2020-04-21T14:32:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.