AVM: Towards Structure-Preserving Neural Response Modeling in the Visual Cortex Across Stimuli and Individuals
- URL: http://arxiv.org/abs/2512.16948v1
- Date: Wed, 17 Dec 2025 07:26:16 GMT
- Title: AVM: Towards Structure-Preserving Neural Response Modeling in the Visual Cortex Across Stimuli and Individuals
- Authors: Qi Xu, Shuai Gong, Xuming Ran, Haihua Luo, Yangfan Hu,
- Abstract summary: We introduce the Adaptive Visual Model (AVM), a structure-preserving framework that enables condition-aware adaptation.<n>AVM keeps a Vision Transformer-based encoder frozen to capture consistent visual features, while independently trained modulation account paths for neural response variations.<n>We evaluate AVM in three experimental settings, including stimulus-level variation, cross-subject generalization, and cross-dataset adaptation.
- Score: 25.302743154898437
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While deep learning models have shown strong performance in simulating neural responses, they often fail to clearly separate stable visual encoding from condition-specific adaptation, which limits their ability to generalize across stimuli and individuals. We introduce the Adaptive Visual Model (AVM), a structure-preserving framework that enables condition-aware adaptation through modular subnetworks, without modifying the core representation. AVM keeps a Vision Transformer-based encoder frozen to capture consistent visual features, while independently trained modulation paths account for neural response variations driven by stimulus content and subject identity. We evaluate AVM in three experimental settings, including stimulus-level variation, cross-subject generalization, and cross-dataset adaptation, all of which involve structured changes in inputs and individuals. Across two large-scale mouse V1 datasets, AVM outperforms the state-of-the-art V1T model by approximately 2% in predictive correlation, demonstrating robust generalization, interpretable condition-wise modulation, and high architectural efficiency. Specifically, AVM achieves a 9.1% improvement in explained variance (FEVE) under the cross-dataset adaptation setting. These results suggest that AVM provides a unified framework for adaptive neural modeling across biological and experimental conditions, offering a scalable solution under structural constraints. Its design may inform future approaches to cortical modeling in both neuroscience and biologically inspired AI systems.
Related papers
- Modeling Cross-vision Synergy for Unified Large Vision Model [130.37489011094036]
PolyV is a unified large vision model that achieves cross-vision synergy at both the architectural and training levels.<n>PolyV consistently outperforms existing models, achieving over 10% average improvement over its backbone.
arXiv Detail & Related papers (2026-03-03T22:44:43Z) - EyeSim-VQA: A Free-Energy-Guided Eye Simulation Framework for Video Quality Assessment [68.77813885751308]
EyeSimVQA is a novel VQA framework that incorporates free-energy-based self-repair.<n>We show EyeSimVQA achieves competitive or superior performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2025-06-13T08:00:54Z) - DiffVQA: Video Quality Assessment Using Diffusion Feature Extractor [22.35724335601674]
Video Quality Assessment (VQA) aims to evaluate video quality based on perceptual distortions and human preferences.<n>We introduce a novel VQA framework, DiffVQA, which harnesses the robust generalization capabilities of diffusion models pre-trained on extensive datasets.
arXiv Detail & Related papers (2025-05-06T07:42:24Z) - UniViTAR: Unified Vision Transformer with Native Resolution [37.63387029787732]
We introduce UniViTAR, a family of homogeneous vision foundation models tailored for unified visual modality and native resolution scenario.<n>A progressive training paradigm is introduced, which strategically combines two core mechanisms.<n>In parallel, a hybrid training framework further synergizes sigmoid-based contrastive loss with feature distillation from a frozen teacher model.
arXiv Detail & Related papers (2025-04-02T14:59:39Z) - BHViT: Binarized Hybrid Vision Transformer [53.38894971164072]
Model binarization has made significant progress in enabling real-time and energy-efficient computation for convolutional neural networks (CNN)<n>We propose BHViT, a binarization-friendly hybrid ViT architecture and its full binarization model with the guidance of three important observations.<n>Our proposed algorithm achieves SOTA performance among binary ViT methods.
arXiv Detail & Related papers (2025-03-04T08:35:01Z) - TokenUnify: Scaling Up Autoregressive Pretraining for Neuron Segmentation [65.65530016765615]
We propose a hierarchical predictive coding framework that captures multi-scale dependencies through three complementary learning objectives.<n> TokenUnify integrates random token prediction, next-token prediction, and next-all token prediction to create a comprehensive representational space.<n>We also introduce a large-scale EM dataset with 1.2 billion annotated voxels, offering ideal long-sequence visual data with spatial continuity.
arXiv Detail & Related papers (2024-05-27T05:45:51Z) - Layerwise complexity-matched learning yields an improved model of cortical area V2 [12.861402235256207]
Deep neural networks trained end-to-end for object recognition approach human capabilities.
We develop a self-supervised training methodology that operates independently on successive layers.
We show that our model is better aligned with selectivity properties and neural activity in primate area V2.
arXiv Detail & Related papers (2023-12-18T18:37:02Z) - Multi-modal Gaussian Process Variational Autoencoders for Neural and
Behavioral Data [0.9622208190558754]
We propose an unsupervised latent variable model which extracts temporally evolving shared and independent latents for distinct, simultaneously recorded experimental modalities.
We validate our model on simulated multi-modal data consisting of Poisson spike counts and MNIST images that scale and rotate smoothly over time.
We show that the multi-modal GP-VAE is able to not only identify the shared and independent latent structure across modalities accurately, but provides good reconstructions of both images and neural rates on held-out trials.
arXiv Detail & Related papers (2023-10-04T19:04:55Z) - Generalized Face Forgery Detection via Adaptive Learning for Pre-trained Vision Transformer [54.32283739486781]
We present a textbfForgery-aware textbfAdaptive textbfVision textbfTransformer (FA-ViT) under the adaptive learning paradigm.
FA-ViT achieves 93.83% and 78.32% AUC scores on Celeb-DF and DFDC datasets in the cross-dataset evaluation.
arXiv Detail & Related papers (2023-09-20T06:51:11Z) - The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease
detection [51.697248252191265]
This work summarizes and strictly observes best practices regarding data handling, experimental design, and model evaluation.
We focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of challenging problem in healthcare.
Within this framework, we train predictive 15 models, considering three different data augmentation strategies and five distinct 3D CNN architectures.
arXiv Detail & Related papers (2023-09-13T10:40:41Z) - V1T: large-scale mouse V1 response prediction using a Vision Transformer [1.5703073293718952]
We introduce V1T, a novel Vision Transformer based architecture that learns a shared visual and behavioral representation across animals.
We evaluate our model on two large datasets recorded from mouse primary visual cortex and outperform previous convolution-based models by more than 12.7% in prediction performance.
arXiv Detail & Related papers (2023-02-06T18:58:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.