Related papers: Disentangling the Factors of Convergence between Brains and Computer Vision Models

Disentangling the Factors of Convergence between Brains and Computer Vision Models

URL: http://arxiv.org/abs/2508.18226v1
Date: Mon, 25 Aug 2025 17:23:27 GMT
Title: Disentangling the Factors of Convergence between Brains and Computer Vision Models
Authors: Joséphine Raugel, Marc Szafraniec, Huy V. Vo, Camille Couprie, Patrick Labatut, Piotr Bojanowski, Valentin Wyart, Jean-Rémi King,
Abstract summary: We show that the largest DINOv3 models trained with the most human-centric images reach the highest brain-similarity.<n>These findings disentangle the interplay between architecture and experience in shaping how artificial neural networks come to see the world as humans do.
Score: 11.560007214914465
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Many AI models trained on natural images develop representations that resemble those of the human brain. However, the factors that drive this brain-model similarity remain poorly understood. To disentangle how the model, training and data independently lead a neural network to develop brain-like representations, we trained a family of self-supervised vision transformers (DINOv3) that systematically varied these different factors. We compare their representations of images to those of the human brain recorded with both fMRI and MEG, providing high resolution in spatial and temporal analyses. We assess the brain-model similarity with three complementary metrics focusing on overall representational similarity, topographical organization, and temporal dynamics. We show that all three factors - model size, training amount, and image type - independently and interactively impact each of these brain similarity metrics. In particular, the largest DINOv3 models trained with the most human-centric images reach the highest brain-similarity. This emergence of brain-like representations in AI models follows a specific chronology during training: models first align with the early representations of the sensory cortices, and only align with the late and prefrontal representations of the brain with considerably more training. Finally, this developmental trajectory is indexed by both structural and functional properties of the human cortex: the representations that are acquired last by the models specifically align with the cortical areas with the largest developmental expansion, thickness, least myelination, and slowest timescales. Overall, these findings disentangle the interplay between architecture and experience in shaping how artificial neural networks come to see the world as humans do, thus offering a promising framework to understand how the human brain comes to represent its visual world.

Related papers

Human-level 3D shape perception emerges from multi-view learning [63.048728487674815]
We develop a modeling framework that predicts human 3D shape inferences for arbitrary objects.<n>We achieve this with a novel class of neural networks trained using a visual-spatial objective over naturalistic sensory data.<n>We find that human-level 3D perception can emerge from a simple, scalable learning objective over naturalistic visual-spatial data.
arXiv Detail & Related papers (2026-02-19T18:56:05Z)
A Unified Geometric Space Bridging AI Models and the Human Brain [24.54324712609098]
Modern artificial neural networks now rival humans in language, perception, and reasoning.<n>It is still largely unknown whether these artificial systems organize information as the brain does.<n>Here we introduce a groundbreaking concept of Brain-like Space: a unified geometric space in which every AI model can be precisely situated.
arXiv Detail & Related papers (2025-10-28T12:09:23Z)
Probing the Representational Geometry of Color Qualia: Dissociating Pure Perception from Task Demands in Brains and AI Models [6.165387850279033]
We perform a rigorous comparison of the representational geometry of color qualia between state-of-the-art AI models and the human brain.<n>Our work contributes a new benchmark task for color qualia to the field, packaged in a Brain-Score compatible format.
arXiv Detail & Related papers (2025-10-26T19:13:16Z)
Convergent transformations of visual representation in brains and models [0.0]
A fundamental question in cognitive neuroscience is what shapes visual perception: the external world's structure or the brain's internal architecture.<n>We show a convergent computational solution for visual encoding in both human and artificial vision, driven by the structure of the external world.
arXiv Detail & Related papers (2025-07-18T14:13:54Z)
Voxel-Level Brain States Prediction Using Swin Transformer [65.9194533414066]
We propose a novel architecture which employs a 4D Shifted Window (Swin) Transformer as encoder to efficiently learn-temporal information and a convolutional decoder to enable brain state prediction at the same spatial and temporal resolution as the input fMRI data.<n>Our model has shown high accuracy when predicting 7.2s resting-state brain activities based on the prior 23.04s fMRI time series.<n>This shows promising evidence that thetemporal organization of the human brain can be learned by a Swin Transformer model, at high resolution, which provides a potential for reducing fMRI scan time and the development of brain-computer interfaces
arXiv Detail & Related papers (2025-06-13T04:14:38Z)
Brain-like Functional Organization within Large Language Models [58.93629121400745]
The human brain has long inspired the pursuit of artificial intelligence (AI) Recent neuroimaging studies provide compelling evidence of alignment between the computational representation of artificial neural networks (ANNs) and the neural responses of the human brain to stimuli. In this study, we bridge this gap by directly coupling sub-groups of artificial neurons with functional brain networks (FBNs) This framework links the AN sub-groups to FBNs, enabling the delineation of brain-like functional organization within large language models (LLMs)
arXiv Detail & Related papers (2024-10-25T13:15:17Z)
Knowledge-Guided Prompt Learning for Lifespan Brain MR Image Segmentation [53.70131202548981]
We present a two-step segmentation framework employing Knowledge-Guided Prompt Learning (KGPL) for brain MRI. Specifically, we first pre-train segmentation models on large-scale datasets with sub-optimal labels. The introduction of knowledge-wise prompts captures semantic relationships between anatomical variability and biological processes.
arXiv Detail & Related papers (2024-07-31T04:32:43Z)
Teaching CORnet Human fMRI Representations for Enhanced Model-Brain Alignment [2.035627332992055]
Functional magnetic resonance imaging (fMRI) as a widely used technique in cognitive neuroscience can record neural activation in the human visual cortex during the process of visual perception. This study proposed ReAlnet-fMRI, a model based on the SOTA vision model CORnet but optimized using human fMRI data through a multi-layer encoding-based alignment framework. The fMRI-optimized ReAlnet-fMRI exhibited higher similarity to the human brain than both CORnet and the control model in within-and across-subject as well as within- and across-modality model-brain (fMRI and EEG
arXiv Detail & Related papers (2024-07-15T03:31:42Z)
MindBridge: A Cross-Subject Brain Decoding Framework [60.58552697067837]
Brain decoding aims to reconstruct stimuli from acquired brain signals. Currently, brain decoding is confined to a per-subject-per-model paradigm. We present MindBridge, that achieves cross-subject brain decoding by employing only one model.
arXiv Detail & Related papers (2024-04-11T15:46:42Z)
Psychometry: An Omnifit Model for Image Reconstruction from Human Brain Activity [60.983327742457995]
Reconstructing the viewed images from human brain activity bridges human and computer vision through the Brain-Computer Interface. We devise Psychometry, an omnifit model for reconstructing images from functional Magnetic Resonance Imaging (fMRI) obtained from different subjects.
arXiv Detail & Related papers (2024-03-29T07:16:34Z)
Achieving More Human Brain-Like Vision via Human EEG Representational Alignment [1.811217832697894]
We present 'Re(presentational)Al(ignment)net', a vision model aligned with human brain activity based on non-invasive EEG. Our innovative image-to-brain multi-layer encoding framework advances human neural alignment by optimizing multiple model layers. Our findings suggest that ReAlnet represents a breakthrough in bridging the gap between artificial and human vision, and paving the way for more brain-like artificial intelligence systems.
arXiv Detail & Related papers (2024-01-30T18:18:41Z)
Semantic Brain Decoding: from fMRI to conceptually similar image reconstruction of visual stimuli [0.29005223064604074]
We propose a novel approach to brain decoding that also relies on semantic and contextual similarity. We employ an fMRI dataset of natural image vision and create a deep learning decoding pipeline inspired by the existence of both bottom-up and top-down processes in human vision. We produce reconstructions of visual stimuli that match the original content very well on a semantic level, surpassing the state of the art in previous literature.
arXiv Detail & Related papers (2022-12-13T16:54:08Z)
Human alignment of neural network representations [28.32452075196472]
We investigate the factors that affect the alignment between the representations learned by neural networks and human mental representations inferred from behavioral responses.<n>We find that model scale and architecture have essentially no effect on the alignment with human behavioral responses.<n>We find that some human concepts such as food and animals are well-represented by neural networks whereas others such as royal or sports-related objects are not.
arXiv Detail & Related papers (2022-11-02T15:23:16Z)
Functional2Structural: Cross-Modality Brain Networks Representation Learning [55.24969686433101]
Graph mining on brain networks may facilitate the discovery of novel biomarkers for clinical phenotypes and neurodegenerative diseases. We propose a novel graph learning framework, known as Deep Signed Brain Networks (DSBN), with a signed graph encoder. We validate our framework on clinical phenotype and neurodegenerative disease prediction tasks using two independent, publicly available datasets.
arXiv Detail & Related papers (2022-05-06T03:45:36Z)

This list is automatically generated from the titles and abstracts of the papers in this site.