Adopting a human developmental visual diet yields robust, shape-based AI vision
- URL: http://arxiv.org/abs/2507.03168v1
- Date: Thu, 03 Jul 2025 20:52:08 GMT
- Title: Adopting a human developmental visual diet yields robust, shape-based AI vision
- Authors: Zejin Lu, Sushrut Thorat, Radoslaw M Cichy, Tim C Kietzmann
- Abstract summary: Despite years of research, a striking misalignment between artificial intelligence (AI) systems and human vision persists. We take inspiration from how human vision develops from early infancy into adulthood. We show that guiding AI systems through this human-inspired curriculum produces models that closely align with human behaviour.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite years of research and the dramatic scaling of artificial intelligence (AI) systems, a striking misalignment between artificial and human vision persists. Contrary to humans, AI relies heavily on texture features rather than shape information, lacks robustness to image distortions, remains highly vulnerable to adversarial attacks, and struggles to recognise simple abstract shapes within complex backgrounds. To close this gap, we introduce a solution that arises from a previously underexplored direction: rather than scaling up, we take inspiration from how human vision develops from early infancy into adulthood. We quantified this visual maturation by synthesising decades of psychophysical and neurophysiological research into a novel developmental visual diet (DVD) for AI vision. We show that guiding AI systems through this human-inspired curriculum produces models that closely align with human behaviour on every hallmark of robust vision tested: the strongest reported reliance on shape information to date, abstract shape recognition beyond the state of the art, higher robustness to image corruptions, and stronger resilience to adversarial attacks. By outperforming high-parameter AI foundation models trained on orders of magnitude more data, we provide evidence that robust AI vision can be achieved by guiding how a model learns, not merely how much it learns, offering a resource-efficient route toward safer and more human-like artificial visual systems.
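The curriculum idea behind the developmental visual diet can be illustrated with a minimal schedule sketch. This is a hypothetical illustration, not the paper's actual DVD: the function name `dvd_schedule`, the exponential decay form, and the constants (initial blur sigma 4.0, decay rate 4.0, starting saturation 0.2) are all assumptions chosen only to show the shape of such a curriculum, in which early training epochs receive blurry, desaturated, infant-like input and later epochs receive sharp, fully coloured, adult-like input.

```python
import math

def dvd_schedule(epoch: int, total_epochs: int) -> tuple[float, float]:
    """Return (blur_sigma, colour_saturation) for a training epoch.

    Early epochs: strong Gaussian blur and near-grayscale input,
    loosely mimicking infant visual acuity and colour sensitivity.
    Late epochs: little blur and full saturation (adult-like vision).
    The exponential trajectory and constants are illustrative only.
    """
    t = 0.0 if total_epochs <= 1 else epoch / (total_epochs - 1)  # progress in [0, 1]
    blur_sigma = 4.0 * math.exp(-4.0 * t)          # decays from 4.0 toward ~0.07
    saturation = 1.0 - 0.8 * math.exp(-4.0 * t)    # grows from 0.2 toward ~0.99
    return blur_sigma, saturation
```

In a standard training pipeline, these per-epoch values could parameterise image transforms (e.g. a Gaussian-blur and a colour-jitter step) rebuilt at the start of each epoch, so the network only ever sees input matched to its current "developmental age".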
Related papers
- Human-level 3D shape perception emerges from multi-view learning [63.048728487674815]
We develop a modeling framework that predicts human 3D shape inferences for arbitrary objects. We achieve this with a novel class of neural networks trained using a visual-spatial objective over naturalistic sensory data. We find that human-level 3D perception can emerge from a simple, scalable learning objective over naturalistic visual-spatial data.
arXiv Detail & Related papers (2026-02-19T18:56:05Z) - Illusions in Humans and AI: How Visual Perception Aligns and Diverges [14.661957041103404]
By comparing biological and artificial perception through the lens of illusions, we highlight critical differences in how each system constructs visual reality. Visual illusions expose how human perception is based on contextual assumptions rather than raw sensory data. This article explores how AI responds to classic visual illusions that involve color, size, shape, and motion.
arXiv Detail & Related papers (2025-08-17T16:12:54Z) - AI-powered virtual eye: perspective, challenges and opportunities [9.758442949590599]
We envision the "virtual eye" as a next-generation, AI-powered platform that uses interconnected foundation models to simulate the eye's intricate structure and biological function across all scales. Despite challenges in interpretability, ethics, data processing and evaluation, the virtual eye holds the potential to revolutionize personalized ophthalmic care and accelerate research into ocular health and disease.
arXiv Detail & Related papers (2025-05-07T14:48:56Z) - Generative Physical AI in Vision: A Survey [78.07014292304373]
Generative Artificial Intelligence (AI) has rapidly advanced the field of computer vision by enabling machines to create and interpret visual data with unprecedented sophistication. This transformation builds upon a foundation of generative models to produce realistic images, videos, and 3D/4D content. As generative models evolve to increasingly integrate physical realism and dynamic simulation, their potential to function as "world simulators" expands.
arXiv Detail & Related papers (2025-01-19T03:19:47Z) - Aligning Machine and Human Visual Representations across Abstraction Levels [42.86478924838503]
Deep neural networks have achieved success across a wide range of applications, including as models of human behavior in vision tasks.
However, neural network training and human learning differ in fundamental ways, and neural networks often fail to generalize as robustly as humans do.
We highlight a key misalignment between vision models and humans: whereas human conceptual knowledge is hierarchically organized from fine- to coarse-scale distinctions, model representations do not accurately capture all these levels of abstraction.
To address this misalignment, we first train a teacher model to imitate human judgments, then transfer human-like structure from its representations into pretrained state-of-the-art models.
arXiv Detail & Related papers (2024-09-10T13:41:08Z) - Brain3D: Generating 3D Objects from fMRI [76.41771117405973]
We design a novel 3D object representation learning method, Brain3D, that takes as input the fMRI data of a subject. We show that our model captures the distinct functionalities of each region of the human vision system. Preliminary evaluations indicate that Brain3D can successfully identify the disordered brain regions in simulated scenarios.
arXiv Detail & Related papers (2024-05-24T06:06:11Z) - Probing Human Visual Robustness with Neurally-Guided Deep Neural Networks [18.994287352758697]
Humans effortlessly navigate the dynamic visual world, yet deep neural networks (DNNs) are surprisingly vulnerable to minor image perturbations. Past theories suggest that human visual robustness arises from a representational space that evolves along the ventral visual stream (VVS) of the brain to increasingly tolerate object transformations. We demonstrate a hierarchical improvement in DNN robustness: alignment to higher-order VVS regions leads to greater improvement.
arXiv Detail & Related papers (2024-05-04T04:33:20Z) - Visual Knowledge in the Big Model Era: Retrospect and Prospect [63.282425615863]
Visual knowledge is a new form of knowledge representation that can encapsulate visual concepts and their relations in a succinct, comprehensive, and interpretable manner.
As the knowledge about the visual world has been identified as an indispensable component of human cognition and intelligence, visual knowledge is poised to have a pivotal role in establishing machine intelligence.
arXiv Detail & Related papers (2024-04-05T07:31:24Z) - Achieving More Human Brain-Like Vision via Human EEG Representational Alignment [1.811217832697894]
We present 'Re(presentational)Al(ignment)net', a vision model aligned with human brain activity based on non-invasive EEG.
Our innovative image-to-brain multi-layer encoding framework advances human neural alignment by optimizing multiple model layers.
Our findings suggest that ReAlnet represents a breakthrough in bridging the gap between artificial and human vision, paving the way for more brain-like artificial intelligence systems.
arXiv Detail & Related papers (2024-01-30T18:18:41Z) - On the Emergence of Symmetrical Reality [51.21203247240322]
We introduce the symmetrical reality framework, which offers a unified representation encompassing various forms of physical-virtual amalgamations.
We propose an instance of an AI-driven active assistance service that illustrates the potential applications of symmetrical reality.
arXiv Detail & Related papers (2024-01-26T16:09:39Z) - Exploring the Naturalness of AI-Generated Images [59.04528584651131]
We take the first step to benchmark and assess the visual naturalness of AI-generated images.
We propose the Joint Objective Image Naturalness evaluaTor (JOINT) to automatically predict the naturalness of AGIs in alignment with human ratings.
We demonstrate that JOINT significantly outperforms baselines for providing more subjectively consistent results on naturalness assessment.
arXiv Detail & Related papers (2023-12-09T06:08:09Z) - Exploration with Principles for Diverse AI Supervision [88.61687950039662]
Training large transformers using next-token prediction has given rise to groundbreaking advancements in AI.
While this generative AI approach has produced impressive results, it heavily leans on human supervision.
This strong reliance on human oversight poses a significant hurdle to the advancement of AI innovation.
We propose a novel paradigm termed Exploratory AI (EAI) aimed at autonomously generating high-quality training data.
arXiv Detail & Related papers (2023-10-13T07:03:39Z) - Degraded Polygons Raise Fundamental Questions of Neural Network Perception [5.423100066629618]
We revisit the task of recovering images under degradation, first introduced over 30 years ago in the Recognition-by-Components theory of human vision.
We implement the Automated Shape Recoverability Test for rapidly generating large-scale datasets of perimeter-degraded regular polygons.
We find that neural networks' behavior on this simple task conflicts with human behavior.
arXiv Detail & Related papers (2023-06-08T06:02:39Z) - Guiding Visual Attention in Deep Convolutional Neural Networks Based on Human Eye Movements [0.0]
Deep Convolutional Neural Networks (DCNNs) were originally inspired by principles of biological vision.
Recent advances in deep learning seem to decrease this similarity.
We investigate a purely data-driven approach to obtain useful models.
arXiv Detail & Related papers (2022-06-21T17:59:23Z) - WenLan 2.0: Make AI Imagine via a Multimodal Foundation Model [74.4875156387271]
We develop a novel foundation model pre-trained with huge multimodal (visual and textual) data.
We show that state-of-the-art results can be obtained on a wide range of downstream tasks.
arXiv Detail & Related papers (2021-10-27T12:25:21Z) - MetaAvatar: Learning Animatable Clothed Human Models from Few Depth Images [60.56518548286836]
To generate realistic cloth deformations from novel input poses, watertight meshes or dense full-body scans are usually needed as inputs.
We propose an approach that can quickly generate realistic clothed human avatars, represented as controllable neural SDFs, given only monocular depth images.
arXiv Detail & Related papers (2021-06-22T17:30:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.