Bridging the behavior-neural gap: A multimodal AI reveals the brain's geometry of emotion more accurately than human self-reports
- URL: http://arxiv.org/abs/2509.24298v1
- Date: Mon, 29 Sep 2025 05:22:33 GMT
- Title: Bridging the behavior-neural gap: A multimodal AI reveals the brain's geometry of emotion more accurately than human self-reports
- Authors: Changde Du, Yizhuo Lu, Zhongyu Huang, Yi Sun, Zisen Zhou, Shaozheng Qin, Huiguang He
- Abstract summary: We show that large-scale similarity judgments can more faithfully capture the brain's affective geometry. Our findings provide compelling evidence that MLLMs can autonomously develop rich, neurally-aligned affective representations.
- Score: 18.336392633341493
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability to represent emotion plays a significant role in human cognition and social interaction, yet the high-dimensional geometry of this affective space and its neural underpinnings remain debated. A key challenge, the "behavior-neural gap," is the limited ability of human self-reports to predict brain activity. Here we test the hypothesis that this gap arises from the constraints of traditional rating scales and that large-scale similarity judgments can more faithfully capture the brain's affective geometry. Using AI models as "cognitive agents," we collected millions of triplet odd-one-out judgments from a multimodal large language model (MLLM) and a language-only model (LLM) in response to 2,180 emotionally evocative videos. We found that the emergent 30-dimensional embeddings from these models are highly interpretable and organize emotion primarily along categorical lines, yet in a blended fashion that incorporates dimensional properties. Most remarkably, the MLLM's representation predicted neural activity in human emotion-processing networks with the highest accuracy, outperforming not only the LLM but also, counterintuitively, representations derived directly from human behavioral ratings. This result supports our primary hypothesis and suggests that sensory grounding (learning from rich visual data) is critical for developing a truly neurally-aligned conceptual framework for emotion. Our findings provide compelling evidence that MLLMs can autonomously develop rich, neurally-aligned affective representations, offering a powerful paradigm to bridge the gap between subjective experience and its neural substrates. Project page: https://reedonepeck.github.io/ai-emotion.github.io/.
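The triplet odd-one-out task mentioned in the abstract can be illustrated with a small sketch. The abstract does not say how the embeddings are fitted, so the choice rule here is an assumption: a softmax over pairwise dot-product similarities, a common way to model such judgments. The function name `odd_one_out_probs` and the toy 3-dimensional vectors are hypothetical and stand in for the 30-dimensional embeddings.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def odd_one_out_probs(x_i, x_j, x_k):
    """Return [P(i is odd), P(j is odd), P(k is odd)] for one triplet.

    Assumed choice rule (not confirmed by the source): the more similar
    the remaining pair is, the more likely the third item is picked as
    the odd one out. Probabilities are a softmax over the three pair
    similarities.
    """
    # Similarity of the pair that remains when i, j, or k is removed.
    sims = [dot(x_j, x_k), dot(x_i, x_k), dot(x_i, x_j)]
    m = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in sims]
    z = sum(exps)
    return [e / z for e in exps]


# Toy embeddings: a and b are similar, c differs from both.
a = [1.0, 0.0, 0.0]
b = [0.9, 0.1, 0.0]
c = [0.0, 0.0, 1.0]

probs = odd_one_out_probs(a, b, c)
print(probs.index(max(probs)))  # index of the most likely odd one out
```

In this toy case the third item is the most likely odd one out, because the pair (a, b) has the highest similarity. Fitting an embedding would then mean adjusting the vectors so that these probabilities match the collected judgments.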
Related papers
- Toward Cognitive Supersensing in Multimodal Large Language Model [67.15559571626747]
We introduce Cognitive Supersensing, a training paradigm that endows MLLMs with human-like visual imagery capabilities. In experiments, MLLMs trained with Cognitive Supersensing significantly outperform state-of-the-art baselines on CogSense-Bench. We will open-source the CogSense-Bench and our model weights.
arXiv Detail & Related papers (2026-02-02T02:19:50Z)
- HumanLLM: Towards Personalized Understanding and Simulation of Human Nature [72.55730315685837]
HumanLLM is a foundation model designed for personalized understanding and simulation of individuals. We first construct the Cognitive Genome, a large-scale corpus curated from real-world user data on platforms like Reddit, Twitter, Blogger, and Amazon. We then formulate diverse learning tasks and perform supervised fine-tuning to empower the model to predict a wide range of individualized human behaviors, thoughts, and experiences.
arXiv Detail & Related papers (2026-01-22T09:27:27Z)
- Discovering and Causally Validating Emotion-Sensitive Neurons in Large Audio-Language Models [8.550786156000461]
We present the first neuron-level interpretability study of emotion-sensitive neurons (ESNs) in large audio-language models (LALMs). We compare frequency-, entropy-, magnitude-, and contrast-based neuron selectors on multiple emotion recognition benchmarks. Using inference-time interventions, we reveal a consistent emotion-specific signature.
arXiv Detail & Related papers (2026-01-06T15:46:35Z)
- From Pixels to Feelings: Aligning MLLMs with Human Cognitive Perception of Images [36.44183173680125]
Multimodal Large Language Models (MLLMs) are adept at answering what is in an image (identifying objects) but often lack the ability to understand how an image feels to a human observer. This gap is most evident when considering subjective cognitive properties, such as what makes an image memorable, funny, aesthetically pleasing, or emotionally evocative. We introduce CogIP-Bench, a comprehensive benchmark for evaluating MLLMs on such image cognitive properties.
arXiv Detail & Related papers (2025-11-27T23:30:24Z)
- AI shares emotion with humans across languages and cultures [12.530921452568291]
We assess human-AI emotional alignment across linguistic-cultural groups and model-families. Our analyses reveal that LLM-derived emotion spaces are structurally congruent with human perception. We show that model expressions can be stably and naturally modulated across distinct emotion categories.
arXiv Detail & Related papers (2025-06-11T14:42:30Z)
- Artificial Intelligence Can Emulate Human Normative Judgments on Emotional Visual Scenes [0.09208007322096533]
We study whether state-of-the-art multimodal systems can emulate human emotional ratings on a standardized set of images. The AI judgements correlate surprisingly well with the average human ratings.
arXiv Detail & Related papers (2025-03-24T15:41:23Z)
- How Deep is Love in LLMs' Hearts? Exploring Semantic Size in Human-like Cognition [75.11808682808065]
This study investigates whether large language models (LLMs) exhibit similar tendencies in understanding semantic size. Our findings reveal that multi-modal training is crucial for LLMs to achieve more human-like understanding. Lastly, we examine whether LLMs are influenced by attention-grabbing headlines with larger semantic sizes in a real-world web shopping scenario.
arXiv Detail & Related papers (2025-03-01T03:35:56Z)
- MEMO-Bench: A Multiple Benchmark for Text-to-Image and Multimodal Large Language Models on Human Emotion Analysis [53.012111671763776]
This study introduces MEMO-Bench, a comprehensive benchmark consisting of 7,145 portraits, each depicting one of six different emotions.
Results demonstrate that existing T2I models are more effective at generating positive emotions than negative ones.
Although MLLMs show a certain degree of effectiveness in distinguishing and recognizing human emotions, they fall short of human-level accuracy.
arXiv Detail & Related papers (2024-11-18T02:09:48Z)
- EmoLLM: Multimodal Emotional Understanding Meets Large Language Models [61.179731667080326]
Multi-modal large language models (MLLMs) have achieved remarkable performance on objective multimodal perception tasks.
But their ability to interpret subjective, emotionally nuanced multimodal content remains largely unexplored.
EmoLLM is a novel model for multimodal emotional understanding, incorporating two core techniques.
arXiv Detail & Related papers (2024-06-24T08:33:02Z)
- Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation [57.60490773016364]
We combine vision and touch sensing on a multi-fingered hand to estimate an object's pose and shape during in-hand manipulation.
Our method, NeuralFeels, encodes object geometry by learning a neural field online and jointly tracks it by optimizing a pose graph problem.
Our results demonstrate that touch, at the very least, refines and, at the very best, disambiguates visual estimates during in-hand manipulation.
arXiv Detail & Related papers (2023-12-20T22:36:37Z)
- Unveiling Theory of Mind in Large Language Models: A Parallel to Single Neurons in the Human Brain [2.5350521110810056]
Large language models (LLMs) have been found to exhibit a certain level of Theory of Mind (ToM).
The precise processes underlying LLMs' capacity for ToM and their similarities with those of humans remain largely unknown.
arXiv Detail & Related papers (2023-09-04T15:26:15Z)
- Language-Specific Representation of Emotion-Concept Knowledge Causally Supports Emotion Inference [44.126681295827794]
This study used a form of artificial intelligence known as large language models (LLMs) to assess whether language-based representations of emotion causally contribute to the AI's ability to generate inferences about the emotional meaning of novel situations.
Our findings provide a proof-of-concept that even an LLM can learn about emotions in the absence of sensory-motor representations and highlight the contribution of language-derived emotion-concept knowledge for emotion inference.
arXiv Detail & Related papers (2023-02-19T14:21:33Z)
- Overcoming the Domain Gap in Neural Action Representations [60.47807856873544]
3D pose data can now be reliably extracted from multi-view video sequences without manual intervention.
We propose to use it to guide the encoding of neural action representations together with a set of neural and behavioral augmentations.
To reduce the domain gap, during training, we swap neural and behavioral data across animals that seem to be performing similar actions.
arXiv Detail & Related papers (2021-12-02T12:45:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.