Language Aligned Visual Representations Predict Human Behavior in
Naturalistic Learning Tasks
- URL: http://arxiv.org/abs/2306.09377v1
- Date: Thu, 15 Jun 2023 08:18:29 GMT
- Title: Language Aligned Visual Representations Predict Human Behavior in
Naturalistic Learning Tasks
- Authors: Can Demircan, Tankred Saanum, Leonardo Pettini, Marcel Binz, Blazej M
Baczkowski, Paula Kaanders, Christian F Doeller, Mona M Garvert, Eric Schulz
- Abstract summary: Humans possess the ability to identify and generalize relevant features of natural objects.
We conducted two experiments involving category learning and reward learning.
Participants successfully identified the relevant stimulus features within a few trials.
We performed an extensive model comparison, evaluating the trial-by-trial predictive accuracy of diverse deep learning models.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Humans possess the ability to identify and generalize relevant features of
natural objects, which aids them in various situations. To investigate this
phenomenon and determine the most effective representations for predicting
human behavior, we conducted two experiments involving category learning and
reward learning. Our experiments used realistic images as stimuli, and
participants were tasked with making accurate decisions based on novel stimuli
for all trials, thereby necessitating generalization. In both tasks, the
underlying rules were generated as simple linear functions using stimulus
dimensions extracted from human similarity judgments. Notably, participants
successfully identified the relevant stimulus features within a few trials,
demonstrating effective generalization. We performed an extensive model
comparison, evaluating how accurately the representations of diverse deep
learning models predicted human choices on a trial-by-trial basis. Intriguingly,
representations from models trained on both text and image data consistently
outperformed models trained solely on images, even surpassing models using the
features that generated the task itself. These findings suggest that
language-aligned visual representations possess sufficient richness to describe
human generalization in naturalistic settings and emphasize the role of
language in shaping human cognition.
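To make the model comparison concrete, below is a minimal sketch (not the authors' code) of how a language-aligned encoder such as CLIP could be scored on trial-by-trial choice prediction: embed each stimulus image, then at every trial fit a simple linear readout on the choices observed so far and evaluate the probability it assigns to the choice actually made next. The CLIP checkpoint name is a real Hugging Face model, but the stimulus paths and the choices file are hypothetical placeholders.

```python
import numpy as np
import torch
from PIL import Image
from sklearn.linear_model import LogisticRegression
from transformers import CLIPModel, CLIPProcessor

# Language-aligned visual encoder (trained on image-text pairs).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed(paths):
    """Return one CLIP image embedding per stimulus image."""
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return feats.numpy()

# Hypothetical trial data: one novel stimulus per trial, one binary choice each.
stimulus_paths = [f"stimuli/trial_{t:03d}.png" for t in range(100)]
choices = np.load("participant_choices.npy")  # shape (100,), values in {0, 1}

X = embed(stimulus_paths)

# Trial-by-trial evaluation: at trial t, fit a linear readout on trials < t
# and score the probability it assigns to the choice actually made at t.
log_likelihoods = []
for t in range(5, len(choices)):
    if len(np.unique(choices[:t])) < 2:
        continue  # need both choice options observed before fitting
    clf = LogisticRegression(max_iter=1000).fit(X[:t], choices[:t])
    p = clf.predict_proba(X[t:t + 1])[0, int(choices[t])]
    log_likelihoods.append(np.log(p))

print(f"mean predictive log-likelihood: {np.mean(log_likelihoods):.3f}")
```

Comparing the resulting mean predictive log-likelihoods across encoders (image-only versus image-and-text) yields the kind of model comparison described in the abstract.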
Related papers
- PersLLM: A Personified Training Approach for Large Language Models [63.75008885222351]
We propose PersLLM, integrating psychology-grounded principles of personality: social practice, consistency, and dynamic development.
We incorporate personality traits directly into the model parameters, enhancing the model's resistance to induction, promoting consistency, and supporting the dynamic evolution of personality.
arXiv Detail & Related papers (2024-07-17T08:13:22Z)
- Self-Explainable Affordance Learning with Embodied Caption [63.88435741872204]
We introduce Self-Explainable Affordance learning (SEA) with embodied caption.
SEA enables robots to articulate their intentions and bridges the gap between explainable vision-language captioning and visual affordance learning.
We propose a novel model to effectively combine affordance grounding with self-explanation in a simple but efficient manner.
arXiv Detail & Related papers (2024-04-08T15:22:38Z)
- Human Simulacra: Benchmarking the Personification of Large Language Models [38.21708264569801]
Large language models (LLMs) are recognized as systems that closely mimic aspects of human intelligence.
This paper introduces a framework for constructing virtual characters' life stories from the ground up.
Experimental results demonstrate that our constructed simulacra can produce personified responses that align with their target characters.
arXiv Detail & Related papers (2024-02-28T09:11:14Z)
- Generating Human-Centric Visual Cues for Human-Object Interaction Detection via Large Vision-Language Models [59.611697856666304]
Human-object interaction (HOI) detection aims at detecting human-object pairs and predicting their interactions.
We propose three prompts for VLMs to generate human-centric visual cues within an image from multiple human-oriented perspectives.
We develop a transformer-based multimodal fusion module with a multi-tower architecture to integrate visual cue features into the instance and interaction decoders.
arXiv Detail & Related papers (2023-11-26T09:11:32Z)
- Using Artificial Populations to Study Psychological Phenomena in Neural Models [0.0]
Investigation of cognitive behavior in language models must be conducted in an appropriate population for the results to be meaningful.
We leverage work in uncertainty estimation in a novel approach to efficiently construct experimental populations.
We provide theoretical grounding in the uncertainty estimation literature and motivation from current cognitive work regarding language models.
arXiv Detail & Related papers (2023-08-15T20:47:51Z)
- Turning large language models into cognitive models [0.0]
We show that large language models can be turned into cognitive models.
These models offer accurate representations of human behavior, even outperforming traditional cognitive models in two decision-making domains.
Taken together, these results suggest that large, pre-trained models can be adapted to become generalist cognitive models.
arXiv Detail & Related papers (2023-06-06T18:00:01Z)
- Localization vs. Semantics: Visual Representations in Unimodal and Multimodal Models [57.08925810659545]
We conduct a comparative analysis of the visual representations in existing vision-and-language models and vision-only models.
Our empirical observations suggest that vision-and-language models are better than vision-only models at label prediction tasks.
We hope our study sheds light on the role of language in visual learning, and serves as an empirical guide for various pretrained models.
arXiv Detail & Related papers (2022-12-01T05:00:18Z)
- CIAO! A Contrastive Adaptation Mechanism for Non-Universal Facial Expression Recognition [80.07590100872548]
We propose Contrastive Inhibitory Adaptation (CIAO), a mechanism that adapts the last layer of facial encoders to capture the specific affective characteristics of different datasets.
CIAO improves facial expression recognition performance across six datasets with very distinct affective representations.
arXiv Detail & Related papers (2022-08-10T15:46:05Z)
- Model-agnostic Fits for Understanding Information Seeking Patterns in Humans [0.0]
In decision-making tasks under uncertainty, humans display characteristic biases in seeking, integrating, and acting upon information relevant to the task.
Here, we reexamine data from previous carefully designed experiments, collected at scale, that measured and catalogued these biases in aggregate form.
We design deep learning models that replicate these biases in aggregate, while also capturing individual variation in behavior.
arXiv Detail & Related papers (2020-12-09T04:34:58Z)
- Action similarity judgment based on kinematic primitives [48.99831733355487]
We investigate to what extent a computational model based on kinematics can determine action similarity.
The chosen model has its roots in developmental robotics and performs action classification based on learned kinematic primitives.
The results show that both the model and human performance are highly accurate in an action similarity task based on kinematic-level features.
arXiv Detail & Related papers (2020-08-30T13:58:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.