Large language models predict human sensory judgments across six modalities
- URL: http://arxiv.org/abs/2302.01308v2
- Date: Thu, 15 Jun 2023 17:18:04 GMT
- Title: Large language models predict human sensory judgments across six modalities
- Authors: Raja Marjieh, Ilia Sucholutsky, Pol van Rijn, Nori Jacoby, Thomas L. Griffiths
- Abstract summary: We show that state-of-the-art large language models can unlock new insights into the problem of recovering the perceptual world from language.
We elicit pairwise similarity judgments from GPT models across six psychophysical datasets.
We show that the judgments are significantly correlated with human data across all domains, recovering well-known representations like the color wheel and pitch spiral.
- Score: 12.914521751805658
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Determining the extent to which the perceptual world can be recovered from
language is a longstanding problem in philosophy and cognitive science. We show
that state-of-the-art large language models can unlock new insights into this
problem by providing a lower bound on the amount of perceptual information that
can be extracted from language. Specifically, we elicit pairwise similarity
judgments from GPT models across six psychophysical datasets. We show that the
judgments are significantly correlated with human data across all domains,
recovering well-known representations like the color wheel and pitch spiral.
Surprisingly, we find that co-training a model (GPT-4) on vision and language
does not necessarily yield improvements specific to the visual modality. To
study the influence of specific languages on perception, we also apply the
models to a multilingual color-naming task. We find that GPT-4 replicates
cross-linguistic variation in English and Russian, illuminating the interaction
of language and perception.
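As a concrete illustration of the recipe the abstract describes, here is a minimal sketch using color terms as the example modality. The prompt wording, the query_llm helper, and all data are illustrative assumptions rather than the authors' materials; only the overall pipeline (pairwise elicitation, Spearman correlation against human judgments, multidimensional scaling to recover structure like the color wheel) follows the abstract.

```python
# Sketch of the pipeline: elicit pairwise similarity ratings from an LLM,
# correlate them with human ratings, and embed the model's dissimilarities
# in 2D. `query_llm` is a hypothetical stand-in for a chat-completion client.
from itertools import combinations

import numpy as np
from scipy.stats import spearmanr
from sklearn.manifold import MDS


def query_llm(prompt: str) -> float:
    """Hypothetical helper: send `prompt` to a GPT model and parse a
    numeric rating from the reply. Plug in a real client here."""
    raise NotImplementedError


def elicit_similarity(stimuli: list[str]) -> np.ndarray:
    """Ask the model to rate the similarity of every stimulus pair (0-1)."""
    n = len(stimuli)
    sim = np.eye(n)
    for i, j in combinations(range(n), 2):
        prompt = (
            f"On a scale from 0 (completely dissimilar) to 1 (identical), "
            f"how similar are the colors '{stimuli[i]}' and '{stimuli[j]}'? "
            f"Answer with a single number."
        )
        sim[i, j] = sim[j, i] = query_llm(prompt)
    return sim


def compare_and_embed(model_sim: np.ndarray, human_sim: np.ndarray):
    """Correlate model and human judgments over unique pairs, then embed
    the model's dissimilarities in 2D; for color terms the MDS solution
    should trace an approximate color wheel."""
    iu = np.triu_indices_from(model_sim, k=1)
    rho, p = spearmanr(model_sim[iu], human_sim[iu])
    coords = MDS(
        n_components=2, dissimilarity="precomputed", random_state=0
    ).fit_transform(1.0 - model_sim)
    return rho, p, coords
```

Correlating only the upper triangle avoids double-counting the symmetric pairs; the same recipe would be repeated per dataset with a modality-appropriate prompt.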
Related papers
- Analyzing The Language of Visual Tokens [48.62180485759458]
We take a natural-language-centric approach to analyzing discrete visual languages.
We show that higher token innovation drives greater entropy and lower compression, with tokens predominantly representing object parts.
We also show that visual languages lack cohesive grammatical structures, leading to higher perplexity and weaker hierarchical organization compared to natural languages.
arXiv Detail & Related papers (2024-11-07T18:59:28Z)
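The entropy and compression claims above can be checked mechanically on any token sequence. A toy sketch, where the sequences are synthetic placeholders rather than output of a real visual tokenizer:

```python
# Estimate unigram entropy of a token sequence and its compressibility
# with a generic compressor; higher "innovation" should mean higher
# entropy and a worse (higher) compression ratio.
import random
import zlib
from collections import Counter
from math import log2


def unigram_entropy(tokens: list[int]) -> float:
    """Shannon entropy (bits/token) of the empirical unigram distribution."""
    counts = Counter(tokens)
    n = len(tokens)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def compression_ratio(tokens: list[int]) -> float:
    """Compressed size / raw size; closer to 1 means less compressible."""
    raw = bytes(t % 256 for t in tokens)
    return len(zlib.compress(raw)) / len(raw)


random.seed(0)
repetitive = [1, 2, 3, 4] * 250                            # low innovation
innovative = [random.randrange(256) for _ in range(1000)]  # high innovation
for name, seq in (("repetitive", repetitive), ("innovative", innovative)):
    print(f"{name}: {unigram_entropy(seq):.2f} bits/token, "
          f"compression ratio {compression_ratio(seq):.2f}")
```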
- MulCogBench: A Multi-modal Cognitive Benchmark Dataset for Evaluating Chinese and English Computational Language Models [44.74364661212373]
This paper proposes MulCogBench, a cognitive benchmark dataset collected from native Chinese and English participants.
It encompasses a variety of cognitive data, including subjective semantic ratings, eye-tracking, functional magnetic resonance imaging (fMRI), and magnetoencephalography (MEG).
Results show that language models share significant similarities with human cognitive data, and that the similarity patterns are modulated by data modality and stimulus complexity.
arXiv Detail & Related papers (2024-03-02T07:49:57Z)
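A minimal sketch of the kind of model-human comparison such a benchmark supports, in the style of representational similarity analysis; the arrays below are random stand-ins for real model embeddings and human recordings:

```python
# Build dissimilarity matrices over the same stimuli from model features
# and from human measurements (ratings, fMRI, MEG), then correlate their
# condensed upper triangles.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr


def rdm(features: np.ndarray) -> np.ndarray:
    """Condensed representational dissimilarity matrix (1 - correlation)."""
    return pdist(features, metric="correlation")


rng = np.random.default_rng(0)
model_embeddings = rng.normal(size=(50, 768))  # 50 stimuli x model features
human_responses = rng.normal(size=(50, 200))   # same stimuli x e.g. voxels

rho, p = spearmanr(rdm(model_embeddings), rdm(human_responses))
print(f"model-human RSA: rho={rho:.3f}, p={p:.3g}")
```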
- Exploring Spatial Schema Intuitions in Large Language and Vision Models [8.944921398608063]
We investigate whether large language models (LLMs) effectively capture implicit human intuitions about building blocks of language.
Surprisingly, correlations between model outputs and human responses emerge, revealing adaptability without a tangible connection to embodied experiences.
This research contributes to a nuanced understanding of the interplay between language, spatial experiences, and computations made by large language models.
arXiv Detail & Related papers (2024-02-01T19:25:50Z)
- Visual Grounding Helps Learn Word Meanings in Low-Data Regimes [47.7950860342515]
Modern neural language models (LMs) are powerful tools for modeling human sentence production and comprehension.
But to achieve these results, LMs must be trained in distinctly un-human-like ways.
Do models trained more naturalistically -- with grounded supervision -- exhibit more humanlike language learning?
We investigate this question in the context of word learning, a key sub-task in language acquisition.
arXiv Detail & Related papers (2023-10-20T03:33:36Z)
- Does Conceptual Representation Require Embodiment? Insights From Large Language Models [9.390117546307042]
We compare representations of 4,442 lexical concepts between humans and ChatGPT models (GPT-3.5 and GPT-4).
We identify two main findings: 1) Both models strongly align with human representations in non-sensorimotor domains but lag in sensory and motor areas, with GPT-4 outperforming GPT-3.5; 2) GPT-4's gains are associated with its additional visual learning, which also appears to benefit related dimensions like haptics and imageability.
arXiv Detail & Related papers (2023-05-30T15:06:28Z)
- Like a bilingual baby: The advantage of visually grounding a bilingual language model [0.0]
We train an LSTM language model on images and captions in English and Spanish from MS-COCO-ES.
We find that the visual grounding improves the model's understanding of semantic similarity both within and across languages and improves perplexity.
Our results provide additional evidence of the advantages of visually grounded language models and point to the need for more naturalistic language data from multilingual speakers and multilingual datasets with perceptual grounding.
arXiv Detail & Related papers (2022-10-11T14:43:26Z)
- Same Neurons, Different Languages: Probing Morphosyntax in Multilingual Pre-trained Models [84.86942006830772]
We conjecture that multilingual pre-trained models can derive language-universal abstractions about grammar.
We conduct the first large-scale empirical study over 43 languages and 14 morphosyntactic categories with a state-of-the-art neuron-level probe.
arXiv Detail & Related papers (2022-05-04T12:22:31Z)
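A drastically simplified stand-in for a neuron-level probe (the paper's actual probe is a more sophisticated method): score each hidden dimension by how well it alone predicts a binary morphosyntactic category, and check whether the top-ranked neurons coincide across languages. All activations and labels below are synthetic.

```python
# Rank neurons by one-feature probe accuracy and intersect the top sets
# across two languages; a planted shared "tense neuron" shows the idea.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def neuron_scores(acts: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Cross-validated accuracy of a one-feature logistic probe per neuron."""
    return np.array([
        cross_val_score(LogisticRegression(), acts[:, [i]], labels, cv=3).mean()
        for i in range(acts.shape[1])
    ])


rng = np.random.default_rng(0)
n_tokens, n_neurons = 300, 64
labels = rng.integers(0, 2, n_tokens)            # e.g., past vs. non-past
acts_en = rng.normal(size=(n_tokens, n_neurons))
acts_de = rng.normal(size=(n_tokens, n_neurons))
acts_en[:, 7] += 2.0 * labels  # plant a shared "tense neuron" in both
acts_de[:, 7] += 2.0 * labels  # languages, as the conjecture predicts

top_en = set(np.argsort(neuron_scores(acts_en, labels))[-5:])
top_de = set(np.argsort(neuron_scores(acts_de, labels))[-5:])
print("top probe neurons shared across languages:", sorted(top_en & top_de))
```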
- Perception Point: Identifying Critical Learning Periods in Speech for Bilingual Networks [58.24134321728942]
We compare and identify cognitive aspects of deep neural-network-based visual lip-reading models.
We observe a strong correspondence between theories of critical learning periods in cognitive psychology and the behavior of our models.
arXiv Detail & Related papers (2021-10-13T05:30:50Z)
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
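A compact sketch of the singular vector CCA comparison mentioned above: denoise each "view" of the language representations with a truncated SVD, then measure canonical correlations between the reduced views. The two language-by-feature matrices are random placeholders for real typological and learned vectors.

```python
# SVCCA: truncated SVD per view, then CCA between the reduced views;
# report the mean canonical correlation.
import numpy as np
from sklearn.cross_decomposition import CCA


def svcca(view_a: np.ndarray, view_b: np.ndarray, k: int = 10) -> float:
    """Mean canonical correlation between SVD-truncated views."""
    def truncate(x: np.ndarray) -> np.ndarray:
        x = x - x.mean(axis=0)
        u, s, _ = np.linalg.svd(x, full_matrices=False)
        return u[:, :k] * s[:k]

    a_c, b_c = CCA(n_components=k).fit_transform(
        truncate(view_a), truncate(view_b)
    )
    return float(np.mean(
        [np.corrcoef(a_c[:, i], b_c[:, i])[0, 1] for i in range(k)]
    ))


rng = np.random.default_rng(0)
typology = rng.normal(size=(43, 30))   # e.g., typological feature vectors
learned = rng.normal(size=(43, 128))   # e.g., vectors learned by an NMT model
print(f"mean SVCCA correlation: {svcca(typology, learned):.3f}")
```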