Do Language Models Associate Sound with Meaning? A Multimodal Study of Sound Symbolism
- URL: http://arxiv.org/abs/2511.10045v2
- Date: Sun, 16 Nov 2025 03:07:48 GMT
- Title: Do Language Models Associate Sound with Meaning? A Multimodal Study of Sound Symbolism
- Authors: Jinhong Jeong, Sunghyun Lee, Jaeyoung Lee, Seonah Han, Youngjae Yu
- Abstract summary: We investigate how Multimodal Large Language Models (MLLMs) interpret auditory information in human languages. We present LEX-ICON, an extensive mimetic word dataset consisting of 8,052 words from four natural languages. Key findings demonstrate (1) MLLMs' phonetic intuitions that align with existing linguistic research across multiple semantic dimensions and (2) phonosemantic attention patterns that highlight models' focus on iconic phonemes.
- Score: 20.62188582405012
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sound symbolism is a linguistic concept that refers to non-arbitrary associations between phonetic forms and their meanings. We suggest that this can be a compelling probe into how Multimodal Large Language Models (MLLMs) interpret auditory information in human languages. We investigate MLLMs' performance on phonetic iconicity across textual (orthographic and IPA) and auditory forms of inputs with up to 25 semantic dimensions (e.g., sharp vs. round), observing models' layer-wise information processing by measuring phoneme-level attention fraction scores. To this end, we present LEX-ICON, an extensive mimetic word dataset consisting of 8,052 words from four natural languages (English, French, Japanese, and Korean) and 2,930 systematically constructed pseudo-words, annotated with semantic features applied across both text and audio modalities. Our key findings demonstrate (1) MLLMs' phonetic intuitions that align with existing linguistic research across multiple semantic dimensions and (2) phonosemantic attention patterns that highlight models' focus on iconic phonemes. These results bridge domains of artificial intelligence and cognitive linguistics, providing the first large-scale, quantitative analyses of phonetic iconicity in terms of MLLMs' interpretability.
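The phoneme-level attention fraction score described in the abstract can be illustrated with a small sketch. This is a minimal toy implementation, not the paper's code: the function name, array shapes, and the uniform-attention example are all assumptions made for illustration.

```python
import numpy as np

def attention_fraction(attn, target_idx):
    """Fraction of one layer's attention mass falling on target token
    positions (e.g. tokens corresponding to hypothesized iconic phonemes).

    attn: array of shape (heads, seq_len, seq_len); each row is a
          normalized attention distribution over the sequence.
    target_idx: token positions for the phonemes of interest.
    """
    mass_on_targets = attn[:, :, target_idx].sum()
    return float(mass_on_targets / attn.sum())

# Toy check: with uniform attention over 4 tokens, two target
# positions should receive exactly half of the attention mass.
attn = np.full((2, 4, 4), 0.25)
print(attention_fraction(attn, [1, 3]))  # -> 0.5
```

Comparing this score layer by layer, against the chance baseline given by the share of target positions, yields the kind of layer-wise focus pattern the abstract refers to.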
Related papers
- LLMs Know More Than Words: A Genre Study with Syntax, Metaphor & Phonetics [12.86515569519773]
We introduce a novel genre classification dataset derived from Project Gutenberg, a large-scale digital library offering free access to thousands of public domain literary works. We augment each work with three explicit linguistic feature sets (syntactic tree structures, metaphor counts, and phonetic metrics) to evaluate their impact on classification performance.
arXiv Detail & Related papers (2025-12-04T16:26:42Z) - Layer-wise Minimal Pair Probing Reveals Contextual Grammatical-Conceptual Hierarchy in Speech Representations [18.74784108693223]
Transformer-based speech language models (SLMs) have significantly improved neural speech recognition and understanding. The extent to which SLMs encode nuanced syntactic and conceptual features remains unclear. This study is the first to systematically evaluate the presence of contextual syntactic and semantic features across SLMs.
arXiv Detail & Related papers (2025-09-19T06:29:33Z) - Iconicity in Large Language Models [0.0]
Large language models' (LLMs') access to both the meaning and the sound of text is only mediated. This study probes that limitation by having GPT-4 generate highly iconic pseudowords in artificial languages. The results revealed that humans can guess the meanings of pseudowords in the generated iconic language more accurately than words in distant natural languages.
arXiv Detail & Related papers (2025-01-10T01:00:05Z) - Large Language Models as Neurolinguistic Subjects: Discrepancy between Performance and Competence [49.60849499134362]
This study investigates the linguistic understanding of Large Language Models (LLMs) regarding signifier (form) and signified (meaning). We introduce a neurolinguistic approach, utilizing a novel method that combines minimal pairs and diagnostic probing to analyze activation patterns across model layers. We found that (1) psycholinguistic and neurolinguistic methods reveal that language performance and competence are distinct; (2) direct probability measurement may not accurately assess linguistic competence; and (3) instruction tuning does little to change competence but does improve performance.
arXiv Detail & Related papers (2024-11-12T04:16:44Z) - PhonologyBench: Evaluating Phonological Skills of Large Language Models [57.80997670335227]
Phonology, the study of speech's structure and pronunciation rules, is a critical yet often overlooked component in Large Language Model (LLM) research.
We present PhonologyBench, a novel benchmark consisting of three diagnostic tasks designed to explicitly test the phonological skills of LLMs.
We observe a significant gap of 17% and 45% on Rhyme Word Generation and Syllable Counting, respectively, when compared to humans.
arXiv Detail & Related papers (2024-04-03T04:53:14Z) - Encoding of lexical tone in self-supervised models of spoken language [3.7270979204213446]
This paper aims to analyze the tone encoding capabilities of Spoken Language Models (SLMs).
We show that SLMs encode lexical tone to a significant degree even when they are trained on data from non-tonal languages.
We find that SLMs behave similarly to native and non-native human participants in tone and consonant perception studies.
arXiv Detail & Related papers (2024-03-25T15:28:38Z) - Information-Theoretic Characterization of Vowel Harmony: A Cross-Linguistic Study on Word Lists [18.138642719651994]
We define an information-theoretic measure of harmonicity based on predictability of vowels in a natural language lexicon.
We estimate this harmonicity using phoneme-level language models (PLMs).
Our work demonstrates that word lists are a valuable resource for typological research.
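A predictability-based harmonicity measure of the kind described can be sketched with a toy bigram estimate over a word list's vowel tier. The paper estimates predictability with phoneme-level language models; this stand-in, including the function name and the mutual-information formulation, is an assumption made purely for illustration.

```python
from collections import Counter
import math

def vowel_harmonicity(lexicon, vowels):
    """Toy harmonicity score: mutual information (in bits) between
    consecutive vowels in a word list. Higher values mean a vowel is
    more predictable from the preceding one, as expected under harmony.
    """
    uni, bi = Counter(), Counter()
    for word in lexicon:
        vs = [p for p in word if p in vowels]  # extract the vowel tier
        uni.update(vs)
        bi.update(zip(vs, vs[1:]))
    n_uni, n_bi = sum(uni.values()), sum(bi.values())
    mi = 0.0
    for (a, b), c in bi.items():
        p_ab = c / n_bi
        mi += p_ab * math.log2(p_ab / ((uni[a] / n_uni) * (uni[b] / n_uni)))
    return mi

# A fully harmonic toy lexicon scores 1.0 bit; one whose vowels
# co-occur independently scores 0.
print(vowel_harmonicity(["kaka", "keke"], {"a", "e"}))  # -> 1.0
print(vowel_harmonicity(["kaka", "kake", "keka", "keke"], {"a", "e"}))  # -> 0.0
```

Under this measure, independent vowel co-occurrence yields roughly 0 bits, while a lexicon in which each vowel determines its successor scores higher, matching the intuition that harmony makes vowels predictable from context.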
arXiv Detail & Related papers (2023-08-09T11:32:16Z) - Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z) - AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z) - Decomposing lexical and compositional syntax and semantics with deep language models [82.81964713263483]
The activations of language transformers like GPT2 have been shown to linearly map onto brain activity during speech comprehension.
Here, we propose a taxonomy to factorize the high-dimensional activations of language models into four classes: lexical, compositional, syntactic, and semantic representations.
Among the results, compositional representations recruit a more widespread cortical network than lexical ones, encompassing the bilateral temporal, parietal, and prefrontal cortices.
arXiv Detail & Related papers (2021-03-02T10:24:05Z) - SPLAT: Speech-Language Joint Pre-Training for Spoken Language Understanding [61.02342238771685]
Spoken language understanding requires a model to analyze input acoustic signal to understand its linguistic content and make predictions.
Various pre-training methods have been proposed to learn rich representations from large-scale unannotated speech and text.
We propose a novel semi-supervised learning framework, SPLAT, to jointly pre-train the speech and language modules.
arXiv Detail & Related papers (2020-10-05T19:29:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.