The language of sounds unheard: Exploring musical timbre semantics of
large language models
- URL: http://arxiv.org/abs/2304.07830v3
- Date: Thu, 4 May 2023 10:05:26 GMT
- Authors: Kai Siedenburg and Charalampos Saitis
- Abstract summary: Given the recent proliferation of large language models (LLMs), we asked whether such models exhibit an organisation of perceptual semantics similar to that observed in humans.
We elicited multiple responses in separate chats, analogous to having multiple human raters.
ChatGPT generated semantic profiles that only partially correlated with human ratings, yet showed robust agreement along well-known psychophysical dimensions of musical sounds.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Semantic dimensions of sound have been playing a central role in
understanding the nature of auditory sensory experience as well as the broader
relation between perception, language, and meaning. Accordingly, and given the
recent proliferation of large language models (LLMs), here we asked whether
such models exhibit an organisation of perceptual semantics similar to that
observed in humans. Specifically, we prompted ChatGPT, a chatbot based on a
state-of-the-art LLM, to rate musical instrument sounds on a set of 20 semantic
scales. We elicited multiple responses in separate chats, analogous to having
multiple human raters. ChatGPT generated semantic profiles that only partially
correlated with human ratings, yet showed robust agreement along well-known
psychophysical dimensions of musical sounds such as brightness (bright-dark)
and pitch height (deep-high). Exploratory factor analysis suggested the same
dimensionality but different spatial configuration of a latent factor space
between the chatbot and human ratings. Unexpectedly, the chatbot showed levels
of internal variability that were comparable in magnitude to those of human
ratings. Our work highlights the potential of LLMs to capture salient
dimensions of human sensory experience.
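The agreement analysis described in the abstract — eliciting repeated chatbot ratings on semantic scales and correlating their averages with human ratings — can be sketched as follows. All numbers below are hypothetical placeholders (the paper's actual rating data are not reproduced here); the per-scale Pearson correlation is the kind of agreement measure the study reports.

```python
import numpy as np

# Hypothetical data: rows = 4 instrument sounds, columns = 5 semantic scales
# (the study used 20 scales, e.g. bright-dark, deep-high); values are 1-7 ratings.
human_ratings = np.array([
    [6.1, 2.3, 5.0, 4.2, 3.1],
    [2.5, 6.0, 3.2, 5.1, 4.4],
    [4.8, 3.9, 4.1, 2.8, 5.6],
    [3.3, 5.2, 2.7, 6.0, 2.9],
])
# Mean over several separate-chat responses, analogous to multiple raters.
chatgpt_ratings = np.array([
    [5.8, 2.9, 4.6, 4.0, 3.5],
    [3.0, 5.5, 3.6, 4.8, 4.1],
    [4.5, 4.2, 4.4, 3.1, 5.2],
    [3.6, 4.9, 3.0, 5.7, 3.3],
])

def scale_correlations(a, b):
    """Pearson r between two rating matrices, one value per semantic scale."""
    return np.array([
        np.corrcoef(a[:, j], b[:, j])[0, 1] for j in range(a.shape[1])
    ])

r = scale_correlations(human_ratings, chatgpt_ratings)
print(np.round(r, 2))  # one correlation per scale; higher = closer agreement
```

With real data, scales such as brightness and pitch height would be expected to show the strongest correlations; a subsequent exploratory factor analysis of each rating matrix would then compare the latent dimensionality of the two rating sets.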
Related papers
- SIFToM: Robust Spoken Instruction Following through Theory of Mind [51.326266354164716]
We present a cognitively inspired model, Speech Instruction Following through Theory of Mind (SIFToM), to enable robots to pragmatically follow human instructions under diverse speech conditions.
Results show that the SIFToM model outperforms state-of-the-art speech and language models, approaching human-level accuracy on challenging speech instruction following tasks.
arXiv Detail & Related papers (2024-09-17T02:36:10Z)
- DiverseDialogue: A Methodology for Designing Chatbots with Human-Like Diversity [5.388338680646657]
We show that conversations in which GPT-4o mini simulates human participants systematically differ from conversations between actual humans across multiple linguistic features.
We propose an approach that automatically generates prompts for user simulations by incorporating features derived from real human interactions.
Our method of prompt optimization, tailored to target specific linguistic features, shows significant improvements.
arXiv Detail & Related papers (2024-08-30T21:33:58Z) - Evaluating Speaker Identity Coding in Self-supervised Models and Humans [0.42303492200814446]
Speaker identity plays a significant role in human communication and is being increasingly used in societal applications.
We show that self-supervised representations from different model families are significantly better for speaker identification than acoustic representations.
We also show that such a speaker identification task can be used to better understand the nature of acoustic information representation in different layers of these powerful networks.
arXiv Detail & Related papers (2024-06-14T20:07:21Z)
- Exploring Spatial Schema Intuitions in Large Language and Vision Models [8.944921398608063]
We investigate whether large language models (LLMs) effectively capture implicit human intuitions about building blocks of language.
Surprisingly, correlations between model outputs and human responses emerge, revealing adaptability without a tangible connection to embodied experiences.
This research contributes to a nuanced understanding of the interplay between language, spatial experiences, and computations made by large language models.
arXiv Detail & Related papers (2024-02-01T19:25:50Z)
- A Linguistic Comparison between Human and ChatGPT-Generated Conversations [9.022590646680095]
The research employs Linguistic Inquiry and Word Count analysis, comparing ChatGPT-generated conversations with human conversations.
Results show greater variability and authenticity in human dialogues, but ChatGPT excels in categories such as social processes, analytical style, cognition, attentional focus, and positive emotional tone.
arXiv Detail & Related papers (2024-01-29T21:43:27Z)
- Divergences between Language Models and Human Brains [63.405788999891335]
Recent research has hinted that brain signals can be effectively predicted using internal representations of language models (LMs).
We show that there are clear differences in how LMs and humans represent and use language.
We identify two domains that are not captured well by LMs: social/emotional intelligence and physical commonsense.
arXiv Detail & Related papers (2023-11-15T19:02:40Z)
- Can Language Models Learn to Listen? [96.01685069483025]
We present a framework for generating appropriate facial responses from a listener in dyadic social interactions based on the speaker's words.
Our approach autoregressively predicts a response of a listener: a sequence of listener facial gestures, quantized using a VQ-VAE.
We show that our generated listener motion is fluent and reflective of language semantics through quantitative metrics and a qualitative user study.
arXiv Detail & Related papers (2023-08-21T17:59:02Z)
- Do large language models resemble humans in language use? [1.8524806794216748]
Large language models (LLMs) such as ChatGPT and Vicuna have shown remarkable capacities in comprehending and producing language.
We subjected ChatGPT and Vicuna to 12 experiments ranging from sounds to dialogue, preregistered and with 1000 runs (i.e., iterations) per experiment.
ChatGPT and Vicuna replicated the human pattern of language use in 10 and 7 out of the 12 experiments, respectively.
arXiv Detail & Related papers (2023-03-10T10:47:59Z)
- Information-Restricted Neural Language Models Reveal Different Brain Regions' Sensitivity to Semantics, Syntax and Context [87.31930367845125]
We trained a lexical language model, Glove, and a supra-lexical language model, GPT-2, on a text corpus.
We then assessed to what extent these information-restricted models were able to predict the time-courses of fMRI signal of humans listening to naturalistic text.
Our analyses show that, while most brain regions involved in language are sensitive to both syntactic and semantic variables, the relative magnitudes of these effects vary considerably across regions.
arXiv Detail & Related papers (2023-02-28T08:16:18Z)
- Perception Point: Identifying Critical Learning Periods in Speech for Bilingual Networks [58.24134321728942]
We compare and identify cognitive aspects of deep neural-network-based visual lip-reading models.
We observe a strong correlation between theories from cognitive psychology and our modelling.
arXiv Detail & Related papers (2021-10-13T05:30:50Z)
- Mechanisms for Handling Nested Dependencies in Neural-Network Language Models and Humans [75.15855405318855]
We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing.
Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement.
We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
arXiv Detail & Related papers (2020-06-19T12:00:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.