Emergence of a phonological bias in ChatGPT
- URL: http://arxiv.org/abs/2305.15929v2
- Date: Sat, 27 May 2023 09:19:54 GMT
- Title: Emergence of a phonological bias in ChatGPT
- Authors: Juan Manuel Toro
- Abstract summary: I demonstrate that ChatGPT displays phonological biases that are a hallmark of human language processing.
ChatGPT has a tendency to use consonants over vowels to identify words.
This is observed across languages that differ in their relative distribution of consonants and vowels such as English and Spanish.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current large language models, such as OpenAI's ChatGPT, have captured the
public's attention because of how remarkable they are in their use of language.
Here, I demonstrate that ChatGPT displays phonological biases that are a
hallmark of human language processing. More concretely, just like humans,
ChatGPT has a consonant bias. That is, the chatbot has a tendency to use
consonants over vowels to identify words. This is observed across languages
that differ in their relative distribution of consonants and vowels such as
English and Spanish. Despite the differences in how current artificial
intelligence language models are trained to process linguistic stimuli and how
human infants acquire language, such training seems to be enough for the
emergence of a phonological bias in ChatGPT.
Related papers
- Grammaticality Representation in ChatGPT as Compared to Linguists and Laypeople [0.0]
This study builds upon a previous study that collected laypeople's grammatical judgments on 148 linguistic phenomena.
Our primary focus was to compare ChatGPT with both laypeople and linguists in the judgement of these linguistic constructions.
Overall, our findings demonstrate convergence rates ranging from 73% to 95% between ChatGPT and linguists, with an overall point-estimate of 89%.
arXiv Detail & Related papers (2024-06-17T00:23:16Z)
- What Do Dialect Speakers Want? A Survey of Attitudes Towards Language Technology for German Dialects [60.8361859783634]
We survey speakers of dialects and regional languages related to German.
We find that respondents are especially in favour of potential NLP tools that work with dialectal input.
arXiv Detail & Related papers (2024-02-19T09:15:28Z)
- Counting the Bugs in ChatGPT's Wugs: A Multilingual Investigation into the Morphological Capabilities of a Large Language Model [23.60677380868016]
Large language models (LLMs) have recently reached an impressive level of linguistic capability, prompting comparisons with human language skills.
Here, we conduct the first rigorous analysis of the morphological capabilities of ChatGPT in four typologically varied languages.
We find that ChatGPT massively underperforms purpose-built systems, particularly in English.
arXiv Detail & Related papers (2023-10-23T17:21:03Z)
- Playing with Words: Comparing the Vocabulary and Lexical Richness of ChatGPT and Humans [3.0059120458540383]
Generative language models such as ChatGPT have triggered a revolution that can transform how text is generated.
Will the use of tools such as ChatGPT increase or reduce the vocabulary used or the lexical richness?
This has implications for vocabulary: words not included in AI-generated content will tend to become less and less popular and may eventually be lost.
arXiv Detail & Related papers (2023-08-14T21:19:44Z)
- Ethical ChatGPT: Concerns, Challenges, and Commandments [5.641321839562139]
This paper highlights specific ethical concerns on ChatGPT and articulates key challenges when ChatGPT is used in various applications.
Practical commandments of ChatGPT are also proposed that can serve as checklist guidelines for those applying ChatGPT in their applications.
arXiv Detail & Related papers (2023-05-18T02:04:13Z)
- Phoenix: Democratizing ChatGPT across Languages [68.75163236421352]
We release a large language model "Phoenix", achieving competitive performance among open-source English and Chinese models.
We believe this work will help make ChatGPT more accessible, especially in countries where people cannot use ChatGPT due to restrictions from OpenAI or local governments.
arXiv Detail & Related papers (2023-04-20T16:50:04Z)
- ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large Language Models in Multilingual Learning [70.57126720079971]
Large language models (LLMs) have emerged as among the most important breakthroughs in natural language processing (NLP).
This paper evaluates ChatGPT on 7 different tasks, covering 37 diverse languages with high, medium, low, and extremely low resources.
Our extensive experimental results demonstrate that ChatGPT performs worse than previous models across different NLP tasks and languages.
arXiv Detail & Related papers (2023-04-12T05:08:52Z)
- Improve Bilingual TTS Using Dynamic Language and Phonology Embedding [10.244215079409797]
This paper builds a Mandarin-English TTS system to acquire more standard spoken English speech from a monolingual Chinese speaker.
We specially design an embedding strength modulator to capture the dynamic strength of language and phonology.
arXiv Detail & Related papers (2022-12-07T03:46:18Z)
- Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages.
We infer this distribution from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
arXiv Detail & Related papers (2021-08-06T23:49:18Z)
- Phonological Features for 0-shot Multilingual Speech Synthesis [50.591267188664666]
We show that code-switching is possible for languages unseen during training, even within monolingual models.
We generate intelligible, code-switched speech in a new language at test time, including the approximation of sounds never seen in training.
arXiv Detail & Related papers (2020-08-06T18:25:18Z)
- AlloVera: A Multilingual Allophone Database [137.3686036294502]
AlloVera provides mappings from 218 allophones to phonemes for 14 languages.
We show that a "universal" allophone model, Allosaurus, built with AlloVera, outperforms "universal" phonemic models and language-specific models on a speech-transcription task.
arXiv Detail & Related papers (2020-04-17T02:02:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.