Still no evidence for an effect of the proportion of non-native speakers
on language complexity -- A response to Kauhanen, Einhaus & Walkden (2023)
- URL: http://arxiv.org/abs/2305.00217v6
- Date: Tue, 30 May 2023 05:42:08 GMT
- Title: Still no evidence for an effect of the proportion of non-native speakers
on language complexity -- A response to Kauhanen, Einhaus & Walkden (2023)
- Authors: Alexander Koplenig
- Abstract summary: Kauhanen, Einhaus & Walkden criticise results presented in one of my papers.
I show in this rejoinder that both points of criticism are explicitly mentioned and analysed in my paper.
- Score: 91.3755431537592
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In a recent paper published in the Journal of Language Evolution, Kauhanen,
Einhaus & Walkden (https://doi.org/10.1093/jole/lzad005, KEW) challenge the
results presented in one of my papers (Koplenig, Royal Society Open Science, 6,
181274 (2019), https://doi.org/10.1098/rsos.181274), in which I tried to show
through a series of statistical analyses that large numbers of L2 (second
language) speakers do not seem to affect the (grammatical or statistical)
complexity of a language. To this end, I focus on the way in which the
Ethnologue assesses language status: a language is characterised as vehicular
if, in addition to being used by L1 (first language) speakers, it also has a
significant number of L2 users. KEW criticise both the use of
vehicularity as a (binary) indicator of whether a language has a significant
number of L2 users and the idea of imputing a zero proportion of L2 speakers to
non-vehicular languages whenever a direct estimate of that proportion is
unavailable. While I recognise the importance of post-publication commentary on
published research, I show in this rejoinder that both points of criticism are
explicitly mentioned and analysed in my paper. In addition, I also comment on
other points raised by KEW and demonstrate that neither of the two
alternative analyses offered by KEW stands up to closer scrutiny.
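Because the dispute turns on two concrete modelling choices (vehicularity as a binary indicator of a sizeable L2 community, and imputing a zero L2 proportion where no direct estimate exists), a minimal sketch may help make them tangible. The snippet below is an illustration only: the data frame, its column names, and the plain OLS fit are hypothetical assumptions, not the original paper's actual analysis pipeline.

```python
# Illustrative sketch only: zero-imputation of L2 proportions for
# non-vehicular languages, followed by a simple regression of a
# complexity score on the L2 share. All column names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: one row per language.
df = pd.DataFrame({
    "complexity":    np.random.default_rng(0).normal(size=8),
    "l2_proportion": [0.40, np.nan, 0.10, np.nan, 0.65, np.nan, 0.25, np.nan],
    "vehicular":     [True, False, True, False, True, False, True, False],
})

# The contested step: if a language is non-vehicular (no significant L2
# community according to the Ethnologue) and no direct estimate of its
# L2 share is available, impute a proportion of zero.
mask = (~df["vehicular"]) & df["l2_proportion"].isna()
df.loc[mask, "l2_proportion"] = 0.0

# Simple OLS of complexity on the (partly imputed) L2 proportion.
model = smf.ols("complexity ~ l2_proportion", data=df).fit()
print(model.summary())
```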
Related papers
- Can LLMs Simulate L2-English Dialogue? An Information-Theoretic Analysis of L1-Dependent Biases [22.048949559200935]
This study evaluates Large Language Models' ability to simulate non-native-like English use observed in human second language (L2) learners.
In dialogue-based interviews, we prompt LLMs to mimic L2 English learners with specific L1s across seven languages.
Our analysis examines L1-driven linguistic biases, such as reference word usage and avoidance behaviors, using information-theoretic and distributional density measures.
arXiv Detail & Related papers (2025-02-20T12:34:46Z)
- Learning to Write Rationally: How Information Is Distributed in Non-Native Speakers' Essays [1.5039745292757671]
We compare essays written by second language learners with various native language (L1) backgrounds to investigate how they distribute information in their non-native language (L2) production.
Analyses of surprisal and constancy of entropy rate indicated that writers with higher L2 proficiency can reduce the expected uncertainty of language production while still conveying informative content (a toy sketch of such surprisal measures appears after this list).
arXiv Detail & Related papers (2024-11-05T23:09:37Z)
- One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks [68.33068005789116]
We present the first study aimed at objectively assessing the fairness and robustness of Large Language Models (LLMs) in handling dialects in canonical reasoning tasks.
We hire AAVE speakers, including experts with computer science backgrounds, to rewrite seven popular benchmarks, such as HumanEval and GSM8K.
Our findings reveal that almost all of these widely used models show significant brittleness and unfairness to queries in AAVE.
arXiv Detail & Related papers (2024-10-14T18:44:23Z)
- The Role of Language Imbalance in Cross-lingual Generalisation: Insights from Cloned Language Experiments [57.273662221547056]
In this study, we investigate a counterintuitive, novel driver of cross-lingual generalisation: language imbalance.
We observe that the existence of a predominant language during training boosts the performance of less frequent languages.
As we extend our analysis to real languages, we find that infrequent languages still benefit from frequent ones, yet whether language imbalance causes cross-lingual generalisation in this setting remains inconclusive.
arXiv Detail & Related papers (2024-04-11T17:58:05Z)
- Could We Have Had Better Multilingual LLMs If English Was Not the Central Language? [4.655168524016426]
Large Language Models (LLMs) demonstrate strong machine translation capabilities on languages they are trained on.
Our study delves into Llama2's translation capabilities.
Our experiments show that the 7B Llama2 model yields above 10 BLEU when translating into all languages it has seen.
arXiv Detail & Related papers (2024-02-21T16:32:38Z)
- Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work lays the foundation for furthering the field of dialectal NLP by documenting evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z)
- Incorporating L2 Phonemes Using Articulatory Features for Robust Speech Recognition [2.8360662552057323]
This study addresses the efficient incorporation of L2 phonemes, which in this work refer to Korean phonemes, through articulatory feature analysis.
We employ the lattice-free maximum mutual information (LF-MMI) objective in an end-to-end manner, to train the acoustic model to align and predict one of multiple pronunciation candidates.
Experimental results show that the proposed method improves ASR accuracy for Korean L2 speech by training solely on L1 speech data.
arXiv Detail & Related papers (2023-06-05T01:55:33Z)
- Learning Cross-lingual Visual Speech Representations [108.68531445641769]
Cross-lingual self-supervised visual representation learning has been a growing research topic in the last few years.
We use the recently proposed Raw Audio-Visual Speech Encoders (RAVEn) framework to pre-train an audio-visual model with unlabelled data.
Our experiments show that multi-lingual models with more data outperform monolingual ones but, when the amount of data is kept fixed, monolingual models tend to reach better performance.
arXiv Detail & Related papers (2023-03-14T17:05:08Z)
- A bifurcation threshold for contact-induced language change [0.0]
This paper proposes a mathematical model of such situations based on reinforcement learning and nonlinear dynamics (a generic toy sketch of such a threshold dynamic appears after this list).
The model is evaluated with the help of two case studies, morphological levelling in Afrikaans and the erosion of null subjects in Afro-Peruvian Spanish.
arXiv Detail & Related papers (2021-11-23T18:21:12Z)
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
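The "Learning to Write Rationally" entry above refers to surprisal and entropy-rate analyses; the toy sketch below illustrates those quantities with an add-one-smoothed unigram model over a made-up corpus. It is only a stand-in for the language-model-based measures the cited work actually uses.

```python
# Rough illustration of per-token surprisal and a crude entropy-rate
# check across sentence positions, using a unigram model estimated
# from a toy corpus. This is a stand-in, not the cited paper's method.
import math
from collections import Counter

corpus = [
    "the model learns to distribute information evenly",
    "writers with higher proficiency reduce uncertainty",
    "the essays convey informative content",
]

tokens = [tok for sent in corpus for tok in sent.split()]
counts = Counter(tokens)
total = sum(counts.values())

def surprisal(token: str) -> float:
    """Unigram surprisal in bits, with add-one smoothing."""
    p = (counts[token] + 1) / (total + len(counts) + 1)
    return -math.log2(p)

# Mean surprisal per sentence position: a crude proxy for checking
# whether the entropy rate stays roughly constant across a text.
max_len = max(len(s.split()) for s in corpus)
for pos in range(max_len):
    vals = [surprisal(s.split()[pos]) for s in corpus if len(s.split()) > pos]
    print(f"position {pos}: mean surprisal = {sum(vals) / len(vals):.2f} bits")
```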
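The "A bifurcation threshold for contact-induced language change" entry above describes a dynamical model with a threshold effect. The sketch below is a generic two-variant learning dynamic, not the reinforcement-learning model of the cited paper: the update rule, the bias parameter, and the assumption that L2 input always supplies the innovative variant are illustrative simplifications.

```python
# Generic illustration (not the cited paper's model): learners hear a
# mixture of L1 output and a fixed proportion p of L2 input that always
# uses the innovative variant. A nonlinear acquisition bias against the
# innovation creates a threshold in p above which the innovation wins.

def update(x: float, p: float, bias: float = 2.0) -> float:
    """One generation: learners respond nonlinearly to the input mix."""
    q = (1.0 - p) * x + p  # share of innovative tokens heard by learners
    return q**2 / (q**2 + bias * (1.0 - q)**2)

def long_run_frequency(p: float, x0: float = 0.01, generations: int = 500) -> float:
    """Iterate the dynamic until it settles on a stable frequency."""
    x = x0
    for _ in range(generations):
        x = update(x, p)
    return x

# Below some critical p the innovation stays marginal; above it, it takes over.
for p in [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]:
    print(f"L2 proportion {p:.1f} -> long-run innovative frequency {long_run_frequency(p):.3f}")
```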
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.