Kallini et al. (2024) do not compare impossible languages with constituency-based ones
- URL: http://arxiv.org/abs/2410.12271v1
- Date: Wed, 16 Oct 2024 06:16:30 GMT
- Title: Kallini et al. (2024) do not compare impossible languages with constituency-based ones
- Authors: Tim Hunter
- Abstract summary: A central goal of linguistic theory is to find a characterization of the notion "possible human language".
The success of recent large language models (LLMs) in NLP applications arguably raises the possibility that LLMs might be computational devices that meet this goal.
I explain the confound and suggest some ways forward towards constructing a comparison that appropriately tests the underlying issue.
- Score: 0.0
- Abstract: A central goal of linguistic theory is to find a precise characterization of the notion "possible human language", in the form of a computational device that is capable of describing all and only the languages that can be acquired by a typically developing human child. The success of recent large language models (LLMs) in NLP applications arguably raises the possibility that LLMs might be computational devices that meet this goal. This would only be the case if, in addition to succeeding in learning human languages, LLMs struggle to learn "impossible" human languages. Kallini et al. (2024; "Mission: Impossible Language Models", Proc. ACL) conducted experiments aiming to test this by training GPT-2 on a variety of synthetic languages, and found that it learns some more successfully than others. They present these asymmetries as support for the idea that LLMs' inductive biases align with what is regarded as "possible" for human languages, but the most significant comparison has a confound that makes this conclusion unwarranted. In this paper I explain the confound and suggest some ways forward towards constructing a comparison that appropriately tests the underlying issue.
Related papers
- Dictionary Insertion Prompting for Multilingual Reasoning on Multilingual Large Language Models [52.00446751692225]
We present a novel and simple yet effective method called Dictionary Insertion Prompting (DIP).
When providing a non-English prompt, DIP looks up a word dictionary and inserts words' English counterparts into the prompt for LLMs.
This enables better translation into English and better English reasoning steps, which leads to clearly better results.
arXiv Detail & Related papers (2024-11-02T05:10:50Z)
- HLB: Benchmarking LLMs' Humanlikeness in Language Use [2.438748974410787]
We present a comprehensive humanlikeness benchmark (HLB) evaluating 20 large language models (LLMs).
We collected responses from over 2,000 human participants and compared them to outputs from the LLMs in these experiments.
Our results reveal fine-grained differences in how well LLMs replicate human responses across various linguistic levels.
arXiv Detail & Related papers (2024-09-24T09:02:28Z)
- CRUXEval-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution [50.7413285637879]
The CRUXEVAL-X code reasoning benchmark contains 19 programming languages.
It comprises at least 600 subjects for each language, along with 19K content-consistent tests in total.
Even a model trained solely on Python can achieve at most 34.4% Pass@1 in other languages.
arXiv Detail & Related papers (2024-08-23T11:43:00Z)
- No Such Thing as a General Learner: Language models and their dual optimization [3.2228025627337864]
We argue that neither humans nor LLMs are general learners, in a variety of senses.
We argue that the performance of LLMs, whether similar or dissimilar to that of humans, does not weigh easily on important debates about the importance of human cognitive biases for language.
arXiv Detail & Related papers (2024-08-18T17:01:42Z)
- Large Models of What? Mistaking Engineering Achievements for Human Linguistic Agency [0.11510009152620666]
We argue that claims regarding linguistic capabilities of Large Language Models (LLMs) are based on at least two unfounded assumptions.
Language completeness assumes that a distinct and complete thing such as 'a natural language' exists.
The assumption of data completeness relies on the belief that a language can be quantified and wholly captured by data.
arXiv Detail & Related papers (2024-07-11T18:06:01Z)
- Understanding and Mitigating Language Confusion in LLMs [76.96033035093204]
We evaluate 15 typologically diverse languages with existing and newly-created English and multilingual prompts.
We find that Llama Instruct and Mistral models exhibit high degrees of language confusion.
We find that language confusion can be partially mitigated via few-shot prompting, multilingual SFT and preference tuning.
arXiv Detail & Related papers (2024-06-28T17:03:51Z)
- The Role of Language Imbalance in Cross-lingual Generalisation: Insights from Cloned Language Experiments [57.273662221547056]
In this study, we investigate an unintuitive novel driver of cross-lingual generalisation: language imbalance.
We observe that the existence of a predominant language during training boosts the performance of less frequent languages.
As we extend our analysis to real languages, we find that infrequent languages still benefit from frequent ones, but the evidence on whether language imbalance causes cross-lingual generalisation there is not conclusive.
arXiv Detail & Related papers (2024-04-11T17:58:05Z)
- Language Generation in the Limit [0.7787343335258782]
We show that there is an agent that is able to generate in the limit for every countable list of candidate languages.
This contrasts dramatically with negative results due to Gold and Angluin in a well-studied model of language learning.
arXiv Detail & Related papers (2024-04-10T05:53:25Z)
- Hire a Linguist!: Learning Endangered Languages with In-Context Linguistic Descriptions [49.97641297850361]
LINGOLLM is a training-free approach to enable an LLM to process unseen languages that hardly occur in its pre-training.
We implement LINGOLLM on top of two models, GPT-4 and Mixtral, and evaluate their performance on 5 tasks across 8 endangered or low-resource languages.
Our results show that LINGOLLM elevates translation capability from GPT-4's 0 to 10.5 BLEU for 10 language directions.
arXiv Detail & Related papers (2024-02-28T03:44:01Z)
- Mission: Impossible Language Models [29.249131112359503]
We develop a set of synthetic impossible languages of differing complexity.
At one end are languages that are inherently impossible, such as random and irreversible shuffles of English words.
At the other end are languages that may not be intuitively impossible but are often considered so in linguistics.
arXiv Detail & Related papers (2024-01-12T07:24:26Z)
- How Proficient Are Large Language Models in Formal Languages? An In-Depth Insight for Knowledge Base Question Answering [52.86931192259096]
Knowledge Base Question Answering (KBQA) aims to answer natural language questions based on facts in knowledge bases.
Recent works leverage the capabilities of large language models (LLMs) for logical form generation to improve performance.
arXiv Detail & Related papers (2024-01-11T09:27:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.