A Latent Space Theory for Emergent Abilities in Large Language Models
- URL: http://arxiv.org/abs/2304.09960v3
- Date: Wed, 13 Sep 2023 18:52:02 GMT
- Title: A Latent Space Theory for Emergent Abilities in Large Language Models
- Authors: Hui Jiang
- Abstract summary: We show that languages are not created randomly but rather to communicate information.
There is a strong association between languages and their underlying meanings, resulting in a sparse joint distribution.
With the advent of LLMs trained on big data and large models, we can now precisely assess the marginal distribution of languages.
- Score: 5.033924641692716
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Languages are not created randomly but rather to communicate information.
There is a strong association between languages and their underlying meanings,
resulting in a sparse joint distribution that is heavily peaked according to
their correlations. Moreover, these peak values happen to match the
marginal distribution of languages due to the sparsity. With the advent of LLMs
trained on big data and large models, we can now precisely assess the marginal
distribution of languages, providing a convenient means of exploring the sparse
structures in the joint distribution for effective inferences. In this paper,
we categorize languages as either unambiguous or ε-ambiguous and
present quantitative results to demonstrate that the emergent abilities of
LLMs, such as language understanding, in-context learning, chain-of-thought
prompting, and effective instruction fine-tuning, can all be attributed to
Bayesian inference on the sparse joint distribution of languages.
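To make the claim concrete, here is a minimal formalization of the sparsity argument, a sketch using assumed notation (w for a language expression, θ for its intended meaning) rather than the paper's exact symbols:

```latex
% Sparsity: for an (unambiguous) expression w, the joint distribution
% concentrates on a single dominant meaning \theta^*(w), so the marginal
% that an LLM learns matches the peak value:
P(w) = \sum_{\theta} P(w, \theta) \approx P\bigl(w, \theta^*(w)\bigr)

% Bayesian inference of meaning then reduces to reading off that peak;
% understanding emerges because the posterior is nearly deterministic:
P\bigl(\theta^*(w) \mid w\bigr)
  = \frac{P\bigl(w, \theta^*(w)\bigr)}{P(w)} \approx 1
```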
Related papers
- ShifCon: Enhancing Non-Dominant Language Capabilities with a Shift-based Contrastive Framework [79.72910257530795]
ShifCon is a Shift-based Contrastive framework that aligns the internal forward process of other languages toward that of the dominant one.
It shifts the representations of non-dominant languages into the dominant language subspace, allowing them to access relatively rich information encoded in the model parameters.
Experiments demonstrate that our ShifCon framework significantly enhances the performance of non-dominant languages.
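As a rough illustration of the shift idea (the function name and the mean-offset heuristic are assumptions for this sketch, not ShifCon's actual method):

```python
import numpy as np

# Hypothetical sketch: move non-dominant-language hidden states toward the
# dominant language's representation region by translating them along the
# difference of the two languages' mean activations.
def shift_to_dominant(h_nondom: np.ndarray, h_dom: np.ndarray) -> np.ndarray:
    """h_nondom, h_dom: (num_sentences, hidden_dim) hidden states."""
    shift = h_dom.mean(axis=0) - h_nondom.mean(axis=0)  # offset between subspaces
    return h_nondom + shift  # representations moved into the dominant region

rng = np.random.default_rng(0)
h_en = rng.normal(loc=1.0, size=(32, 8))   # dominant language (e.g. English)
h_sw = rng.normal(loc=-1.0, size=(32, 8))  # non-dominant language
h_sw_shifted = shift_to_dominant(h_sw, h_en)
# Means now coincide (up to float error), so the shifted states can access
# the information encoded around the dominant-language region:
print(np.linalg.norm(h_sw_shifted.mean(0) - h_en.mean(0)))
```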
arXiv Detail & Related papers (2024-10-25T10:28:59Z)
- Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.
But can these models relate corresponding concepts across languages, effectively being crosslingual?
This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z)
- What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages [78.1866280652834]
Large language models (LMs) are distributions over strings.
We investigate the learnability of regular LMs (RLMs) by RNN and Transformer LMs.
We find that the complexity of the RLM rank is a strong and significant predictor of learnability for both RNNs and Transformers.
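For intuition only, a toy sketch of one way to read "rank" for a probabilistic regular LM (this reading is an assumption, not the paper's formal definition): stack each state's next-symbol distribution into a matrix and measure its linear-algebraic rank:

```python
import numpy as np

# Sketch: a probabilistic regular LM as per-state next-symbol distributions;
# its "rank" here is the rank of that emission matrix. Illustrative only.
rng = np.random.default_rng(1)
num_states, vocab_size = 5, 10
emission = rng.random((num_states, vocab_size))
emission /= emission.sum(axis=1, keepdims=True)  # each row is a distribution

print("RLM rank:", np.linalg.matrix_rank(emission))  # the learnability predictor
```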
arXiv Detail & Related papers (2024-06-06T17:34:24Z)
- Improving In-context Learning of Multilingual Generative Language Models with Cross-lingual Alignment [42.624862172666624]
We propose a simple yet effective cross-lingual alignment framework exploiting pairs of translation sentences.
It aligns the internal sentence representations across different languages via multilingual contrastive learning.
Experimental results show that even with less than 0.1‰ of the pre-training tokens, our alignment framework significantly boosts the cross-lingual abilities of generative language models.
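A minimal sketch of contrastive alignment on translation pairs, using an InfoNCE-style objective (the function name and PyTorch framing are assumptions, not the paper's released code):

```python
import torch
import torch.nn.functional as F

# Sketch: align sentence representations of translation pairs; each sentence's
# translation is its positive, all other sentences in the batch are negatives.
def contrastive_alignment_loss(src: torch.Tensor, tgt: torch.Tensor,
                               temperature: float = 0.05) -> torch.Tensor:
    """src, tgt: (batch, dim) representations of paired translations."""
    src = F.normalize(src, dim=-1)
    tgt = F.normalize(tgt, dim=-1)
    logits = src @ tgt.T / temperature   # pairwise cosine similarities
    labels = torch.arange(src.size(0))   # i-th source pairs with i-th target
    return F.cross_entropy(logits, labels)

loss = contrastive_alignment_loss(torch.randn(16, 64), torch.randn(16, 64))
print(loss.item())
```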
arXiv Detail & Related papers (2023-11-14T11:24:08Z)
- How Abstract Is Linguistic Generalization in Large Language Models? Experiments with Argument Structure [2.530495315660486]
We investigate the degree to which pre-trained Transformer-based large language models represent relationships between contexts.
We find that LLMs perform well in generalizing the distribution of a novel noun argument between related contexts.
However, LLMs fail at generalizations between related contexts that have not been observed during pre-training.
arXiv Detail & Related papers (2023-11-08T18:58:43Z)
- Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work lays the foundation for furthering the field of dialectal NLP by documenting evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z)
- IERL: Interpretable Ensemble Representation Learning -- Combining Crowdsourced Knowledge and Distributed Semantic Representations [11.008412414253662]
Large Language Models (LLMs) encode meanings of words in the form of distributed semantics.
Recent studies have shown that LLMs tend to generate unintended, inconsistent, or wrong texts as outputs.
We propose a novel ensemble learning method, Interpretable Ensemble Representation Learning (IERL), that systematically combines LLM and crowdsourced knowledge representations.
arXiv Detail & Related papers (2023-06-24T05:02:34Z)
- Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z)
- Can Large Language Models Capture Dissenting Human Voices? [7.668954669688971]
Large language models (LLMs) have shown impressive achievements in solving a broad range of tasks.
We evaluate the performance and alignment of LLM distribution with humans using two different techniques.
We show that LLMs exhibit limited ability in solving NLI tasks and simultaneously fail to capture the human disagreement distribution.
arXiv Detail & Related papers (2023-05-23T07:55:34Z)
- Evaluating statistical language models as pragmatic reasoners [39.72348730045737]
We evaluate the capacity of large language models to infer meanings of pragmatic utterances.
We find that LLMs can derive context-grounded, human-like distributions over the interpretations of several complex pragmatic utterances.
These results inform our understanding of the inferential capacity of statistical language models and their use in pragmatic and semantic parsing applications.
arXiv Detail & Related papers (2023-05-01T18:22:10Z)
- Evaluating Distributional Distortion in Neural Language Modeling [81.83408583979745]
A heavy tail of rare events accounts for a significant amount of the total probability mass of distributions in language.
Standard language modeling metrics such as perplexity quantify the performance of language models (LMs) only in aggregate.
We develop a controlled evaluation scheme which uses generative models trained on natural data as artificial languages.
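To see why aggregate perplexity can hide tail distortion, a small sketch with hypothetical numbers: two models assign nearly identical perplexity to the same text while placing very different mass on its one rare token:

```python
import numpy as np

# Perplexity is a geometric mean of inverse per-token probabilities, so it
# aggregates away how a model treats the rare-event tail.
def perplexity(token_probs: np.ndarray) -> float:
    return float(np.exp(-np.mean(np.log(token_probs))))

# Hypothetical per-token probabilities under two models on the same text;
# the last token is the rare event.
model_a = np.array([0.5, 0.5, 0.5, 0.02])       # keeps the rare token rare
model_b = np.array([0.34, 0.34, 0.34, 0.0637])  # smooths the tail upward

print(perplexity(model_a), perplexity(model_b))  # ~4.47 for both...
print(model_a[-1], model_b[-1])  # ...yet a 3x gap on the rare event
```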
arXiv Detail & Related papers (2022-03-24T01:09:46Z)