Statistical patterns of word frequency suggesting the probabilistic
nature of human languages
- URL: http://arxiv.org/abs/2012.00187v1
- Date: Tue, 1 Dec 2020 00:48:27 GMT
- Title: Statistical patterns of word frequency suggesting the probabilistic
nature of human languages
- Authors: Shuiyuan Yu, Chunshan Xu, Haitao Liu
- Abstract summary: The study shows that important linguistic issues, such as linguistic universals, diachronic drift, and language variation, can be translated into probability and frequency patterns in parole.
These findings suggest that human languages may well be probabilistic systems by nature, and that statistics may well be an inherent property of human languages.
- Score: 5.059800023492045
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional linguistic theories have largely regarded language as a formal
system composed of rigid rules. However, their failures in processing real
language, the recent successes of statistical natural language processing, and
the findings of many psychological experiments suggest that language may be
more a probabilistic system than a formal system, and thus cannot be
faithfully modeled with the either/or rules of formal linguistic theory. The
present study, based on authentic language data, confirms that important
linguistic issues, such as linguistic universals, diachronic drift, and
language variation, can be translated into probability and frequency patterns
in parole. These findings suggest that human languages may well be
probabilistic systems by nature, and that statistics may well be an inherent
property of human languages.
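As a concrete illustration of how frequency patterns in parole can be quantified, the sketch below estimates a word-frequency distribution from a small corpus and checks how closely its rank-frequency curve follows a Zipf-like power law. This is a minimal sketch under assumptions: the toy corpus, the `word_frequencies` and `zipf_fit` helpers, and the least-squares fit in log-log space are illustrative choices, not the paper's actual procedure.

```python
# Minimal sketch (assumed, not the paper's method): estimate word frequencies
# and fit a Zipf-like power law  freq ~ C * rank^(-alpha)  by least squares
# in log-log space.
import math
import re
from collections import Counter


def word_frequencies(text: str) -> list[int]:
    """Return word counts sorted from most to least frequent."""
    words = re.findall(r"[a-z']+", text.lower())
    return sorted(Counter(words).values(), reverse=True)


def zipf_fit(freqs: list[int]) -> tuple[float, float]:
    """Fit log(freq) = log(C) - alpha * log(rank); return (alpha, log C)."""
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    return -slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Toy corpus for illustration only; a real analysis would use a large,
    # authentic corpus as in the study.
    corpus = "the cat sat on the mat and the dog sat on the log near the mat"
    freqs = word_frequencies(corpus)
    alpha, log_c = zipf_fit(freqs)
    print(f"Zipf exponent alpha = {alpha:.2f}")  # roughly 1 for natural-language corpora
```

On real corpora, an exponent near 1 is the classic Zipfian signature; comparing such fitted parameters across languages, time periods, or registers is one way the frequency patterns discussed in the abstract can be made measurable.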
Related papers
- Analyzing The Language of Visual Tokens [48.62180485759458]
We take a natural-language-centric approach to analyzing discrete visual languages.
We show that higher token innovation drives greater entropy and lower compression, with tokens predominantly representing object parts.
We also show that visual languages lack cohesive grammatical structures, leading to higher perplexity and weaker hierarchical organization compared to natural languages.
arXiv Detail & Related papers (2024-11-07T18:59:28Z) - Language Models as Models of Language [0.0]
This chapter critically examines the potential contributions of modern language models to theoretical linguistics.
I review a growing body of empirical evidence suggesting that language models can learn hierarchical syntactic structure and exhibit sensitivity to various linguistic phenomena.
I conclude that closer collaboration between theoretical linguists and computational researchers could yield valuable insights.
arXiv Detail & Related papers (2024-08-13T18:26:04Z) - Perceptions of Linguistic Uncertainty by Language Models and Humans [26.69714008538173]
We investigate how language models map linguistic expressions of uncertainty to numerical responses.
We find that 7 out of 10 models are able to map uncertainty expressions to probabilistic responses in a human-like manner.
We also find, however, that language models are substantially more susceptible to bias based on their prior knowledge than humans are.
arXiv Detail & Related papers (2024-07-22T17:26:12Z) - From Word Models to World Models: Translating from Natural Language to
the Probabilistic Language of Thought [124.40905824051079]
We propose rational meaning construction, a computational framework for language-informed thinking.
We frame linguistic meaning as a context-sensitive mapping from natural language into a probabilistic language of thought.
We show that LLMs can generate context-sensitive translations that capture pragmatically-appropriate linguistic meanings.
We extend our framework to integrate cognitively-motivated symbolic modules.
arXiv Detail & Related papers (2023-06-22T05:14:00Z) - False perspectives on human language: why statistics needs linguistics [0.8699677835130408]
We show that statistical measures can be defined on the basis of either structural or non-structural models.
Only models of surprisal that reflect syntactic structure are able to account for language regularities.
arXiv Detail & Related papers (2023-02-17T11:40:32Z) - Language Models as Inductive Reasoners [125.99461874008703]
We propose a new paradigm (task) for inductive reasoning, which is to induce natural language rules from natural language facts.
We create a dataset termed DEER containing 1.2k rule-fact pairs for the task, where rules and facts are written in natural language.
We provide the first comprehensive analysis of how well pretrained language models can induce natural language rules from natural language facts.
arXiv Detail & Related papers (2022-12-21T11:12:14Z) - Transparency Helps Reveal When Language Models Learn Meaning [71.96920839263457]
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not well-represent natural language semantics.
arXiv Detail & Related papers (2022-10-14T02:35:19Z) - On the probability-quality paradox in language generation [76.69397802617064]
We analyze language generation through an information-theoretic lens.
We posit that human-like language should contain an amount of information close to the entropy of the distribution over natural strings.
arXiv Detail & Related papers (2022-03-31T17:43:53Z) - Probing Linguistic Information For Logical Inference In Pre-trained
Language Models [2.4366811507669124]
We propose a methodology for probing linguistic information for logical inference in pre-trained language model representations.
We find that pre-trained language models do encode several types of linguistic information for inference, but some types of information are only weakly encoded.
We have demonstrated language models' potential as semantic and background knowledge bases for supporting symbolic inference methods.
arXiv Detail & Related papers (2021-12-03T07:19:42Z) - How individuals change language [1.2437226707039446]
We introduce a very general mathematical model that encompasses a wide variety of individual-level linguistic behaviours.
We compare the likelihood of empirically-attested changes in definite and indefinite articles in multiple languages under different assumptions.
We find that accounts of language change that appeal primarily to errors in childhood language acquisition are very weakly supported by the historical data.
arXiv Detail & Related papers (2021-04-20T19:02:49Z) - Linguistic Typology Features from Text: Inferring the Sparse Features of
World Atlas of Language Structures [73.06435180872293]
We construct a recurrent neural network predictor based on byte embeddings and convolutional layers.
We show that some features from various linguistic types can be predicted reliably.
arXiv Detail & Related papers (2020-04-30T21:00:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.