Re-evaluating phoneme frequencies
- URL: http://arxiv.org/abs/2006.05206v2
- Date: Tue, 27 Oct 2020 03:56:14 GMT
- Title: Re-evaluating phoneme frequencies
- Authors: Jayden L. Macklin-Cordes, Erich R. Round
- Abstract summary: We re-evaluate the distributions claimed to characterize phoneme frequencies.
We find evidence supporting earlier results, but also nuancing them and increasing our understanding of them.
We identify a potential account for why, despite there being an important role for phonetic substance in phonemic change, we could still expect inventories with highly diverse phonetic content to share similar distributions of phoneme frequencies.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Causal processes can give rise to distinctive distributions in the linguistic
variables that they affect. Consequently, a secure understanding of a
variable's distribution can hold a key to understanding the forces that have
causally shaped it. A storied distribution in linguistics has been Zipf's law,
a kind of power law. In the wake of a major debate in the sciences around
power-law hypotheses and the unreliability of earlier methods of evaluating
them, here we re-evaluate the distributions claimed to characterize phoneme
frequencies. We infer the fit of power laws and three alternative distributions
to 166 Australian languages, using a maximum likelihood framework. We find
evidence supporting earlier results, but also nuancing them and increasing our
understanding of them. Most notably, phonemic inventories appear to have a
Zipfian-like frequency structure among their most-frequent members (though
perhaps also a lognormal structure) but a geometric (or exponential) structure
among the least-frequent. We compare these new insights with the kinds of causal
processes that affect the evolution of phonemic inventories over time, and
identify a potential account for why, despite there being an important role for
phonetic substance in phonemic change, we could still expect inventories with
highly diverse phonetic content to share similar distributions of phoneme
frequencies. We conclude with priorities for future work in this promising
program of research.
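The maximum likelihood framework the abstract describes can be sketched in a few lines. This is not the authors' code: the 166 Australian languages are replaced with synthetic Pareto-distributed counts, and only two of the candidate distributions (a continuous power law and an exponential) are compared, using textbook MLE estimators.

```python
import math
import random

random.seed(42)

# Hypothetical phoneme frequency counts (synthetic, for illustration only):
# heavy-tailed samples standing in for real cross-linguistic counts.
counts = sorted((random.paretovariate(1.5) * 50 for _ in range(30)), reverse=True)

def powerlaw_mle(xs, xmin):
    """Continuous-approximation MLE for the power-law exponent alpha."""
    tail = [x for x in xs if x >= xmin]
    alpha = 1 + len(tail) / sum(math.log(x / xmin) for x in tail)
    # Log-likelihood under p(x) = (alpha - 1)/xmin * (x/xmin)^(-alpha)
    ll = sum(math.log((alpha - 1) / xmin * (x / xmin) ** -alpha) for x in tail)
    return alpha, ll

def exponential_mle(xs, xmin):
    """MLE for an exponential tail: rate = 1 / mean excess over xmin."""
    tail = [x for x in xs if x >= xmin]
    lam = len(tail) / sum(x - xmin for x in tail)
    ll = sum(math.log(lam) - lam * (x - xmin) for x in tail)
    return lam, ll

xmin = min(counts)
alpha, ll_pl = powerlaw_mle(counts, xmin)
lam, ll_exp = exponential_mle(counts, xmin)
print(f"power law:   alpha={alpha:.2f}, logL={ll_pl:.1f}")
print(f"exponential: rate={lam:.4f},  logL={ll_exp:.1f}")
print("better fit:", "power law" if ll_pl > ll_exp else "exponential")
```

Because both models are fit to the same tail, their log-likelihoods are directly comparable; in practice one would also vary `xmin`, which is how a Zipfian head can coexist with a geometric tail, as the abstract reports.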
Related papers
- Surprise! Uniform Information Density Isn't the Whole Story: Predicting Surprisal Contours in Long-form Discourse [54.08750245737734]
We propose that speakers modulate information rate based on location within a hierarchically-structured model of discourse.
We find that hierarchical predictors are significant predictors of a discourse's information contour and that deeply nested hierarchical predictors are more predictive than shallow ones.
arXiv Detail & Related papers (2024-10-21T14:42:37Z) - On the Role of Context in Reading Time Prediction [50.87306355705826]
We present a new perspective on how readers integrate context during real-time language comprehension.
Our proposals build on surprisal theory, which posits that the processing effort of a linguistic unit is an affine function of its in-context information content.
arXiv Detail & Related papers (2024-09-12T15:52:22Z) - Causal Layering via Conditional Entropy [85.01590667411956]
Causal discovery aims to recover information about an unobserved causal graph from the observable data it generates.
We provide ways to recover layerings of a graph by accessing the data via a conditional entropy oracle.
arXiv Detail & Related papers (2024-01-19T05:18:28Z) - Probabilistic Method of Measuring Linguistic Productivity [0.0]
I propose a new way of measuring linguistic productivity that objectively assesses the ability of an affix to be used to coin new complex words.
Token frequency does not dominate the productivity measure but naturally influences the sampling of bases.
A corpus-based approach and randomised design assure that true neologisms and words coined long ago have equal chances to be selected.
arXiv Detail & Related papers (2023-08-24T08:36:28Z) - Testing the Predictions of Surprisal Theory in 11 Languages [77.45204595614]
We investigate the relationship between surprisal and reading times in eleven different languages.
By focusing on a more diverse set of languages, we argue that these results offer the most robust link to date between information theory and incremental language processing across languages.
arXiv Detail & Related papers (2023-07-07T15:37:50Z) - Slangvolution: A Causal Analysis of Semantic Change and Frequency Dynamics in Slang [18.609276255676175]
We study slang, an informal language that is typically restricted to a specific group or social setting.
We analyze the semantic change and frequency shift of slang words and compare them to those of standard, nonslang words.
We show that slang words undergo less semantic change but tend to have larger frequency shifts over time.
arXiv Detail & Related papers (2022-03-09T11:34:43Z) - Evolution and trade-off dynamics of functional load [0.0]
We apply phylogenetic methods to examine the diachronic evolution of FL across 90 languages of the Pama-Nyungan (PN) family of Australia.
We find a high degree of phylogenetic signal in FL. Though phylogenetic signal has been reported for phonological structures, such as phonotactics, its detection in measures of phonological function is novel.
arXiv Detail & Related papers (2021-12-22T20:57:50Z) - Causal Expectation-Maximisation [70.45873402967297]
We show that causal inference is NP-hard even in models characterised by polytree-shaped graphs.
We introduce the causal EM algorithm to reconstruct the uncertainty about the latent variables from data about categorical manifest variables.
We argue that there appears to be an unnoticed limitation to the trending idea that counterfactual bounds can often be computed without knowledge of the structural equations.
arXiv Detail & Related papers (2020-11-04T10:25:13Z) - Where New Words Are Born: Distributional Semantic Analysis of Neologisms and Their Semantic Neighborhoods [51.34667808471513]
We investigate the importance of two factors, semantic sparsity and frequency growth rates of semantic neighbors, formalized in the distributional semantics paradigm.
We show that both factors are predictive of word emergence, although we find more support for the latter hypothesis.
arXiv Detail & Related papers (2020-01-21T19:09:49Z) - The empirical structure of word frequency distributions [0.0]
I show that first names form natural communicative distributions in most languages.
I then show this pattern of findings replicates in communicative distributions of English nouns and verbs.
arXiv Detail & Related papers (2020-01-09T20:52:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.