Phylogenetic typology
- URL: http://arxiv.org/abs/2103.10198v2
- Date: Fri, 19 Mar 2021 09:41:32 GMT
- Title: Phylogenetic typology
- Authors: Gerhard Jäger and Johannes Wahle
- Abstract summary: We propose a novel method to estimate the frequency distribution of linguistic variables.
Unlike previous approaches, our technique uses all available data.
As a case study, we investigate a series of potential word-order correlations across the languages of the world.
- Score: 0.913755431537592
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this article we propose a novel method to estimate the frequency
distribution of linguistic variables while controlling for statistical
non-independence due to shared ancestry. Unlike previous approaches, our
technique uses all available data, from language families large and small as
well as from isolates, while controlling for different degrees of relatedness
on a continuous scale estimated from the data. Our approach involves three
steps: First, distributions of phylogenies are inferred from lexical data.
Second, these phylogenies are used as part of a statistical model to
estimate transition rates between parameter states. Finally, the
long-term equilibrium of the resulting Markov process is computed. As a case
study, we investigate a series of potential word-order correlations across the
languages of the world.
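The third step of the abstract, computing the long-term equilibrium of a Markov process, can be illustrated with a minimal sketch. It assumes a continuous-time Markov chain whose transition rates are already known; the function name `stationary_distribution` and the example rates are hypothetical and are not taken from the paper, whose actual estimation procedure is not reproduced here.

```python
def stationary_distribution(Q):
    """Return pi with pi * Q = 0 and sum(pi) = 1.

    Q is a square rate matrix (list of lists): off-diagonal entries are
    transition rates, and each row sums to zero. The chain is assumed to
    be irreducible, so the equilibrium is unique.
    """
    n = len(Q)
    # pi * Q = 0 is equivalent to Q^T * pi^T = 0. One of those equations is
    # redundant, so the last one is replaced by the constraint sum(pi) = 1.
    A = [[Q[j][i] for j in range(n)] for i in range(n)]
    A[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]

    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]

    # Back substitution.
    pi = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][c] * pi[c] for c in range(i + 1, n))
        pi[i] = (b[i] - tail) / A[i][i]
    return pi


# Hypothetical binary feature: state 0 -> 1 at rate 1.0, state 1 -> 0 at rate 3.0.
Q_binary = [
    [-1.0, 1.0],
    [3.0, -3.0],
]
print(stationary_distribution(Q_binary))
```

For a two-state process the result agrees with the closed form: the equilibrium probability of state 0 is the rate of entering it divided by the sum of both rates. The same function handles larger state spaces, such as the four joint states of a pair of binary features.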
Related papers
- Modelled Multivariate Overlap: A method for measuring vowel merger [0.0]
This paper introduces a novel method for quantifying vowel overlap.
We evaluate this method on corpus speech data targeting the PIN-PEN merger in four dialects of English.
arXiv Detail & Related papers (2024-06-24T04:56:26Z)
- Statistical Uncertainty in Word Embeddings: GloVe-V [35.04183792123882]
We introduce a method to obtain approximate, easy-to-use, and scalable reconstruction error variance estimates for GloVe.
To demonstrate the value of embeddings with variance (GloVe-V), we illustrate how our approach enables principled hypothesis testing in core word embedding tasks.
arXiv Detail & Related papers (2024-06-18T00:35:02Z)
- Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z)
- Towards Theoretical Understandings of Self-Consuming Generative Models [56.84592466204185]
This paper tackles the emerging challenge of training generative models within a self-consuming loop.
We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models.
We present results for kernel density estimation, delivering nuanced insights such as the impact of mixed data training on error propagation.
arXiv Detail & Related papers (2024-02-19T02:08:09Z)
- Conformal inference for regression on Riemannian Manifolds [49.7719149179179]
We investigate prediction sets for regression scenarios when the response variable, denoted by $Y$, resides in a manifold, and the covariable, denoted by $X$, lies in Euclidean space.
We prove the almost sure convergence of the empirical version of these regions on the manifold to their population counterparts.
arXiv Detail & Related papers (2023-10-12T10:56:25Z)
- Self-contained Beta-with-Spikes Approximation for Inference Under a Wright-Fisher Model [0.0]
We construct a reliable estimation of evolutionary parameters within the Wright-Fisher model.
Our method of analysis builds on a Beta-with-Spikes approximation to the distribution of allele frequencies.
arXiv Detail & Related papers (2023-03-08T16:32:10Z)
- Fisher information of correlated stochastic processes [0.0]
We prove two results concerning the estimation of parameters encoded in a memoryful process.
First, we show that for processes with finite Markov order, the Fisher information is always linear in the number of outcomes.
Second, we prove with suitable examples that correlations do not necessarily enhance the metrological precision.
arXiv Detail & Related papers (2022-06-01T12:51:55Z)
- Modeling Voting for System Combination in Machine Translation [92.09572642019145]
We propose an approach to modeling voting for system combination in machine translation.
Our approach combines the advantages of statistical and neural methods since it can not only analyze the relations between hypotheses but also allow for end-to-end training.
arXiv Detail & Related papers (2020-07-14T09:59:38Z)
- On Disentangled Representations Learned From Correlated Data [59.41587388303554]
We bridge the gap to real-world scenarios by analyzing the behavior of the most prominent disentanglement approaches on correlated data.
We show that systematically induced correlations in the dataset are being learned and reflected in the latent representations.
We also demonstrate how to resolve these latent correlations, either using weak supervision during training or by post-hoc correcting a pre-trained model with a small number of labels.
arXiv Detail & Related papers (2020-06-14T12:47:34Z)
- Stable Prediction via Leveraging Seed Variable [73.9770220107874]
Previous machine learning methods might exploit subtly spurious correlations in training data induced by non-causal variables for prediction.
We propose an algorithm based on a conditional independence test that uses a seed variable as a prior to separate out the causal variables, and we adopt those variables for stable prediction.
Our algorithm outperforms state-of-the-art methods for stable prediction.
arXiv Detail & Related papers (2020-06-09T06:56:31Z)