Related papers: Phylogenetic typology

URL: http://arxiv.org/abs/2103.10198v2
Date: Fri, 19 Mar 2021 09:41:32 GMT
Title: Phylogenetic typology
Authors: Gerhard Jäger and Johannes Wahle
Abstract summary: We propose a novel method to estimate the frequency distribution of linguistic variables. Unlike previous approaches, our technique uses all available data. As a case study, we investigate a series of potential word-order correlations across the languages of the world.
Score: 0.913755431537592
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this article we propose a novel method to estimate the frequency distribution of linguistic variables while controlling for statistical non-independence due to shared ancestry. Unlike previous approaches, our technique uses all available data, from language families large and small as well as from isolates, while controlling for different degrees of relatedness on a continuous scale estimated from the data. Our approach involves three steps: First, distributions of phylogenies are inferred from lexical data. Second, these phylogenies are used as part of a statistical model to estimate transition rates between parameter states. Finally, the long-term equilibrium of the resulting Markov process is computed. As a case study, we investigate a series of potential word-order correlations across the languages of the world.
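
Illustration: the third step of the abstract (computing the long-term equilibrium of a Markov process) can be sketched as follows. This is a minimal sketch, not the authors' implementation. It assumes a continuous-time Markov chain over two states of a hypothetical binary feature, and the transition rates are made-up values rather than estimates from the paper. The function name `equilibrium` is likewise an assumption introduced for this example.

```python
import numpy as np

def equilibrium(Q):
    """Stationary distribution pi of a continuous-time Markov chain.

    Solves pi @ Q = 0 subject to sum(pi) = 1, where Q is a rate matrix
    whose rows sum to zero.
    """
    Q = np.asarray(Q, dtype=float)
    n = Q.shape[0]
    # Stack the balance equations Q^T pi = 0 with the normalization row.
    A = np.vstack([Q.T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# Hypothetical two-state feature with made-up transition rates.
rate_ab = 0.2  # rate of change from state A to state B (assumed)
rate_ba = 0.6  # rate of change from state B to state A (assumed)
Q = np.array([[-rate_ab, rate_ab],
              [rate_ba, -rate_ba]])

pi = equilibrium(Q)
print(pi)
```

For a two-state chain the equilibrium has the closed form (rate_ba, rate_ab) / (rate_ab + rate_ba), so the state that is entered faster than it is left dominates in the long run, regardless of how the states are distributed today.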

Related papers

Ancestral Inference and Learning for Branching Processes in Random Environments [4.669957449088592]
We introduce a new methodology for ancestral inference utilizing the generalized method of moments. We demonstrate that the estimator's behavior is critically influenced by the coefficient of variation of the environment sequence.
arXiv Detail & Related papers (2025-01-27T21:51:04Z)
Parameter Inference via Differentiable Diffusion Bridge Importance Sampling [1.747623282473278]
We introduce a methodology for performing parameter inference in high-dimensional, non-linear diffusion processes. We illustrate its applicability for obtaining insights into the evolution of and relationships between species, including ancestral state reconstruction. This novel, numerically stable, score matching-based parameter inference framework is presented and demonstrated on biological two- and three-dimensional morphometry data.
arXiv Detail & Related papers (2024-11-13T19:33:47Z)
Modelled Multivariate Overlap: A method for measuring vowel merger [0.0]
This paper introduces a novel method for quantifying vowel overlap. We evaluate this method on corpus speech data targeting the PIN-PEN merger in four dialects of English.
arXiv Detail & Related papers (2024-06-24T04:56:26Z)
Statistical Uncertainty in Word Embeddings: GloVe-V [35.04183792123882]
We introduce a method to obtain approximate, easy-to-use, and scalable reconstruction error variance estimates for GloVe. To demonstrate the value of embeddings with variance (GloVe-V), we illustrate how our approach enables principled hypothesis testing in core word embedding tasks.
arXiv Detail & Related papers (2024-06-18T00:35:02Z)
Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data. Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z)
Towards Theoretical Understandings of Self-Consuming Generative Models [56.84592466204185]
This paper tackles the emerging challenge of training generative models within a self-consuming loop. We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models. We present results for kernel density estimation, delivering nuanced insights such as the impact of mixed data training on error propagation.
arXiv Detail & Related papers (2024-02-19T02:08:09Z)
Conformal inference for regression on Riemannian Manifolds [49.7719149179179]
We investigate prediction sets for regression scenarios when the response variable, denoted by $Y$, resides in a manifold, and the covariable, denoted by $X$, lies in Euclidean space. We prove the almost sure convergence of the empirical version of these regions on the manifold to their population counterparts.
arXiv Detail & Related papers (2023-10-12T10:56:25Z)
Self-contained Beta-with-Spikes Approximation for Inference Under a Wright-Fisher Model [0.0]
We construct a reliable estimation of evolutionary parameters within the Wright-Fisher model. Our method of analysis builds on a Beta-with-Spikes approximation to the distribution of allele frequencies.
arXiv Detail & Related papers (2023-03-08T16:32:10Z)
Statistical Efficiency of Score Matching: The View from Isoperimetry [96.65637602827942]
We show a tight connection between statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated. We formalize these results both in the sample regime and in the finite regime.
arXiv Detail & Related papers (2022-10-03T06:09:01Z)
Fisher information of correlated stochastic processes [0.0]
We prove two results concerning the estimation of parameters encoded in a memoryful process. First, we show that for processes with finite Markov order, the Fisher information is always linear in the number of outcomes. Second, we prove with suitable examples that correlations do not necessarily enhance the metrological precision.
arXiv Detail & Related papers (2022-06-01T12:51:55Z)
Modeling Voting for System Combination in Machine Translation [92.09572642019145]
We propose an approach to modeling voting for system combination in machine translation. Our approach combines the advantages of statistical and neural methods since it can not only analyze the relations between hypotheses but also allow for end-to-end training.
arXiv Detail & Related papers (2020-07-14T09:59:38Z)
On Disentangled Representations Learned From Correlated Data [59.41587388303554]
We bridge the gap to real-world scenarios by analyzing the behavior of the most prominent disentanglement approaches on correlated data. We show that systematically induced correlations in the dataset are being learned and reflected in the latent representations. We also demonstrate how to resolve these latent correlations, either using weak supervision during training or by post-hoc correcting a pre-trained model with a small number of labels.
arXiv Detail & Related papers (2020-06-14T12:47:34Z)
Stable Prediction via Leveraging Seed Variable [73.9770220107874]
Previous machine learning methods might exploit subtly spurious correlations in training data induced by non-causal variables for prediction. We propose an algorithm based on a conditional independence test that separates causal variables using a seed variable as a prior, and adopt them for stable prediction. Our algorithm outperforms state-of-the-art methods for stable prediction.
arXiv Detail & Related papers (2020-06-09T06:56:31Z)

This list is automatically generated from the titles and abstracts of the papers on this site.