Principal Component Analysis as a Sanity Check for Bayesian
Phylolinguistic Reconstruction
- URL: http://arxiv.org/abs/2402.18877v1
- Date: Thu, 29 Feb 2024 05:47:34 GMT
- Title: Principal Component Analysis as a Sanity Check for Bayesian
Phylolinguistic Reconstruction
- Authors: Yugo Murawaki
- Abstract summary: The tree model assumes that languages descended from a common ancestor and underwent modifications over time.
This assumption can be violated to different extents due to contact and other factors.
We propose a simple sanity check: projecting a reconstructed tree onto a space generated by principal component analysis.
- Score: 3.652806821280741
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Bayesian approaches to reconstructing the evolutionary history of languages
rely on the tree model, which assumes that these languages descended from a
common ancestor and underwent modifications over time. However, this assumption
can be violated to different extents due to contact and other factors.
Understanding the degree to which this assumption is violated is crucial for
validating the accuracy of phylolinguistic inference. In this paper, we propose
a simple sanity check: projecting a reconstructed tree onto a space generated
by principal component analysis. By using both synthetic and real data, we
demonstrate that our method effectively visualizes anomalies, particularly in
the form of jogging.
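To make the check concrete, here is a minimal sketch of the projection step, assuming languages and reconstructed ancestral nodes are all encoded as binary trait vectors; the toy tree, node counts, and variable names are illustrative, not taken from the paper.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_nodes, n_feats = 7, 300  # 4 leaf languages + 3 reconstructed ancestors (toy sizes)
states = rng.integers(0, 2, size=(n_nodes, n_feats)).astype(float)

# Project every node, leaves and reconstructed ancestors alike, into the
# same two-dimensional principal-component space.
coords = PCA(n_components=2).fit_transform(states)

# (parent, child) edges of the reconstructed tree, rooted at node 0.
edges = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6)]

fig, ax = plt.subplots()
ax.scatter(coords[:, 0], coords[:, 1])
for parent, child in edges:
    ax.plot(coords[[parent, child], 0], coords[[parent, child], 1], color="gray")
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
plt.show()
# Edges that repeatedly double back across the projected plane are the kind
# of anomaly the check is meant to surface.
```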
Related papers
- Improved Neural Protoform Reconstruction via Reflex Prediction [11.105362395278142]
We argue that not only should protoforms be inferable from cognate sets (sets of related reflexes) but the reflexes should also be inferable from the protoforms.
We propose a system in which candidate protoforms from a reconstruction model are reranked by a reflex prediction model.
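A hedged sketch of this reranking idea follows, assuming a hypothetical `reconstructor` that proposes scored protoform candidates for a cognate set and a hypothetical `reflex_model` that scores each attested reflex given a candidate protoform; the interpolation weight `alpha` is our assumption, not the paper's configuration.

```python
# Illustrative reranking loop; `reconstructor` and `reflex_model` are
# hypothetical stand-ins for the two models described in the abstract.
def rerank(cognate_set, reconstructor, reflex_model, alpha=0.5):
    candidates = reconstructor.candidates(cognate_set)  # [(protoform, score), ...]
    scored = []
    for protoform, recon_score in candidates:
        # How well does each attested reflex follow from this candidate?
        reflex_score = sum(reflex_model.logprob(r, protoform) for r in cognate_set)
        scored.append((protoform, alpha * recon_score + (1 - alpha) * reflex_score))
    return max(scored, key=lambda t: t[1])[0]  # best candidate after reranking
```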
arXiv Detail & Related papers (2024-03-27T17:13:38Z)
- Are Sounds Sound for Phylogenetic Reconstruction? [41.85920785319125]
We test, for the first time, the performance of sound-based versus cognate-based approaches to phylogenetic reconstruction.
Our results show that phylogenies reconstructed from lexical cognates are topologically closer to the gold-standard trees, by approximately one third on average with respect to the generalized quartet distance.
arXiv Detail & Related papers (2024-02-05T08:35:33Z)
- Sharded Bayesian Additive Regression Trees [1.4213973379473654]
We introduce a randomization auxiliary variable and a sharding tree to decide the partitioning of the data.
By observing that the optimal design of a sharding tree can determine optimal sharding for sub-models on a product space, we introduce an intersection tree structure to completely specify both the sharding and modeling using only tree structures.
arXiv Detail & Related papers (2023-06-01T05:41:31Z)
- Amortized Inference for Causal Structure Learning [72.84105256353801]
Learning causal structure poses a search problem that typically involves evaluating structures using a score or independence test.
We train a variational inference model to predict the causal structure from observational/interventional data.
Our models exhibit robust generalization capabilities under substantial distribution shift.
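As a generic sketch of what amortized causal structure learning can look like (our assumption about the setup, not the paper's architecture): a network summarizes each dataset and outputs edge logits for an adjacency matrix, trained across many synthetic datasets with known graphs.

```python
import torch
import torch.nn as nn

class EdgePredictor(nn.Module):
    """Maps a batch of datasets to edge logits over variable pairs (illustrative)."""
    def __init__(self, n_vars, hidden=128):
        super().__init__()
        self.n_vars = n_vars
        self.net = nn.Sequential(
            nn.Linear(n_vars * n_vars, hidden), nn.ReLU(),
            nn.Linear(hidden, n_vars * n_vars),
        )

    def forward(self, data):  # data: (batch, samples, n_vars)
        # Summarize each dataset by its (uncentered) covariance matrix.
        cov = torch.einsum("bsi,bsj->bij", data, data) / data.shape[1]
        logits = self.net(cov.flatten(1))
        return logits.view(-1, self.n_vars, self.n_vars)  # edge logits per pair
```

Training would score these logits against the known adjacency matrices (e.g. with binary cross-entropy) over many simulated datasets.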
arXiv Detail & Related papers (2022-05-25T17:37:08Z)
- Posterior Collapse of a Linear Latent Variable Model [6.2255027793924285]
This work identifies the existence and cause of a type of posterior collapse that frequently occurs in Bayesian deep learning practice.
For a general linear latent variable model, we precisely identify the nature of posterior collapse to be the competition between the likelihood and the regularization of the mean due to the prior.
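As a toy numeric illustration of that competition (our own construction, not the paper's derivation): in a one-dimensional linear model x = w*z + noise with variational posterior q(z|x) = N(a*x, s^2), the ELBO-optimal coefficient a shrinks toward 0 as the decoder weight w decreases, i.e. the posterior collapses to the prior.

```python
import numpy as np

def elbo(a, s2, w, sigma2=1.0, ex2=2.0):
    # Expected per-point ELBO with E[x^2] = ex2, prior z ~ N(0, 1),
    # likelihood x | z ~ N(w*z, sigma2), posterior q(z|x) = N(a*x, s2).
    recon = -(ex2 - 2 * w * a * ex2 + w**2 * (a**2 * ex2 + s2)) / (2 * sigma2)
    kl = 0.5 * (s2 + a**2 * ex2 - 1 - np.log(s2))
    return recon - kl

grid = np.linspace(-1.0, 1.0, 201)
for w in (2.0, 0.1):  # strong vs. weak decoder weight
    best_a = max(grid, key=lambda a: elbo(a, 0.5, w))
    print(f"w={w}: ELBO-optimal a is about {best_a:.2f}")
# The closed-form optimum is a = w / (w**2 + sigma2): the likelihood pulls a
# upward while the KL term to the prior pulls it toward zero.
```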
arXiv Detail & Related papers (2022-05-09T02:30:52Z)
- A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
- Constructing a Family Tree of Ten Indo-European Languages with Delexicalized Cross-linguistic Transfer Patterns [57.86480614673034]
We formalize the delexicalized transfer as interpretable tree-to-string and tree-to-tree patterns.
This allows us to quantitatively probe cross-linguistic transfer and extend inquiries of Second Language Acquisition.
arXiv Detail & Related papers (2020-07-17T15:56:54Z)
- Towards a Theoretical Understanding of the Robustness of Variational Autoencoders [82.68133908421792]
We make inroads into understanding the robustness of Variational Autoencoders (VAEs) to adversarial attacks and other input perturbations.
We develop a novel criterion for robustness in probabilistic models: $r$-robustness.
We show that VAEs trained using disentangling methods score well under our robustness metrics.
arXiv Detail & Related papers (2020-07-14T21:22:29Z)
- Exploiting Syntactic Structure for Better Language Modeling: A Syntactic Distance Approach [78.77265671634454]
We make use of a multi-task objective: the model simultaneously predicts words and ground-truth parse trees in a form called "syntactic distances".
Experimental results on the Penn Treebank and Chinese Treebank datasets show that when ground truth parse trees are provided as additional training signals, the model is able to achieve lower perplexity and induce trees with better quality.
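A minimal sketch of a multi-task loss of this flavor, pairing language-model cross-entropy with a regression term on per-token syntactic distances; the tensor shapes, the squared-error form, and the mixing weight are our assumptions, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def multitask_loss(word_logits, target_ids, pred_dists, gold_dists, weight=0.5):
    # Standard next-word prediction loss over the vocabulary.
    lm_loss = F.cross_entropy(word_logits.view(-1, word_logits.size(-1)),
                              target_ids.view(-1))
    # Auxiliary loss matching predicted syntactic distances to those derived
    # from the ground-truth parse trees.
    dist_loss = F.mse_loss(pred_dists, gold_dists)
    return lm_loss + weight * dist_loss
```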
arXiv Detail & Related papers (2020-05-12T15:35:00Z)
- Spectral neighbor joining for reconstruction of latent tree models [5.229354894035374]
We develop Spectral Neighbor Joining, a novel method to recover the structure of latent tree graphical models.
We prove that SNJ is consistent, and derive a sufficient condition for correct tree recovery from an estimated similarity matrix.
We illustrate via extensive simulations that in comparison to several other reconstruction methods, SNJ requires fewer samples to accurately recover trees with a large number of leaves or long edges.
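A fragment illustrating the rank test that, on our reading, underlies this approach (not the full recursive algorithm): if a leaf subset lies on one side of an edge of the latent tree, the cross-similarity block between the subset and its complement is approximately rank one, so its second singular value should be near zero.

```python
import numpy as np

def clan_score(S, subset):
    # Second singular value of the cross-similarity block S[A, A^c]; values
    # near zero suggest A sits on one side of a tree edge. Assumes both the
    # subset and its complement contain at least two leaves.
    mask = np.zeros(S.shape[0], dtype=bool)
    mask[list(subset)] = True
    block = S[mask][:, ~mask]
    return np.linalg.svd(block, compute_uv=False)[1]
```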
arXiv Detail & Related papers (2020-02-28T05:13:08Z)
- A Critical View of the Structural Causal Model [89.43277111586258]
We show that one can identify the cause and the effect without considering their interaction at all.
We propose a new adversarial training method that mimics the disentangled structure of the causal model.
Our multidimensional method outperforms the literature methods on both synthetic and real world datasets.
arXiv Detail & Related papers (2020-02-23T22:52:28Z)