Markov Chain Monte-Carlo Phylogenetic Inference Construction in Computational Historical Linguistics
- URL: http://arxiv.org/abs/2002.09637v2
- Date: Sat, 14 Mar 2020 02:03:54 GMT
- Title: Markov Chain Monte-Carlo Phylogenetic Inference Construction in Computational Historical Linguistics
- Authors: Tianyi Ni
- Abstract summary: More and more of the world's languages are being studied today, and as a result, traditional historical linguistics faces new challenges.
In this paper, I use computational methods to cluster the languages and a Markov Chain Monte Carlo (MCMC) method to build a language typology relationship tree.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: More and more of the world's languages are being studied today, and
as a result, traditional historical linguistics faces new challenges. For
example, comparative research across languages requires manual annotation,
which becomes increasingly impractical as the amount of language data from
around the world keeps growing. Although automatic computational methods can
hardly replace the work of linguists, they have been considered as a way to
reduce linguists' workload. One of the most important tasks in historical
linguistics is comparing words across languages to find cognates, which helps
determine whether two languages are related. In this paper, I use
computational methods to cluster the languages and then apply the Markov Chain
Monte Carlo (MCMC) method to build a language typology relationship tree based
on the clusters.
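The abstract names two steps, word comparison and MCMC tree building, without specifying the comparison features or the tree score. The Python sketch below shows one way such a pipeline can fit together under loudly stated assumptions; it is not the author's implementation. The word lists are a hypothetical toy sample in ordinary spelling (not phonetic transcription), language distance is an assumed mean length-normalized Levenshtein distance over aligned meanings, and the MCMC energy is a swapped-in Dasgupta-style hierarchical clustering cost rather than a phylogenetic likelihood. All function names (`levenshtein`, `distances`, `energy`, `prune`, `regraft`, `mcmc`) are hypothetical. The sampler runs Metropolis-Hastings over rooted tree topologies, proposing each move by pruning a random leaf and regrafting it onto a random edge.

```python
import math
import random
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple, Union

# A rooted binary tree: a leaf is a language name, an internal node is a pair.
Tree = Union[str, Tuple["Tree", "Tree"]]

# Hypothetical toy word lists (spelling, not phonetic transcription),
# aligned by meaning: hand, water, night, mother.
WORDS: Dict[str, List[str]] = {
    "English": ["hand", "water", "night", "mother"],
    "German": ["hand", "wasser", "nacht", "mutter"],
    "Dutch": ["hand", "water", "nacht", "moeder"],
    "Spanish": ["mano", "agua", "noche", "madre"],
    "Italian": ["mano", "acqua", "notte", "madre"],
}

def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def distances(words: Dict[str, List[str]]) -> Dict[FrozenSet[str], float]:
    """Assumed metric: mean length-normalized edit distance over aligned meanings, in [0, 1]."""
    out = {}
    for x, y in combinations(sorted(words), 2):
        pairs = list(zip(words[x], words[y]))
        out[frozenset((x, y))] = sum(levenshtein(a, b) / max(len(a), len(b)) for a, b in pairs) / len(pairs)
    return out

def leaves(t: Tree) -> List[str]:
    return [t] if isinstance(t, str) else leaves(t[0]) + leaves(t[1])

def energy(t: Tree, sim: Dict[FrozenSet[str], float]) -> float:
    """Assumed stand-in for a likelihood (Dasgupta-style cost): each pair pays its
    similarity times the size of the smallest cluster holding both, so similar languages join low."""
    if isinstance(t, str):
        return 0.0
    left, right = leaves(t[0]), leaves(t[1])
    cross = sum(sim[frozenset((a, b))] for a in left for b in right)
    return (len(left) + len(right)) * cross + energy(t[0], sim) + energy(t[1], sim)

def prune(t: Tree, x: str) -> Tree:
    """Remove leaf x from internal node t and splice out x's parent."""
    left, right = t
    if left == x:
        return right
    if right == x:
        return left
    if x in leaves(left):
        return (prune(left, x), right)
    return (left, prune(right, x))

def regraft(t: Tree, x: str, k: int) -> Tree:
    """Attach leaf x on the edge above the k-th node of t (preorder, root = 0)."""
    counter = [k]
    def walk(node: Tree) -> Tree:
        hit = counter[0] == 0
        counter[0] -= 1
        if hit:
            return (node, x)
        if isinstance(node, str):
            return node
        return (walk(node[0]), walk(node[1]))
    return walk(t)

def mcmc(dist, steps: int = 20000, temperature: float = 0.5, seed: int = 0):
    """Metropolis-Hastings over rooted topologies; returns the lowest-energy tree visited."""
    rng = random.Random(seed)
    sim = {pair: 1.0 - d for pair, d in dist.items()}
    names = sorted({n for pair in dist for n in pair})
    tree: Tree = names[0]
    for n in names[1:]:
        tree = (tree, n)  # start from a caterpillar tree
    e = energy(tree, sim)
    best, best_e = tree, e
    n_nodes = 2 * (len(names) - 1) - 1  # node count after pruning one leaf
    for _ in range(steps):
        x = rng.choice(names)
        proposal = regraft(prune(tree, x), x, rng.randrange(n_nodes))
        e_new = energy(proposal, sim)
        # Leaf prune-and-regraft proposals are symmetric, so accept w.p. min(1, exp(-dE/T)).
        if e_new <= e or rng.random() < math.exp((e - e_new) / temperature):
            tree, e = proposal, e_new
            if e < best_e:
                best, best_e = tree, e
    return best, best_e

best_tree, best_energy = mcmc(distances(WORDS))
print(best_tree, round(best_energy, 3))
```

Because pruning a uniformly chosen leaf and regrafting it onto a uniformly chosen edge is a symmetric proposal, the acceptance step only compares energies; the temperature (an assumed 0.5 here) controls how readily the chain accepts worse trees. A realistic phylogenetic model would instead score trees with a likelihood over cognate-class data and also sample branch lengths, which this sketch omits.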
Related papers
- Training Neural Networks as Recognizers of Formal Languages (2024-11-11T16:33:25Z)
  Formal language theory pertains specifically to recognizers.
  It is common to instead use proxy tasks that are similar in only an informal sense.
  We correct this mismatch by training and evaluating neural networks directly as binary classifiers of strings.
- The Role of Language Imbalance in Cross-lingual Generalisation: Insights from Cloned Language Experiments (2024-04-11T17:58:05Z)
  In this study, we investigate an unintuitive novel driver of cross-lingual generalisation: language imbalance.
  We observe that the existence of a predominant language during training boosts the performance of less frequent languages.
  As we extend our analysis to real languages, we find that infrequent languages still benefit from frequent ones, yet whether language imbalance causes cross-lingual generalisation there is not conclusive.
- Weakly-supervised Deep Cognate Detection Framework for Low-Resourced Languages Using Morphological Knowledge of Closely-Related Languages (2023-11-09T05:46:41Z)
  Exploiting cognates for transfer learning in under-resourced languages is an exciting opportunity for language understanding tasks.
  Previous approaches mainly focused on supervised cognate detection tasks based on orthographic, phonetic or state-of-the-art contextual language models.
  This paper proposes a novel language-agnostic weakly-supervised deep cognate detection framework for under-resourced languages.
- Learning to pronounce as measuring cross lingual joint orthography-phonology complexity (2022-01-29T14:44:39Z)
  We investigate what makes a language "hard to pronounce" by modelling the task of grapheme-to-phoneme (g2p) transliteration.
  We show that certain characteristics emerge that separate easier and harder languages with respect to learning to pronounce.
- Discovering Representation Sprachbund For Multilingual Pre-Training (2021-09-01T09:32:06Z)
  We generate language representation from multilingual pre-trained models and conduct linguistic analysis.
  We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
  Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
- Linguistic Classification using Instance-Based Learning (2020-12-02T04:12:10Z)
  We take a contrarian approach and question the tree-based model that is rather restrictive.
  For example, the affinity that Sanskrit independently has with languages across Indo-European languages is better illustrated using a network model.
  We can say the same about inter-relationship between languages in India, where the inter-relationships are better discovered than assumed.
- XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning (2020-05-01T12:22:33Z)
  Cross-lingual Choice of Plausible Alternatives (XCOPA) is a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages.
  We evaluate a range of state-of-the-art models on this novel dataset, revealing that the performance of current methods falls short compared to translation-based transfer.
- Linguistic Typology Features from Text: Inferring the Sparse Features of World Atlas of Language Structures (2020-04-30T21:00:53Z)
  We construct a recurrent neural network predictor based on byte embeddings and convolutional layers.
  We show that some features from various linguistic types can be predicted reliably.
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations (2020-04-30T16:25:39Z)
  We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
  We observe that our representations embed typology and strengthen correlations with language relationships.
  We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.