Related papers: Markov Chain Monte-Carlo Phylogenetic Inference Construction in Computational Historical Linguistics

Markov Chain Monte-Carlo Phylogenetic Inference Construction in Computational Historical Linguistics

URL: http://arxiv.org/abs/2002.09637v2
Date: Sat, 14 Mar 2020 02:03:54 GMT
Title: Markov Chain Monte-Carlo Phylogenetic Inference Construction in Computational Historical Linguistics
Authors: Tianyi Ni
Abstract summary: More and more languages in the world are under study nowadays, as a result, the traditional way of historical linguistics study is facing some challenges. In this paper, I am going to use computational method to cluster the languages and use Markov Chain Monte Carlo (MCMC) method to build the language typology relationship tree.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: More and more languages in the world are under study nowadays, as a result, the traditional way of historical linguistics study is facing some challenges. For example, the linguistic comparative research among languages needs manual annotation, which becomes more and more impossible with the increasing amount of language data coming out all around the world. Although it could hardly replace linguists work, the automatic computational methods have been taken into consideration and it can help people reduce their workload. One of the most important work in historical linguistics is word comparison from different languages and find the cognate words for them, which means people try to figure out if the two languages are related to each other or not. In this paper, I am going to use computational method to cluster the languages and use Markov Chain Monte Carlo (MCMC) method to build the language typology relationship tree based on the clusters.

Related papers

From Isolates to Families: Using Neural Networks for Automated Language Affiliation [9.182884165239996]
In historical linguistics, the affiliation of languages to a common language family is traditionally carried out using a complex workflow. Large-scale standardized collections of multilingual wordlists and grammatical language structures might help to improve this and open new avenues for developing automated language affiliation. We present neural network models that use lexical and grammatical data from a worldwide sample of more than 1,000 languages with known affiliations to classify individual languages into families.
arXiv Detail & Related papers (2025-02-17T11:25:32Z)
Training Neural Networks as Recognizers of Formal Languages [87.06906286950438]
Formal language theory pertains specifically to recognizers. It is common to instead use proxy tasks that are similar in only an informal sense. We correct this mismatch by training and evaluating neural networks directly as binary classifiers of strings.
arXiv Detail & Related papers (2024-11-11T16:33:25Z)
The Role of Language Imbalance in Cross-lingual Generalisation: Insights from Cloned Language Experiments [57.273662221547056]
In this study, we investigate an unintuitive novel driver of cross-lingual generalisation: language imbalance. We observe that the existence of a predominant language during training boosts the performance of less frequent languages. As we extend our analysis to real languages, we find that infrequent languages still benefit from frequent ones, yet whether language imbalance causes cross-lingual generalisation there is not conclusive.
arXiv Detail & Related papers (2024-04-11T17:58:05Z)
Weakly-supervised Deep Cognate Detection Framework for Low-Resourced Languages Using Morphological Knowledge of Closely-Related Languages [1.7622337807395716]
Exploiting cognates for transfer learning in under-resourced languages is an exciting opportunity for language understanding tasks. Previous approaches mainly focused on supervised cognate detection tasks based on orthographic, phonetic or state-of-the-art contextual language models. This paper proposes a novel language-agnostic weakly-supervised deep cognate detection framework for under-resourced languages.
arXiv Detail & Related papers (2023-11-09T05:46:41Z)
Learning to pronounce as measuring cross lingual joint orthography-phonology complexity [0.0]
We investigate what makes a language "hard to pronounce" by modelling the task of grapheme-to-phoneme (g2p) transliteration. We show that certain characteristics emerge that separate easier and harder languages with respect to learning to pronounce.
arXiv Detail & Related papers (2022-01-29T14:44:39Z)
Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representation from multilingual pre-trained models and conduct linguistic analysis. We cluster all the target languages into multiple groups and name each group as a representation sprachbund. Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z)
Linguistic Classification using Instance-Based Learning [0.0]
We take a contrarian approach and question the tree-based model that is rather restrictive. For example, the affinity that Sanskrit independently has with languages across Indo-European languages is better illustrated using a network model. We can say the same about inter-relationship between languages in India, where the inter-relationships are better discovered than assumed.
arXiv Detail & Related papers (2020-12-02T04:12:10Z)
GATE: Graph Attention Transformer Encoder for Cross-lingual Relation and Event Extraction [107.8262586956778]
We introduce graph convolutional networks (GCNs) with universal dependency parses to learn language-agnostic sentence representations. GCNs struggle to model words with long-range dependencies or are not directly connected in the dependency tree. We propose to utilize the self-attention mechanism to learn the dependencies between words with different syntactic distances.
arXiv Detail & Related papers (2020-10-06T20:30:35Z)
XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning [68.57658225995966]
Cross-lingual Choice of Plausible Alternatives (XCOPA) is a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages. We evaluate a range of state-of-the-art models on this novel dataset, revealing that the performance of current methods falls short compared to translation-based transfer.
arXiv Detail & Related papers (2020-05-01T12:22:33Z)
Linguistic Typology Features from Text: Inferring the Sparse Features of World Atlas of Language Structures [73.06435180872293]
We construct a recurrent neural network predictor based on byte embeddings and convolutional layers. We show that some features from various linguistic types can be predicted reliably.
arXiv Detail & Related papers (2020-04-30T21:00:53Z)
Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source. We observe that our representations embed typology and strengthen correlations with language relationships. We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.