Linguistics from a topological viewpoint
- URL: http://arxiv.org/abs/2403.15440v1
- Date: Sat, 16 Mar 2024 23:10:42 GMT
- Title: Linguistics from a topological viewpoint
- Authors: Rui Dong,
- Abstract summary: In this paper, we describe a workflow to analyze the topological shapes of South American languages.
As a result, it is difficult to have a clear visualization of the data.
- Score: 2.4238741865874363
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Typological databases in linguistics are usually categorical-valued. As a result, it is difficult to have a clear visualization of the data. In this paper, we describe a workflow to analyze the topological shapes of South American languages by applying multiple correspondence analysis technique and topological data analysis methods.
Related papers
- An Interpretable Deep Learning Approach for Morphological Script Type Analysis [15.142597136864618]
We propose an interpretable deep learning-based approach to morphological script type analysis.
More precisely, we adapt a deep instance segmentation method to learn comparable character prototypes.
We demonstrate our approach by applying it to the Textualis Formata script type and its two subtypes formalized by A. Derolez: Northern and Southern Textualis
arXiv Detail & Related papers (2024-08-20T19:15:06Z) - Ontology Embedding: A Survey of Methods, Applications and Resources [54.3453925775069]
Ontologies are widely used for representing domain knowledge and meta data.
One straightforward solution is to integrate statistical analysis and machine learning.
Numerous papers have been published on embedding, but a lack of systematic reviews hinders researchers from gaining a comprehensive understanding of this field.
arXiv Detail & Related papers (2024-06-16T14:49:19Z) - Understanding Cross-Lingual Alignment -- A Survey [52.572071017877704]
Cross-lingual alignment is the meaningful similarity of representations across languages in multilingual language models.
We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field.
arXiv Detail & Related papers (2024-04-09T11:39:53Z) - Multilingual Gradient Word-Order Typology from Universal Dependencies [2.968112652976397]
Existing typological databases, including WALS and Grambank, suffer from inconsistencies primarily caused by their categorical format.
We introduce a new seed dataset made up of continuous-valued data, rather than categorical data, that can better reflect the variability of language.
arXiv Detail & Related papers (2024-02-02T15:54:19Z) - An Encoding of Abstract Dialectical Frameworks into Higher-Order Logic [57.24311218570012]
This approach allows for the computer-assisted analysis of abstract dialectical frameworks.
Exemplary applications include the formal analysis and verification of meta-theoretical properties.
arXiv Detail & Related papers (2023-12-08T09:32:26Z) - Morphological Disambiguation from Stemming Data [1.2183405753834562]
Kinyarwanda, a morphologically rich language, currently lacks tools for automated morphological analysis.
We learn to morphologically disambiguate Kinyarwanda verbal forms from a new stemming dataset collected through crowd-sourcing.
Our experiments reveal that inflectional properties of stems and morpheme association rules are the most discriminative features for disambiguation.
arXiv Detail & Related papers (2020-11-11T01:44:09Z) - Linguistic Typology Features from Text: Inferring the Sparse Features of
World Atlas of Language Structures [73.06435180872293]
We construct a recurrent neural network predictor based on byte embeddings and convolutional layers.
We show that some features from various linguistic types can be predicted reliably.
arXiv Detail & Related papers (2020-04-30T21:00:53Z) - Bridging Linguistic Typology and Multilingual Machine Translation with
Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z) - Evaluating Transformer-Based Multilingual Text Classification [55.53547556060537]
We argue that NLP tools perform unequally across languages with different syntactic and morphological structures.
We calculate word order and morphological similarity indices to aid our empirical study.
arXiv Detail & Related papers (2020-04-29T03:34:53Z) - Topological Data Analysis in Text Classification: Extracting Features
with Additive Information [2.1410799064827226]
Topological Data Analysis is challenging to apply to high dimensional numeric data.
Topological features carry some exclusive information not captured by conventional text mining methods.
Adding topological features to the conventional features in ensemble models improves the classification results.
arXiv Detail & Related papers (2020-03-29T21:02:09Z) - A Novel Method of Extracting Topological Features from Word Embeddings [2.4063592468412267]
We introduce a novel algorithm to extract topological features from word embedding representation of text.
We will show our defined topological features may outperform conventional text mining features.
arXiv Detail & Related papers (2020-03-29T16:55:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.