Computational valency lexica and Homeric formularity
- URL: http://arxiv.org/abs/2208.10795v1
- Date: Tue, 23 Aug 2022 08:03:16 GMT
- Title: Computational valency lexica and Homeric formularity
- Authors: Barbara McGillivray, Martina Astrid Rodda
- Abstract summary: We present AGVaLex, a lexicon for ancient Greek automatically extracted from the Ancient Greek Dependency Treebank.
It contains quantitative corpus-driven morphological, syntactic and lexical information about verbs and their arguments.
It has a wide range of applications for the study of the language of ancient Greek authors.
- Score: 1.6346069386394704
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Distributional semantics, the quantitative study of meaning variation and
change through corpus collocations, is currently one of the most productive
research areas in computational linguistics. The wider availability of big data
and of reproducible algorithms for analysis has boosted its application to
living languages in recent years. But can we use distributional semantics to
study a language with such a limited corpus as ancient Greek? And can this
approach tell us something about such vexed questions in classical studies as
the language and composition of the Homeric poems? Our paper will compare the
semantic flexibility of formulae involving transitive verbs in archaic Greek
epic to similar verb phrases in a non-formulaic corpus, in order to detect
unique patterns of variation in formulae. To address this, we present AGVaLex,
a computational valency lexicon for ancient Greek automatically extracted from
the Ancient Greek Dependency Treebank. The lexicon contains quantitative
corpus-driven morphological, syntactic and lexical information about verbs and
their arguments, such as objects, subjects, and prepositional phrases, and has
a wide range of applications for the study of the language of ancient Greek
authors.
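As a rough illustration of what a corpus-driven valency lexicon records, the sketch below counts argument frames and lexical fillers for each verb lemma in a toy dependency-annotated sample. It is a minimal sketch under stated assumptions, not the AGVaLex extraction pipeline: the token tuples, the `SBJ`/`OBJ` relation labels, the example sentences, and the function name `build_lexicon` are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: each token is (id, lemma, pos, head, relation, case).
# The sentences and labels are invented for illustration and are not taken
# from the Ancient Greek Dependency Treebank.
SENTENCES = [
    [
        (1, "Ἀχιλλεύς", "noun", 2, "SBJ", "nom"),
        (2, "βάλλω", "verb", 0, "PRED", None),
        (3, "δόρυ", "noun", 2, "OBJ", "acc"),
    ],
    [
        (1, "βάλλω", "verb", 0, "PRED", None),
        (2, "λίθος", "noun", 1, "OBJ", "acc"),
    ],
    [
        (1, "Ἕκτωρ", "noun", 2, "SBJ", "nom"),
        (2, "βάλλω", "verb", 0, "PRED", None),
        (3, "δόρυ", "noun", 2, "OBJ", "acc"),
    ],
]

# Assumption: only these relations count as verbal arguments in this sketch.
ARGUMENT_RELATIONS = {"SBJ", "OBJ"}


def build_lexicon(sentences):
    """Count argument frames per verb lemma and lexical fillers per slot."""
    frames = defaultdict(Counter)   # verb lemma -> Counter of frames
    fillers = defaultdict(Counter)  # (verb lemma, relation) -> Counter of lemmas
    for sentence in sentences:
        # Map token id -> lemma for every verb in the sentence.
        verbs = {tok[0]: tok[1] for tok in sentence if tok[2] == "verb"}
        args_by_verb = defaultdict(list)
        for _tok_id, lemma, _pos, head, rel, case in sentence:
            if head in verbs and rel in ARGUMENT_RELATIONS:
                args_by_verb[head].append((rel, case))
                fillers[(verbs[head], rel)][lemma] += 1
        for verb_id, verb_lemma in verbs.items():
            # A frame is the sorted tuple of (relation, case) pairs of one verb token.
            frame = tuple(sorted(args_by_verb[verb_id]))
            frames[verb_lemma][frame] += 1
    return frames, fillers


frames, fillers = build_lexicon(SENTENCES)
for frame, count in frames["βάλλω"].most_common():
    print(frame, count)
print(fillers[("βάλλω", "OBJ")].most_common())
```

Frame counts of this kind describe the syntactic side of a lexicon, while the filler counts per slot give the lexical distributions that a comparison of formulaic and non-formulaic verb phrases could draw on.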
Related papers
- Entropy and type-token ratio in gigaword corpora [0.0]
We investigate entropy and type-token ratio, two metrics for lexical diversity, in six massive linguistic datasets in English, Spanish, and Turkish.
We find a functional relation between entropy and type-token ratio that holds across the corpora under consideration.
Our results contribute to the theoretical understanding of text structure and offer practical implications for fields like natural language processing.
arXiv Detail & Related papers (2024-11-15T14:40:59Z) - Patterns of Persistence and Diffusibility across the World's Languages [3.7055269158186874]
Colexification is a type of similarity where a single lexical form is used to convey multiple meanings.
We shed light on the linguistic causes of cross-lingual similarity in colexification and phonology.
We construct large-scale graphs incorporating semantic, genealogical, phonological and geographical data for 1,966 languages.
arXiv Detail & Related papers (2024-01-03T12:05:38Z) - Reliable Detection and Quantification of Selective Forces in Language
Change [3.55026004901472]
We apply a recently-introduced method to corpus data to quantify the strength of selection in specific instances of historical language change.
We show that this method is more reliable and interpretable than similar methods that have previously been applied.
arXiv Detail & Related papers (2023-05-25T10:20:15Z) - Universality and diversity in word patterns [0.0]
We present an analysis of lexical statistical connections for eleven major languages.
We find that the diverse manners that languages utilize to express word relations give rise to unique pattern distributions.
arXiv Detail & Related papers (2022-08-23T20:03:27Z) - An Informational Space Based Semantic Analysis for Scientific Texts [62.997667081978825]
This paper introduces computational methods for semantic analysis and for quantifying the meaning of short scientific texts.
The representation of scientific-specific meaning is standardised by replacing the situation representations, rather than psychological properties.
The research in this paper lays the groundwork for the geometric representation of the meaning of texts.
arXiv Detail & Related papers (2022-05-31T11:19:32Z) - Quantifying Synthesis and Fusion and their Impact on Machine Translation [79.61874492642691]
Work in Natural Language Processing (NLP) typically labels a whole language with a strict type of morphology, e.g. fusional or agglutinative.
In this work, we propose to reduce the rigidity of such claims, by quantifying morphological typology at the word and segment level.
For computing synthesis, we test unsupervised and supervised morphological segmentation methods for English, German and Turkish, whereas for fusion, we propose a semi-automatic method using Spanish as a case study.
Then, we analyse the relationship between machine translation quality and the degree of synthesis and fusion at word (nouns and verbs for English-Turkish,
arXiv Detail & Related papers (2022-05-06T17:04:58Z) - A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z) - Lexical semantic change for Ancient Greek and Latin [61.69697586178796]
Associating a word's correct meaning in its historical context is a central challenge in diachronic research.
We build on a recent computational approach to semantic change based on a dynamic Bayesian mixture model.
We provide a systematic comparison of dynamic Bayesian mixture models for semantic change with state-of-the-art embedding-based models.
arXiv Detail & Related papers (2021-01-22T12:04:08Z) - A Probabilistic Approach in Historical Linguistics Word Order Change in Infinitival Clauses: from Latin to Old French [0.0]
This thesis investigates word order change in infinitival clauses in the history of Latin and Old French.
I examine a synchronic word order variation in each stage of language change, from which I infer the character, periodization and constraints of diachronic variation.
I present a three-stage probabilistic model of word order change, which also conforms to traditional language change patterns.
arXiv Detail & Related papers (2020-11-16T20:30:31Z) - Constructing a Family Tree of Ten Indo-European Languages with Delexicalized Cross-linguistic Transfer Patterns [57.86480614673034]
We formalize the delexicalized transfer as interpretable tree-to-string and tree-to-tree patterns.
This allows us to quantitatively probe cross-linguistic transfer and extend inquiries of Second Language Acquisition.
arXiv Detail & Related papers (2020-07-17T15:56:54Z) - Where New Words Are Born: Distributional Semantic Analysis of Neologisms and Their Semantic Neighborhoods [51.34667808471513]
We investigate the importance of two factors, semantic sparsity and frequency growth rates of semantic neighbors, formalized in the distributional semantics paradigm.
We show that both factors are predictive of word emergence, although we find more support for the latter hypothesis.
arXiv Detail & Related papers (2020-01-21T19:09:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.