PhiloBERTA: A Transformer-Based Cross-Lingual Analysis of Greek and Latin Lexicons
- URL: http://arxiv.org/abs/2503.05265v1
- Date: Fri, 07 Mar 2025 09:30:16 GMT
- Title: PhiloBERTA: A Transformer-Based Cross-Lingual Analysis of Greek and Latin Lexicons
- Authors: Rumi A. Allbert, Makai L. Allbert
- Abstract summary: We present PhiloBERTA, a model that measures semantic relationships between ancient Greek and Latin lexicons. Our results show that etymologically related pairs demonstrate significantly higher similarity scores. These findings establish a quantitative framework for examining how philosophical concepts moved between Greek and Latin traditions.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present PhiloBERTA, a cross-lingual transformer model that measures semantic relationships between ancient Greek and Latin lexicons. Through analysis of selected term pairs from classical texts, we use contextual embeddings and angular similarity metrics to identify precise semantic alignments. Our results show that etymologically related pairs demonstrate significantly higher similarity scores, particularly for abstract philosophical concepts such as epistēmē (scientia) and dikaiosynē (iustitia). Statistical analysis reveals consistent patterns in these relationships (p = 0.012), with etymologically related pairs showing remarkably stable semantic preservation compared to control pairs. These findings establish a quantitative framework for examining how philosophical concepts moved between Greek and Latin traditions, offering new methods for classical philological research.
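The abstract mentions angular similarity over contextual embeddings without giving a formula. One common definition rescales the angle between two vectors to the range [0, 1]. The sketch below assumes that definition, which may differ from the paper's. The function names and the toy three-dimensional vectors are hypothetical stand-ins for real model embeddings, not values from PhiloBERTA.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("angular similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def angular_similarity(u, v):
    """Assumed definition: 1 - angle / pi, so 1.0 means same direction
    and 0.0 means opposite direction."""
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos = max(-1.0, min(1.0, cosine_similarity(u, v)))
    return 1.0 - math.acos(cos) / math.pi


# Hypothetical toy embeddings; real ones would come from a transformer model.
greek_term = [0.9, 0.1, 0.3]
latin_term = [0.8, 0.2, 0.4]
control_term = [-0.5, 0.7, 0.1]

print("related pair:", angular_similarity(greek_term, latin_term))
print("control pair:", angular_similarity(greek_term, control_term))
```

Unlike raw cosine similarity, this score is linear in the angle itself, which spreads out the differences between vectors that are already nearly parallel.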
Related papers
- A State-of-the-Art Morphosyntactic Parser and Lemmatizer for Ancient Greek [0.0]
This paper presents an experiment comparing six models to identify a state-of-the-art morphosyntactic parser and lemmatizer for Ancient Greek texts.
A normalized version of the major collections of annotated texts was used to train the baseline model Dithrax with randomly initialized character embeddings.
A Bayesian analysis shows that Dithrax and Trankit annotate morphology practically equivalently, while syntax is best annotated by Trankit and lemmata by GreTa.
arXiv Detail & Related papers (2024-10-15T20:49:48Z) - An Encoding of Abstract Dialectical Frameworks into Higher-Order Logic [57.24311218570012]
This approach allows for the computer-assisted analysis of abstract dialectical frameworks.
Exemplary applications include the formal analysis and verification of meta-theoretical properties.
arXiv Detail & Related papers (2023-12-08T09:32:26Z) - Graecia capta ferum victorem cepit. Detecting Latin Allusions to Ancient Greek Literature [23.786649328915097]
We introduce SPhilBERTa, a trilingual Sentence-RoBERTa model tailored for Classical Philology.
It excels at cross-lingual semantic comprehension and identification of identical sentences across Ancient Greek, Latin, and English.
We generate new training data by automatically translating English texts into Ancient Greek.
arXiv Detail & Related papers (2023-08-23T08:54:05Z) - A Category-theoretical Meta-analysis of Definitions of Disentanglement [97.34033555407403]
Disentangling the factors of variation in data is a fundamental concept in machine learning.
This paper presents a meta-analysis of existing definitions of disentanglement.
arXiv Detail & Related papers (2023-05-11T15:24:20Z) - Computational valency lexica and Homeric formularity [1.6346069386394704]
We present AGVaLex, a lexicon for ancient Greek automatically extracted from the Ancient Greek Dependency Treebank.
It contains quantitative corpus-driven morphological, syntactic and lexical information about verbs and their arguments.
It has a wide range of applications for the study of the language of ancient Greek authors.
arXiv Detail & Related papers (2022-08-23T08:03:16Z) - Quantifying Synthesis and Fusion and their Impact on Machine Translation [79.61874492642691]
Work in Natural Language Processing (NLP) typically labels a whole language with a strict type of morphology, e.g. fusional or agglutinative.
In this work, we propose to reduce the rigidity of such claims, by quantifying morphological typology at the word and segment level.
For computing synthesis, we test unsupervised and supervised morphological segmentation methods for English, German and Turkish, whereas for fusion, we propose a semi-automatic method using Spanish as a case study.
Then, we analyse the relationship between machine translation quality and the degree of synthesis and fusion at word (nouns and verbs for English-Turkish,
arXiv Detail & Related papers (2022-05-06T17:04:58Z) - Patterns of Lexical Ambiguity in Contextualised Language Models [9.747449805791092]
We introduce an extended, human-annotated dataset of graded word sense similarity and co-predication.
Both types of human judgements indicate that the similarity of polysemic interpretations falls in a continuum between identity of meaning and homonymy.
Our dataset appears to capture a substantial part of the complexity of lexical ambiguity, and can provide a realistic test bed for contextualised embeddings.
arXiv Detail & Related papers (2021-09-27T13:11:44Z) - Lexical semantic change for Ancient Greek and Latin [61.69697586178796]
Associating a word's correct meaning in its historical context is a central challenge in diachronic research.
We build on a recent computational approach to semantic change based on a dynamic Bayesian mixture model.
We provide a systematic comparison of dynamic Bayesian mixture models for semantic change with state-of-the-art embedding-based models.
arXiv Detail & Related papers (2021-01-22T12:04:08Z) - High-order Semantic Role Labeling [86.29371274587146]
This paper introduces a high-order graph structure for the neural semantic role labeling model.
It enables the model to explicitly consider not only the isolated predicate-argument pairs but also the interaction between the predicate-argument pairs.
Experimental results on 7 languages of the CoNLL-2009 benchmark show that the high-order structural learning techniques are beneficial to the strong performing SRL models.
arXiv Detail & Related papers (2020-10-09T15:33:54Z) - Constructing a Family Tree of Ten Indo-European Languages with Delexicalized Cross-linguistic Transfer Patterns [57.86480614673034]
We formalize the delexicalized transfer as interpretable tree-to-string and tree-to-tree patterns.
This allows us to quantitatively probe cross-linguistic transfer and extend inquiries of Second Language Acquisition.
arXiv Detail & Related papers (2020-07-17T15:56:54Z) - Semantic Relatedness and Taxonomic Word Embeddings [2.47944699884651]
We show that there are different types of semantic relatedness and that different lexical representations encode different forms of relatedness.
We present experiments that analyse taxonomic embeddings that have been trained on a synthetic corpus that has been generated via a random walk over a taxonomy.
We explore the interactions between the relative sizes of natural and synthetic corpora on the performance of embeddings when taxonomic and thematic embeddings are combined.
arXiv Detail & Related papers (2020-02-14T20:02:11Z)