Graecia capta ferum victorem cepit. Detecting Latin Allusions to Ancient
Greek Literature
- URL: http://arxiv.org/abs/2308.12008v1
- Date: Wed, 23 Aug 2023 08:54:05 GMT
- Title: Graecia capta ferum victorem cepit. Detecting Latin Allusions to Ancient
Greek Literature
- Authors: Frederick Riemenschneider and Anette Frank
- Abstract summary: We introduce SPhilBERTa, a trilingual Sentence-RoBERTa model tailored for Classical Philology.
It excels at cross-lingual semantic comprehension and identification of identical sentences across Ancient Greek, Latin, and English.
We generate new training data by automatically translating English texts into Ancient Greek.
- Score: 23.786649328915097
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Intertextual allusions hold a pivotal role in Classical Philology, with Latin
authors frequently referencing Ancient Greek texts. Until now, the automatic
identification of these intertextual references has been constrained to
monolingual approaches, seeking parallels solely within Latin or Greek texts.
In this study, we introduce SPhilBERTa, a trilingual Sentence-RoBERTa model
tailored for Classical Philology, which excels at cross-lingual semantic
comprehension and identification of identical sentences across Ancient Greek,
Latin, and English. We generate new training data by automatically translating
English texts into Ancient Greek. Further, we present a case study,
demonstrating SPhilBERTa's capability to facilitate automated detection of
intertextual parallels. Our models and resources are available at
https://github.com/Heidelberg-NLP/ancient-language-models.
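As a rough illustration of how a sentence-embedding model like SPhilBERTa can surface intertextual parallels, the sketch below ranks Latin candidate sentences against a Greek query by cosine similarity in a shared embedding space. The vectors here are invented placeholders; in practice the embeddings would come from the published model (e.g. loaded through the `sentence-transformers` library).

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder embeddings: in practice these would be produced by
# SPhilBERTa, which maps Ancient Greek, Latin, and English sentences
# into one shared vector space.
query_embedding = np.array([0.9, 0.1, 0.3])            # Greek source line
candidate_embeddings = {
    "latin_sentence_a": np.array([0.88, 0.12, 0.31]),  # close paraphrase
    "latin_sentence_b": np.array([0.1, 0.9, 0.2]),     # unrelated sentence
}

# Rank Latin candidates by similarity to the Greek query; the top hit
# is the most likely intertextual parallel.
ranked = sorted(
    candidate_embeddings.items(),
    key=lambda kv: cosine_similarity(query_embedding, kv[1]),
    reverse=True,
)
```

The same nearest-neighbour search scales to whole corpora by encoding every sentence once and querying with an approximate-nearest-neighbour index.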
Related papers
- Understanding Cross-Lingual Alignment -- A Survey [52.572071017877704]
Cross-lingual alignment is the meaningful similarity of representations across languages in multilingual language models.
We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field.
arXiv Detail & Related papers (2024-04-09T11:39:53Z)
- Sentence Embedding Models for Ancient Greek Using Multilingual Knowledge Distillation [0.0]
We use a multilingual knowledge distillation approach to train BERT models to produce sentence embeddings for Ancient Greek text.
We build a parallel sentence dataset using a sentence-embedding alignment method to align Ancient Greek documents with English translations.
We evaluate our models on translation search, semantic similarity, and semantic retrieval tasks.
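The multilingual knowledge distillation recipe behind this approach trains a student model to place a sentence and its translation at the same point the teacher assigns to the source sentence. A minimal sketch of the objective, with fixed toy vectors standing in for hypothetical teacher and student embeddings:

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two embedding vectors."""
    return float(np.mean((a - b) ** 2))

# Hypothetical embeddings: a teacher vector for an English sentence, and
# student vectors for that sentence and its Ancient Greek translation.
teacher_en  = np.array([0.5, -0.2, 0.7])    # teacher(english_sentence)
student_en  = np.array([0.48, -0.19, 0.71]) # student(english_sentence)
student_grc = np.array([0.52, -0.22, 0.69]) # student(greek_translation)

# Distillation minimises both terms, so the student maps the sentence
# and its translation to the teacher's location in embedding space.
loss = mse(teacher_en, student_en) + mse(teacher_en, student_grc)
```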
arXiv Detail & Related papers (2023-08-24T23:38:44Z)
- Exploring Large Language Models for Classical Philology [17.856304057963776]
We create four language models for Ancient Greek that vary along two dimensions to study their versatility for tasks of interest for Classical languages.
We evaluate all models on morphological and syntactic tasks, including lemmatization.
Results show that our models provide significant improvements over the SoTA.
arXiv Detail & Related papers (2023-05-23T05:21:02Z)
- Models and Datasets for Cross-Lingual Summarisation [78.56238251185214]
We present a cross-lingual summarisation corpus with long documents in a source language associated with multi-sentence summaries in a target language.
The corpus covers twelve language pairs and directions for four European languages, namely Czech, English, French and German.
We derive cross-lingual document-summary instances from Wikipedia by combining lead paragraphs and articles' bodies from language aligned Wikipedia titles.
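The construction described above can be sketched as pairing, for each language-aligned title, the full article in the source language with the lead paragraph of the aligned article in the target language. The field names and toy articles below are illustrative, not the corpus's actual schema:

```python
# Toy aligned Wikipedia articles keyed by a shared (language-linked) title.
# "lead" is the introductory paragraph, "body" the remaining article text.
english = {"Danube": {"lead": "The Danube is a European river.",
                      "body": "It flows through ten countries ..."}}
german  = {"Danube": {"lead": "Die Donau ist ein europäischer Fluss.",
                      "body": "Sie fließt durch zehn Länder ..."}}

def cross_lingual_instances(src, tgt):
    """Pair each source-language document (lead + body) with the
    target-language lead paragraph of the aligned article as summary."""
    return [
        (src[title]["lead"] + " " + src[title]["body"],  # full document
         tgt[title]["lead"])                             # summary
        for title in src
        if title in tgt
    ]

pairs = cross_lingual_instances(english, german)
```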
arXiv Detail & Related papers (2022-02-19T11:55:40Z)
- The interplay between morphological typology and script on a novel multi-layer Algerian dialect corpus [4.974890682815778]
We introduce a newly annotated corpus of Algerian user-generated comments comprising parallel annotations of Algerian written in Latin, Arabic, and code-switched scripts.
We find there is a delicate relationship between script and typology for part-of-speech, while sentiment analysis is less sensitive.
arXiv Detail & Related papers (2021-05-16T10:22:21Z)
- Lexical semantic change for Ancient Greek and Latin [61.69697586178796]
Associating a word's correct meaning in its historical context is a central challenge in diachronic research.
We build on a recent computational approach to semantic change based on a dynamic Bayesian mixture model.
We provide a systematic comparison of dynamic Bayesian mixture models for semantic change with state-of-the-art embedding-based models.
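A common diagnostic used by the embedding-based baselines mentioned above is to compare a word's representation across time periods: a large cosine distance between period-specific vectors signals a shift in the word's dominant sense. The vectors below are invented for illustration and are not drawn from any real diachronic corpus:

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity; larger values mean greater semantic change."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical period-specific vectors (earlier period, later period).
# A word whose usage shifted ends up far from its earlier self.
stable_word  = (np.array([0.8, 0.1, 0.2]), np.array([0.79, 0.12, 0.21]))
shifted_word = (np.array([0.8, 0.1, 0.2]), np.array([0.1, 0.85, 0.3]))

d_stable  = cosine_distance(*stable_word)
d_shifted = cosine_distance(*shifted_word)
```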
arXiv Detail & Related papers (2021-01-22T12:04:08Z)
- Learning Contextualised Cross-lingual Word Embeddings and Alignments for Extremely Low-Resource Languages Using Parallel Corpora [63.5286019659504]
We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus.
Our method obtains word embeddings via an LSTM encoder-decoder model that simultaneously translates and reconstructs an input sentence.
arXiv Detail & Related papers (2020-10-27T22:24:01Z)
- Latin BERT: A Contextual Language Model for Classical Philology [7.513100214864645]
We present Latin BERT, a contextual language model for the Latin language.
It was trained on 642.7 million words from a variety of sources spanning the Classical era to the 21st century.
arXiv Detail & Related papers (2020-09-21T17:47:44Z)
- Constructing a Family Tree of Ten Indo-European Languages with Delexicalized Cross-linguistic Transfer Patterns [57.86480614673034]
We formalize the delexicalized transfer as interpretable tree-to-string and tree-to-tree patterns.
This allows us to quantitatively probe cross-linguistic transfer and extend inquiries of Second Language Acquisition.
arXiv Detail & Related papers (2020-07-17T15:56:54Z)
- Phonetic and Visual Priors for Decipherment of Informal Romanization [37.77170643560608]
We propose a noisy-channel WFST cascade model for deciphering the original non-Latin script from observed romanized text.
We train our model directly on romanized data from two languages: Egyptian Arabic and Russian.
We demonstrate that adding inductive bias through phonetic and visual priors on character mappings substantially improves the model's performance on both languages.
arXiv Detail & Related papers (2020-05-05T21:57:27Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
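Singular vector canonical correlation analysis, as used above, first reduces each set of language representations with an SVD and then computes canonical correlations between the reduced views. A compact numpy sketch (on centred data, the canonical correlations are the singular values of the product of the two orthonormal bases):

```python
import numpy as np

def svcca(X, Y, k):
    """Canonical correlations between the top-k singular subspaces of
    two views X (n x d1) and Y (n x d2) of the same n objects."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # SVD reduction: keep the k strongest directions of each view.
    Ux, _, _ = np.linalg.svd(X, full_matrices=False)
    Uy, _, _ = np.linalg.svd(Y, full_matrices=False)
    Ux, Uy = Ux[:, :k], Uy[:, :k]
    # Canonical correlations are the singular values of Ux^T Uy.
    return np.linalg.svd(Ux.T @ Uy, compute_uv=False)

# Toy check: Y is a linear re-encoding of X, so the views share
# information and the leading canonical correlation is high.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
Y = X @ rng.normal(size=(5, 5))
corrs = svcca(X, Y, k=3)
```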
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.