Loci Similes: A Benchmark for Extracting Intertextualities in Latin Literature
- URL: http://arxiv.org/abs/2601.07533v1
- Date: Mon, 12 Jan 2026 13:34:49 GMT
- Title: Loci Similes: A Benchmark for Extracting Intertextualities in Latin Literature
- Authors: Julian Schelb, Michael Wittweiler, Marie Revellio, Barbara Feichtinger, Andreas Spitz,
- Abstract summary: Loci Similes is a benchmark for Latin intertextuality detection comprising a curated dataset of ~172k text segments containing 545 expert-verified parallels linking Late Antique authors to a corpus of classical authors. We establish baselines for retrieval and classification of intertextualities with state-of-the-art LLMs.
- Score: 4.132158161225706
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tracing connections between historical texts is an important part of intertextual research, enabling scholars to reconstruct the virtual library of a writer and identify the sources influencing their creative process. These intertextual links manifest in diverse forms, ranging from direct verbatim quotations to subtle allusions and paraphrases disguised by morphological variation. Language models offer a promising path forward due to their capability of capturing semantic similarity beyond lexical overlap. However, the development of new methods for this task is held back by the scarcity of standardized benchmarks and easy-to-use datasets. We address this gap by introducing Loci Similes, a benchmark for Latin intertextuality detection comprising a curated dataset of ~172k text segments containing 545 expert-verified parallels linking Late Antique authors to a corpus of classical authors. Using this data, we establish baselines for retrieval and classification of intertextualities with state-of-the-art LLMs.
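The retrieval setting described in the abstract (rank candidate source segments by similarity to a query segment) can be sketched with a toy embedding. The character-trigram vectors below are an illustrative stand-in for the LLM encoders used in the paper, not the authors' actual method; the Latin example segments are hypothetical. Character n-grams are chosen here because they tolerate some of the morphological variation the abstract mentions.

```python
from collections import Counter
from math import sqrt


def embed(text: str, n: int = 3) -> Counter:
    """Toy embedding: character n-gram counts (a stand-in for an LLM encoder)."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus: list[str], k: int = 3) -> list[tuple[float, str]]:
    """Rank corpus segments by similarity to the query segment."""
    q = embed(query)
    scored = sorted(((cosine(q, embed(s)), s) for s in corpus), reverse=True)
    return scored[:k]


corpus = [
    "arma virumque cano Troiae qui primus ab oris",   # verbatim overlap
    "in nova fert animus mutatas dicere formas",      # unrelated
    "arma virumque canebat primus ab oris Troiae",    # morphological variant
]
print(retrieve("arma virumque cano", corpus, k=2))
```

In a real pipeline, `embed` would be replaced by a neural encoder and the exhaustive scan by an approximate nearest-neighbor index; the ranking logic stays the same.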
Related papers
- Multilingual corpora for the study of new concepts in the social sciences and humanities: [0.0]
This article presents a hybrid methodology for building a multilingual corpus designed to support the study of emerging concepts in the humanities and social sciences. The corpus relies on two complementary sources: (1) textual content automatically extracted from company websites, cleaned for French and English, and (2) annual reports collected and automatically filtered according to documentary criteria (year, format, duplication). The processing pipeline includes automatic language detection, filtering of non-relevant content, extraction of relevant segments, and enrichment with structural metadata.
arXiv Detail & Related papers (2025-12-08T10:04:50Z) - StyleDecipher: Robust and Explainable Detection of LLM-Generated Texts with Stylistic Analysis [18.44456241158174]
StyleDecipher is a robust and explainable detection framework. It revisits text detection using combined feature extractors to quantify stylistic differences. It consistently achieves state-of-the-art in-domain accuracy.
arXiv Detail & Related papers (2025-10-14T15:07:27Z) - Mining Asymmetric Intertextuality [0.0]
Asymmetric intertextuality refers to one-sided relationships between texts.
We propose a scalable and adaptive approach for mining asymmetric intertextuality.
Our system handles intertextuality at various levels, from direct quotations to paraphrasing and cross-document influence.
arXiv Detail & Related papers (2024-10-19T16:12:22Z) - Investigating Expert-in-the-Loop LLM Discourse Patterns for Ancient Intertextual Analysis [0.0]
The study demonstrates that large language models can detect direct quotations, allusions, and echoes between texts.
The model struggles with long query passages and the inclusion of false intertextual dependencies.
The expert-in-the-loop methodology presented offers a scalable approach for intertextual research.
arXiv Detail & Related papers (2024-09-03T13:23:11Z) - Description-Based Text Similarity [59.552704474862004]
We identify the need to search for texts based on abstract descriptions of their content.
We propose an alternative model that significantly improves when used in standard nearest neighbor search.
arXiv Detail & Related papers (2023-05-21T17:14:31Z) - CiteBench: A benchmark for Scientific Citation Text Generation [69.37571393032026]
CiteBench is a benchmark for citation text generation.
We make the code for CiteBench publicly available at https://github.com/UKPLab/citebench.
arXiv Detail & Related papers (2022-12-19T16:10:56Z) - Contextual information integration for stance detection via cross-attention [59.662413798388485]
Stance detection deals with identifying an author's stance towards a target.
Most existing stance detection models are limited because they do not consider relevant contextual information.
We propose an approach to integrate contextual information as text.
arXiv Detail & Related papers (2022-11-03T15:04:29Z) - Latin writing styles analysis with Machine Learning: New approach to old questions [0.0]
In the Middle Ages, texts were learned by heart and passed on orally from generation to generation. Given this specific mode of composition in Latin literature, we can search for and identify probable patterns of familiar sources in specific narrative texts.
arXiv Detail & Related papers (2021-09-01T20:21:45Z) - LexSubCon: Integrating Knowledge from Lexical Resources into Contextual Embeddings for Lexical Substitution [76.615287796753]
We introduce LexSubCon, an end-to-end lexical substitution framework based on contextual embedding models.
This is achieved by combining contextual information with knowledge from structured lexical resources.
Our experiments show that LexSubCon outperforms previous state-of-the-art methods on LS07 and CoInCo benchmark datasets.
arXiv Detail & Related papers (2021-07-11T21:25:56Z) - Relation Clustering in Narrative Knowledge Graphs [71.98234178455398]
Relational sentences in the original text are embedded (with SBERT) and clustered in order to merge semantically similar relations.
Preliminary tests show that such clustering might successfully detect similar relations, and provide a valuable preprocessing for semi-supervised approaches.
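The embed-then-cluster step described above can be sketched with a greedy single-pass clustering. The bag-of-words vectors below are an illustrative substitute for the SBERT embeddings the paper uses, and the example relation sentences and the 0.5 threshold are hypothetical choices, not values from the paper.

```python
from collections import Counter
from math import sqrt


def bow(sentence: str) -> Counter:
    """Bag-of-words count vector; the paper uses SBERT embeddings instead."""
    return Counter(sentence.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(sentences: list[str], threshold: float = 0.5) -> list[list[str]]:
    """Greedy single-pass clustering: attach each sentence to the first
    cluster whose seed is similar enough, else start a new cluster."""
    clusters: list[list[str]] = []
    for s in sentences:
        for c in clusters:
            if cosine(bow(s), bow(c[0])) >= threshold:
                c.append(s)
                break
        else:
            clusters.append([s])
    return clusters


relations = [
    "Aeneas is the son of Venus",
    "Aeneas is the son of the goddess Venus",
    "Dido founded Carthage",
]
print(cluster(relations))
```

A production system would likely use a proper clustering algorithm (e.g. agglomerative clustering) over dense embeddings, but the merge-by-similarity idea is the same.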
arXiv Detail & Related papers (2020-11-27T10:43:04Z) - Learning to Select Bi-Aspect Information for Document-Scale Text Content Manipulation [50.01708049531156]
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer.
In detail, the input is a set of structured records and a reference text for describing another recordset.
The output is a summary that accurately describes the partial content in the source recordset, written in the same style as the reference.
arXiv Detail & Related papers (2020-02-24T12:52:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.