A digital perspective on the role of a stemma in material-philological transmission studies
- URL: http://arxiv.org/abs/2505.06938v1
- Date: Sun, 11 May 2025 11:05:16 GMT
- Title: A digital perspective on the role of a stemma in material-philological transmission studies
- Authors: Katarzyna Anna Kapitan
- Abstract summary: Using the Old Norse saga of Hrómundur as a case study, this article demonstrates that stemmas can serve as a starting point for exploring textual traditions further. The article is accompanied by datasets used to generate stemmas for the Hrómundar saga tradition as well as two custom Python scripts.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Taking its point of departure in the recent developments in the field of digital humanities and the increasing automatisation of scholarly workflows, this study explores the implications of digital approaches to textual traditions for the broader field of textual scholarship. It argues that the relative simplicity of creating computer-generated stemmas allows us to view the stemma codicum as a research tool rather than the final product of our scholarly investigation. Using the Old Norse saga of Hrómundur as a case study, this article demonstrates that stemmas can serve as a starting point for exploring textual traditions further. In doing so, they enable us to address research questions that otherwise remain unanswered. The article is accompanied by datasets used to generate stemmas for the Hrómundar saga tradition as well as two custom Python scripts. The scripts are designed to convert XML-based textual data, encoded according to the TEI Guidelines, into the input format used for the analysis in the PHYLIP package to generate unrooted trees of relationships between texts.
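The TEI-to-PHYLIP conversion described in the abstract can be illustrated with a minimal sketch. This is not the article's script: it assumes the variants are encoded as TEI `<app>` elements whose `<rdg>` children list witnesses in a `wit` attribute, and that each variant location has at most ten readings, so that every reading can be coded as a single digit. The function name `tei_to_phylip` and the sample data are hypothetical. Each variant location becomes one character column, and a witness that is absent from a location is coded as `?`.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample: three witnesses (A, B, C), three variant locations.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><p>
    <app><rdg wit="#A #B">konungr</rdg><rdg wit="#C">kongur</rdg></app>
    <app><rdg wit="#A">hét</rdg><rdg wit="#B #C">var nefndr</rdg></app>
    <app><rdg wit="#A #C">mikill</rdg></app>
  </p></body></text>
</TEI>"""


def tei_to_phylip(tei_xml: str) -> str:
    """Build a PHYLIP-style discrete character matrix from TEI <app> entries."""
    root = ET.fromstring(tei_xml)
    columns = []       # one dict per variant location: sigil -> state digit
    witnesses = set()
    for app in root.findall(".//tei:app", TEI_NS):
        states = {}
        for state, rdg in enumerate(app.findall("tei:rdg", TEI_NS)):
            if state > 9:
                raise ValueError("more than ten readings at one location")
            for ref in rdg.get("wit", "").split():
                sigil = ref.lstrip("#")
                states[sigil] = str(state)
                witnesses.add(sigil)
        columns.append(states)

    names = sorted(witnesses)
    # Header line: number of witnesses, number of characters.
    lines = [f"{len(names)} {len(columns)}"]
    for name in names:
        # Witnesses missing from a location get "?" (unknown state).
        row = "".join(col.get(name, "?") for col in columns)
        # Names are cut or padded to ten characters, then the states follow.
        lines.append(f"{name[:10]:<10}{row}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(tei_to_phylip(SAMPLE_TEI), end="")
```

The resulting text could be saved as an input file for one of PHYLIP's discrete-character programs; which program and settings the article uses is not stated here, so that step is left out.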
Related papers
- ParsiPy: NLP Toolkit for Historical Persian Texts in Python [1.637832760977605]
This work introduces ParsiPy, an NLP toolkit to handle phonetic transcriptions and analyze ancient texts. ParsiPy offers modules for tokenization, lemmatization, part-of-speech tagging, phoneme-to-transliteration conversion, and word embedding.
arXiv Detail & Related papers (2025-03-22T16:21:29Z)
- Curatr: A Platform for Semantic Analysis and Curation of Historical Literary Texts [5.075506385456811]
This paper presents Curatr, an online platform for the exploration and curation of literature with machine learning-supported semantic search.
The platform combines neural word embeddings with expert domain knowledge to enable the generation of thematic lexicons.
arXiv Detail & Related papers (2023-06-13T15:15:31Z)
- Semantic Similarity Measure of Natural Language Text through Machine Learning and a Keyword-Aware Cross-Encoder-Ranking Summarizer -- A Case Study Using UCGIS GIS&T Body of Knowledge [2.4909170697740968]
GIS&T Body of Knowledge (BoK) is a community-driven endeavor to define, develop, and document geospatial topics.
This research evaluates the effectiveness of multiple natural language processing (NLP) techniques in extracting semantics from text.
It also offers a new perspective on the use of machine learning techniques for analyzing scientific publications.
arXiv Detail & Related papers (2023-05-17T01:17:57Z)
- The Learnable Typewriter: A Generative Approach to Text Analysis [17.355857281085164]
We present a generative document-specific approach to character analysis and recognition in text lines.
Taking as input a set of text lines with similar font or handwriting, our approach can learn a large number of different characters.
arXiv Detail & Related papers (2023-02-03T11:17:59Z)
- An Inclusive Notion of Text [69.36678873492373]
We argue that clarity on the notion of text is crucial for reproducible and generalizable NLP.
We introduce a two-tier taxonomy of linguistic and non-linguistic elements that are available in textual sources and can be used in NLP modeling.
arXiv Detail & Related papers (2022-11-10T14:26:43Z)
- Digital Editions as Distant Supervision for Layout Analysis of Printed Books [76.29918490722902]
We describe methods for exploiting this semantic markup as distant supervision for training and evaluating layout analysis models.
In experiments with several model architectures on the half-million pages of the Deutsches Textarchiv (DTA), we find a high correlation of these region-level evaluation methods with pixel-level and word-level metrics.
We discuss the possibilities for improving accuracy with self-training and the ability of models trained on the DTA to generalize to other historical printed books.
arXiv Detail & Related papers (2021-12-23T16:51:53Z)
- Open Domain Question Answering over Virtual Documents: A Unified Approach for Data and Text [62.489652395307914]
We use the data-to-text method as a means for encoding structured knowledge for knowledge-intensive applications, i.e. open-domain question answering (QA).
Specifically, we propose a verbalizer-retriever-reader framework for open-domain QA over data and text where verbalized tables from Wikipedia and triples from Wikidata are used as augmented knowledge sources.
We show that our Unified Data and Text QA, UDT-QA, can effectively benefit from the expanded knowledge index, leading to large gains over text-only baselines.
arXiv Detail & Related papers (2021-10-16T00:11:21Z)
- Latin writing styles analysis with Machine Learning: New approach to old questions [0.0]
In the Middle Ages texts were learned by heart and spread using oral means of communication from generation to generation.
Taking into account such a specific construction of literature composed in Latin, we can search for and indicate the probability patterns of familiar sources of specific narrative texts.
arXiv Detail & Related papers (2021-09-01T20:21:45Z)
- Deep Learning for Text Style Transfer: A Survey [71.8870854396927]
Text style transfer is an important task in natural language generation, which aims to control certain attributes in the generated text.
We present a systematic survey of the research on neural text style transfer, spanning over 100 representative articles since the first neural text style transfer work in 2017.
We discuss the task formulation, existing datasets and subtasks, evaluation, as well as the rich methodologies in the presence of parallel and non-parallel data.
arXiv Detail & Related papers (2020-11-01T04:04:43Z)
- A Survey of Knowledge-Enhanced Text Generation [81.24633231919137]
The goal of text generation is to make machines express themselves in human language.
Various neural encoder-decoder models have been proposed to achieve the goal by learning to map input text to output text.
To address this issue, researchers have considered incorporating various forms of knowledge beyond the input text into the generation models.
arXiv Detail & Related papers (2020-10-09T06:46:46Z)
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer [64.22926988297685]
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP).
In this paper, we explore the landscape of introducing transfer learning techniques for NLP by a unified framework that converts all text-based language problems into a text-to-text format.
arXiv Detail & Related papers (2019-10-23T17:37:36Z)