Syllabic Quantity Patterns as Rhythmic Features for Latin Authorship Attribution
- URL: http://arxiv.org/abs/2110.14203v1
- Date: Wed, 27 Oct 2021 06:25:31 GMT
- Title: Syllabic Quantity Patterns as Rhythmic Features for Latin Authorship Attribution
- Authors: Silvia Corbara, Alejandro Moreo, Fabrizio Sebastiani
- Abstract summary: We employ syllabic quantity as a base for deriving rhythmic features for the task of computational authorship attribution of Latin prose texts.
Our experiments, carried out on three different datasets, using two different machine learning methods, show that rhythmic features based on syllabic quantity are beneficial in discriminating among Latin prose authors.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It is well known that, within the Latin production of written text, peculiar metric schemes were followed not only in poetic compositions, but also in many prose works. Such metric patterns were based on so-called syllabic quantity, i.e., on the length of the involved syllables, and there is substantial evidence suggesting that certain authors had a preference for certain metric patterns over others. In this research we investigate the possibility of employing syllabic quantity as a basis for deriving rhythmic features for the task of computational authorship attribution of Latin prose texts. We test the impact of these features on the authorship attribution task when combined with other topic-agnostic features. Our experiments, carried out on three different datasets using two different machine learning methods, show that rhythmic features based on syllabic quantity are beneficial in discriminating among Latin prose authors.
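The abstract does not specify how the rhythmic features are computed, but one common way to turn a syllabic-quantity annotation into classifier features is to encode each syllable as long or short and count the resulting quantity n-grams. The sketch below illustrates that idea only; the function names are hypothetical, and it assumes syllables have already been scanned as "L"/"S" by an external prosody tool rather than implementing Latin scansion itself.

```python
# Minimal sketch (not the authors' exact pipeline): derive rhythmic
# features from a syllabic-quantity encoding of Latin text. Syllables
# are assumed to be pre-annotated as long ("L") or short ("S").
from collections import Counter


def quantity_ngrams(quantity_string: str, n: int = 4) -> Counter:
    """Count overlapping n-grams over a string of 'L'/'S' marks.

    Each clausula-like pattern (e.g. 'LSLL') becomes one feature;
    relative frequencies can then feed any standard classifier.
    """
    return Counter(
        quantity_string[i:i + n]
        for i in range(len(quantity_string) - n + 1)
    )


def to_feature_vector(counts: Counter, vocabulary: list) -> list:
    """Normalize n-gram counts into a fixed-length frequency vector."""
    total = sum(counts.values()) or 1
    return [counts[g] / total for g in vocabulary]


# Example: a short quantity sequence for one sentence ending.
seq = "LSSLSLLSLL"
counts = quantity_ngrams(seq, n=4)
vocab = sorted(counts)
vec = to_feature_vector(counts, vocab)
```

Because the features are computed over quantity marks rather than words, they are topic-agnostic by construction, which is the property the abstract emphasizes when combining them with other stylometric features.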
Related papers
- Take the Hint: Improving Arabic Diacritization with Partially-Diacritized Text
We propose 2SDiac, a multi-source model that can effectively support optional diacritics in the input to inform all predictions.
We also introduce Guided Learning, a training scheme that leverages given input diacritics with different levels of random masking.
arXiv Detail & Related papers (2023-06-06T10:18:17Z)
- DeltaScore: Fine-Grained Story Evaluation with Perturbations
We introduce DELTASCORE, a novel methodology that employs perturbation techniques for the evaluation of nuanced story aspects.
We posit that the extent to which a story excels in a specific aspect (e.g., fluency) correlates with the magnitude of its susceptibility to particular perturbations.
We measure the quality of an aspect by calculating the likelihood difference between pre- and post-perturbation states using pre-trained language models.
arXiv Detail & Related papers (2023-03-15T23:45:54Z)
- A pattern recognition approach for distinguishing between prose and poetry
We propose an automated method to distinguish between poetry and prose based solely on aural and rhythmic properties.
The classification of the considered texts using the extracted feature set resulted in a best accuracy of 0.78, obtained with a neural network.
arXiv Detail & Related papers (2021-07-18T18:44:17Z)
- Sentiment analysis in tweets: an assessment study from classical to modern text representation models
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as an informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks.
This study presents an assessment of existing language models in distinguishing the sentiment expressed in tweets, using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
- Decomposing lexical and compositional syntax and semantics with deep language models
The activations of language transformers like GPT2 have been shown to linearly map onto brain activity during speech comprehension.
Here, we propose a taxonomy to factorize the high-dimensional activations of language models into four classes: lexical, compositional, syntactic, and semantic representations.
The results highlight two findings. First, compositional representations recruit a more widespread cortical network than lexical ones, encompassing the bilateral temporal, parietal, and prefrontal cortices.
arXiv Detail & Related papers (2021-03-02T10:24:05Z)
- Metrical Tagging in the Wild: Building and Annotating Poetry Corpora with Rhythmic Features
We provide large poetry corpora for English and German, and annotate prosodic features in smaller corpora to train corpus-driven neural models.
We show that BiLSTM-CRF models with syllable embeddings outperform a CRF baseline and different BERT-based approaches.
arXiv Detail & Related papers (2021-02-17T16:38:57Z)
- Quasi Error-free Text Classification and Authorship Recognition in a large Corpus of English Literature based on a Novel Feature Set
We show that, in the entire GLEC, quasi error-free text classification and authorship recognition are possible with a method using the same set of five style and five content features.
Our data pave the way for many future computational and empirical studies of literature or experiments in reading psychology.
arXiv Detail & Related papers (2020-10-21T07:39:55Z)
- Feature Selection on Noisy Twitter Short Text Messages for Language Identification
We apply different feature selection algorithms across various learning algorithms in order to analyze the effect of each algorithm.
The methodology focuses on word-level language identification using a novel dataset of 6903 tweets extracted from Twitter.
arXiv Detail & Related papers (2020-07-11T09:22:01Z)
- Unsupervised Cross-Modal Audio Representation Learning from Unstructured Multilingual Text
We present an approach to unsupervised audio representation learning.
Based on a triplet neural network architecture, we harness semantically related cross-modal information to estimate audio track-relatedness.
We show that our approach is invariant to the variety of annotation styles as well as to the different languages of this collection.
arXiv Detail & Related papers (2020-03-27T07:37:15Z)
- The Secret is in the Spectra: Predicting Cross-lingual Task Performance with Spectral Similarity Measures
We present a large-scale study focused on the correlations between monolingual embedding space similarity and task performance.
We introduce several isomorphism measures between two embedding spaces, based on the relevant statistics of their individual spectra.
We empirically show that language similarity scores derived from such spectral isomorphism measures are strongly associated with performance observed in different cross-lingual tasks.
arXiv Detail & Related papers (2020-01-30T00:09:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.