Syllabic Quantity Patterns as Rhythmic Features for Latin Authorship Attribution
- URL: http://arxiv.org/abs/2110.14203v1
- Date: Wed, 27 Oct 2021 06:25:31 GMT
- Title: Syllabic Quantity Patterns as Rhythmic Features for Latin Authorship Attribution
- Authors: Silvia Corbara, Alejandro Moreo, Fabrizio Sebastiani
- Abstract summary: We employ syllabic quantity as a base for deriving rhythmic features for the task of computational authorship attribution of Latin prose texts.
Our experiments, carried out on three different datasets, using two different machine learning methods, show that rhythmic features based on syllabic quantity are beneficial in discriminating among Latin prose authors.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It is well known that, within the Latin production of written text, peculiar metric schemes were followed not only in poetic compositions, but also in many prose works. Such metric patterns were based on so-called syllabic quantity, i.e., on the length of the involved syllables, and there is substantial evidence suggesting that certain authors had a preference for certain metric patterns over others. In this research we investigate the possibility of employing syllabic quantity as a basis for deriving rhythmic features for the task of computational authorship attribution of Latin prose texts. We test the impact of these features on the authorship attribution task when combined with other topic-agnostic features. Our experiments, carried out on three different datasets using two different machine learning methods, show that rhythmic features based on syllabic quantity are beneficial in discriminating among Latin prose authors.
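The abstract does not specify how the rhythmic features are computed, but one common way to turn a syllabic-quantity annotation into classifier features is to encode each syllable as long or short and count the resulting quantity n-grams. The sketch below illustrates that idea only; the function names are hypothetical, and it assumes syllables have already been scanned as "L"/"S" by an external prosody tool rather than implementing Latin scansion itself.

```python
# Minimal sketch (not the authors' exact pipeline): derive rhythmic
# features from a syllabic-quantity encoding of Latin text. Syllables
# are assumed to be pre-annotated as long ("L") or short ("S").
from collections import Counter


def quantity_ngrams(quantity_string: str, n: int = 4) -> Counter:
    """Count overlapping n-grams over a string of 'L'/'S' marks.

    Each clausula-like pattern (e.g. 'LSLL') becomes one feature;
    relative frequencies can then feed any standard classifier.
    """
    return Counter(
        quantity_string[i:i + n]
        for i in range(len(quantity_string) - n + 1)
    )


def to_feature_vector(counts: Counter, vocabulary: list) -> list:
    """Normalize n-gram counts into a fixed-length frequency vector."""
    total = sum(counts.values()) or 1
    return [counts[g] / total for g in vocabulary]


# Example: a short quantity sequence for one sentence ending.
seq = "LSSLSLLSLL"
counts = quantity_ngrams(seq, n=4)
vocab = sorted(counts)
vec = to_feature_vector(counts, vocab)
```

Because the features are computed over quantity marks rather than words, they are topic-agnostic by construction, which is the property the abstract emphasizes when combining them with other stylometric features.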
Related papers
- Take the Hint: Improving Arabic Diacritization with Partially-Diacritized Text
We propose 2SDiac, a multi-source model that can effectively support optional diacritics in the input to inform all predictions.
We also introduce Guided Learning, a training scheme that leverages given input diacritics with different levels of random masking.
arXiv Detail & Related papers (2023-06-06T10:18:17Z)
- DeltaScore: Fine-Grained Story Evaluation with Perturbations
We introduce DELTASCORE, a novel methodology that employs perturbation techniques for the evaluation of nuanced story aspects.
We posit that the extent to which a story excels in a specific aspect (e.g., fluency) correlates with the magnitude of its susceptibility to particular perturbations.
We measure the quality of an aspect by calculating the likelihood difference between pre- and post-perturbation states using pre-trained language models.
arXiv Detail & Related papers (2023-03-15T23:45:54Z)
- A pattern recognition approach for distinguishing between prose and poetry
We propose an automated method to distinguish between poetry and prose based solely on aural and rhythmic properties.
The classification of the considered texts using the extracted feature set resulted in a best accuracy of 0.78, obtained with a neural network.
arXiv Detail & Related papers (2021-07-18T18:44:17Z)
- Sentiment analysis in tweets: an assessment study from classical to modern text representation models
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as an informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks.
This study presents an assessment of existing language models in distinguishing the sentiment expressed in tweets, using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
- Decomposing lexical and compositional syntax and semantics with deep language models
The activations of language transformers like GPT2 have been shown to linearly map onto brain activity during speech comprehension.
Here, we propose a taxonomy to factorize the high-dimensional activations of language models into four classes: lexical, compositional, syntactic, and semantic representations.
The results highlight two findings. First, compositional representations recruit a more widespread cortical network than lexical ones, encompassing the bilateral temporal, parietal, and prefrontal cortices.
arXiv Detail & Related papers (2021-03-02T10:24:05Z)
- Metrical Tagging in the Wild: Building and Annotating Poetry Corpora with Rhythmic Features
We provide large poetry corpora for English and German, and annotate prosodic features in smaller corpora to train corpus-driven neural models.
We show that BiLSTM-CRF models with syllable embeddings outperform a CRF baseline and different BERT-based approaches.
arXiv Detail & Related papers (2021-02-17T16:38:57Z)
- Quasi Error-free Text Classification and Authorship Recognition in a large Corpus of English Literature based on a Novel Feature Set
We show that, in the entire GLEC, quasi error-free text classification and authorship recognition are possible with a method using the same set of five style and five content features.
Our data pave the way for many future computational and empirical studies of literature or experiments in reading psychology.
arXiv Detail & Related papers (2020-10-21T07:39:55Z)
- Feature Selection on Noisy Twitter Short Text Messages for Language Identification
We apply different feature selection algorithms across various learning algorithms in order to analyze the effect of each algorithm.
The methodology focuses on word-level language identification using a novel dataset of 6903 tweets extracted from Twitter.
arXiv Detail & Related papers (2020-07-11T09:22:01Z)
- Unsupervised Cross-Modal Audio Representation Learning from Unstructured Multilingual Text
We present an approach to unsupervised audio representation learning.
Based on a triplet neural network architecture, we harness semantically related cross-modal information to estimate audio track-relatedness.
We show that our approach is invariant to the variety of annotation styles as well as to the different languages of this collection.
arXiv Detail & Related papers (2020-03-27T07:37:15Z)
- The Secret is in the Spectra: Predicting Cross-lingual Task Performance with Spectral Similarity Measures
We present a large-scale study focused on the correlations between monolingual embedding space similarity and task performance.
We introduce several isomorphism measures between two embedding spaces, based on the relevant statistics of their individual spectra.
We empirically show that language similarity scores derived from such spectral isomorphism measures are strongly associated with performance observed in different cross-lingual tasks.
arXiv Detail & Related papers (2020-01-30T00:09:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.