Automatic Scansion of Spanish Poetry without Syllabification
- URL: http://arxiv.org/abs/2012.12799v1
- Date: Wed, 23 Dec 2020 16:59:43 GMT
- Title: Automatic Scansion of Spanish Poetry without Syllabification
- Authors: Guillermo Marco Remón, Julio Gonzalo
- Abstract summary: We propose an algorithm that performs accurate scansion (number of syllables, stress pattern and type of verse) without syllabification.
Our algorithm outperforms the current state of the art by 2% in fixed-metre poetry, and 25% in mixed-metre poetry.
- Score: 2.6143558180103326
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, several systems for the automated metrical analysis of Spanish
poetry have emerged. These systems rely on complex syllabification and stress-assignment
methods that use part-of-speech (PoS) tagging libraries, whose computational cost is high.
This cost grows further when metrical ambiguities have to be computed. Moreover, these
systems do not take into account factors that determine the syllable count, such as
compensation between the hemistichs of verses longer than eleven syllables. However, an
informative and accurate metrical analysis is possible without these costly methods. We
propose an algorithm that performs accurate scansion (number of syllables, stress pattern
and type of verse) without syllabification. It handles metrical ambiguities and takes
hemistich compensation into account. Our algorithm outperforms the current state of the art
by 2% in fixed-metre poetry and by 25% in mixed-metre poetry, and it runs 21 and 25 times
faster, respectively. Finally, a desktop application is offered as a tool for researchers
of Spanish poetry.
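The abstract does not give the counting rules themselves. As a hedged illustration of what hemistich compensation changes, here is a minimal Python sketch of the standard Spanish final-stress rule (one extra syllable after a final oxytone, no change after a paroxytone, one syllable fewer after a proparoxytone) applied to each hemistich separately. It assumes the phonological syllable count of each hemistich (after synalepha) is already known, so it sidesteps the part the paper actually addresses (obtaining counts without syllabification). It is not the authors' algorithm, and `FinalStress`, `metrical_length`, `verse_length`, `verse_type` and `VERSE_NAMES` are hypothetical names.

```python
from enum import Enum


class FinalStress(Enum):
    """Stress position of the last word of a verse or hemistich.

    The value is the adjustment the Spanish final-stress rule applies
    to the phonological syllable count.
    """
    OXYTONE = 1          # aguda: stress on the last syllable, count one more
    PAROXYTONE = 0       # llana: stress on the penultimate syllable, no change
    PROPAROXYTONE = -1   # esdrújula: stress on the antepenultimate, count one less


# Common Spanish verse names by metrical length (illustrative subset only).
VERSE_NAMES = {
    5: "pentasílabo",
    6: "hexasílabo",
    7: "heptasílabo",
    8: "octosílabo",
    9: "eneasílabo",
    10: "decasílabo",
    11: "endecasílabo",
    12: "dodecasílabo",
    14: "alejandrino",
}


def metrical_length(phonological_syllables: int, final: FinalStress) -> int:
    """Apply the final-stress rule to one unit (a whole verse or one hemistich)."""
    return phonological_syllables + final.value


def verse_length(hemistichs: list[tuple[int, FinalStress]]) -> int:
    """Metrical length of a verse given as a list of hemistichs.

    Assumption: in verses longer than eleven syllables the caesura behaves
    like a verse end, so the rule is applied to each hemistich and the
    results are summed. A simple verse is passed as a single hemistich.
    """
    return sum(metrical_length(n, final) for n, final in hemistichs)


def verse_type(length: int) -> str:
    """Name the verse by its metrical length, or return 'unknown'."""
    return VERSE_NAMES.get(length, "unknown")


if __name__ == "__main__":
    # Hypothetical alexandrine: 6 phonological syllables ending in an oxytone,
    # then 8 ending in a proparoxytone.
    halves = [(6, FinalStress.OXYTONE), (8, FinalStress.PROPAROXYTONE)]
    print(verse_length(halves), verse_type(verse_length(halves)))  # 14 alejandrino
    # Counting the whole line as one unit would give 14 - 1 = 13 instead.
    print(metrical_length(6 + 8, FinalStress.PROPAROXYTONE))  # 13
```

In this example, per-hemistich counting yields 7 + 7 = 14 (an alexandrine), whereas treating the line as a single unit yields 13, which shows how ignoring compensation can change both the syllable count and the verse type.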
Related papers
- Metronome: tracing variation in poetic meters via local sequence alignment [0.18749305679160366]
This paper introduces an unsupervised method for detecting structural similarities in poems using local sequence alignment.
The method relies on encoding poetic texts as strings of prosodic features using a four-letter alphabet.
These sequences are then aligned to derive a distance measure based on weighted symbol (mis)matches.
(2024-04-26T11:37:45Z)
- ALBERTI, a Multilingual Domain Specific Language Model for Poetry Analysis [0.0]
We present ALBERTI, the first multilingual pre-trained large language model for poetry.
We further trained multilingual BERT on a corpus of over 12 million verses from 12 languages.
ALBERTI achieves state-of-the-art results for German when compared to rule-based systems.
(2023-07-03T22:50:53Z)
- PoetryDiffusion: Towards Joint Semantic and Metrical Manipulation in Poetry Generation [58.36105306993046]
Controllable text generation is a challenging and meaningful field in natural language generation (NLG).
In this paper, we pioneer the use of the Diffusion model for generating sonnets and Chinese SongCi poetry.
Our model outperforms existing models in automatic evaluation of semantic, metrical, and overall performance as well as human evaluation.
(2023-06-14T11:57:31Z)
- The Glass Ceiling of Automatic Evaluation in Natural Language Generation [60.59732704936083]
We take a step back and analyze recent progress by comparing the body of existing automatic metrics and human metrics.
Our extensive statistical analysis reveals surprising findings: automatic metrics -- old and new -- are much more similar to each other than to humans.
(2022-08-31T01:13:46Z)
- Syllabic Quantity Patterns as Rhythmic Features for Latin Authorship Attribution [74.27826764855911]
We employ syllabic quantity as a base for deriving rhythmic features for the task of computational authorship attribution of Latin prose texts.
Our experiments, carried out on three different datasets, using two different machine learning methods, show that rhythmic features based on syllabic quantity are beneficial in discriminating among Latin prose authors.
(2021-10-27T06:25:31Z)
- Don't Go Far Off: An Empirical Study on Neural Poetry Translation [13.194404923699782]
We present an empirical investigation for poetry translation along several dimensions.
We contribute a parallel dataset of poetry translations for several language pairs.
Our results show that multilingual fine-tuning on poetic text significantly outperforms multilingual fine-tuning on non-poetic text that is 35 times larger.
(2021-09-07T10:00:44Z)
- CCPM: A Chinese Classical Poetry Matching Dataset [50.90794811956129]
We propose a novel task to assess a model's semantic understanding of poetry by poem matching.
This task requires the model to select one line of Chinese classical poetry among four candidates according to the modern Chinese translation of a line of poetry.
To construct this dataset, we first obtain a set of parallel data of Chinese classical poetry and modern Chinese translation.
(2021-06-03T16:49:03Z)
- Automatic Meter Classification of Kurdish Poems [3.0839245814393728]
Knowing the meter of the poems is helpful for correct reading, a better understanding of the meaning, and avoidance of ambiguity.
This paper presents a rule-based method for automatic classification of the poem meter for the Central Kurdish language.
(2021-02-24T07:57:38Z)
- My Teacher Thinks The World Is Flat! Interpreting Automatic Essay Scoring Mechanism [71.34160809068996]
Recent work shows that automated scoring systems are prone to even common-sense adversarial samples.
We utilize recent advances in interpretability to find the extent to which features such as coherence, content and relevance are important for automated scoring mechanisms.
We also find that, since the models are not semantically grounded in world knowledge and common sense, adding false facts such as "the world is flat" actually increases the score instead of decreasing it.
(2020-12-27T06:19:20Z)
- Syllabification of the Divine Comedy [0.0]
We provide a syllabification algorithm for the Divine Comedy using techniques from probabilistic and constraint programming.
We particularly focus on the synalephe, addressed in terms of the "propensity" of a word to take part in a synalephe with adjacent words.
We jointly provide an online vocabulary containing, for each word, information about its syllabification, the location of the tonic accent, and the aforementioned synalephe propensity.
(2020-10-26T12:14:14Z)
- Writer Identification Using Microblogging Texts for Social Media Forensics [53.180678723280145]
We evaluate popular stylometric features, widely used in literary analysis, and specific Twitter features like URLs, hashtags, replies or quotes.
We test varying sized author sets and varying amounts of training/test texts per author.
(2020-07-31T00:23:18Z)