DISCO PAL: Diachronic Spanish Sonnet Corpus with Psychological and
Affective Labels
- URL: http://arxiv.org/abs/2007.04626v3
- Date: Sun, 20 Jun 2021 19:19:56 GMT
- Title: DISCO PAL: Diachronic Spanish Sonnet Corpus with Psychological and
Affective Labels
- Authors: Alberto Barbado, Víctor Fresno, Ángeles Manjarrés Riesco,
Salvador Ros
- Abstract summary: This article presents a study of an annotated corpus of Spanish sonnets that analyses whether features built from their individual words can predict their General Affective Meaning (GAM).
The corpus contains 274 Spanish sonnets by authors from the 15th to the 19th century.
Thanks to this, the corpus of sonnets can be used in different applications, such as poetry recommender systems, personality text mining studies of the authors, or the usage of poetry for therapeutic purposes.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Text mining is now applied to corpora in many languages. However,
most of these applications work with prose, and few work with poetry. One
application of text mining to poetry uses features derived from a poem's
individual words to capture its lexical, sublexical and interlexical meaning
and to infer the General Affective Meaning (GAM) of the text. Although this
approach has proved useful for poetry in some languages, there is a lack of
studies both for Spanish poetry and for highly structured poetic compositions
such as sonnets. This article presents a study of an annotated corpus of
Spanish sonnets that analyses whether features built from their individual
words can predict their GAM. The purpose is to model sonnets at an affective
level. The article also analyses the relationship between the GAM of the
sonnets and their content. For this, we consider the content from a
psychological perspective, tagging each sonnet with the psychological terms it
relates to. We then study how GAM varies across those terms.
The corpus contains 274 Spanish sonnets by authors from the 15th to the 19th
century. It was annotated by several domain experts, who labelled the poems
with affective and lexico-semantic features as well as with domain concepts
from psychology. Thanks to these annotations, the corpus can be used in
different applications, such as poetry recommender systems, personality text
mining studies of the authors, or the use of poetry for therapeutic purposes.
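As a rough illustration of the idea described in the abstract (building sonnet-level affective features from individual words, then comparing them across psychological tags), the following minimal Python sketch may help. It is not the authors' method: the toy lexicon, its valence and arousal values, the feature names, and the tag names are all hypothetical and exist only to make the example runnable.

```python
import re
from collections import defaultdict
from statistics import mean

# Hypothetical word-level affective norms: word -> (valence, arousal).
# The values are invented for illustration and come from no real lexicon.
TOY_LEXICON = {
    "amor": (8.0, 6.5),
    "muerte": (1.5, 6.0),
    "dulce": (7.5, 4.0),
    "llanto": (2.0, 5.5),
}


def tokenize(text):
    """Lowercase the text and keep runs of Spanish letters as tokens."""
    return re.findall(r"[a-záéíóúüñ]+", text.lower())


def affective_features(text, lexicon):
    """Aggregate word-level scores into text-level features.

    Returns None when no token of the text appears in the lexicon.
    """
    tokens = tokenize(text)
    scored = [lexicon[t] for t in tokens if t in lexicon]
    if not scored:
        return None
    valences = [v for v, _ in scored]
    arousals = [a for _, a in scored]
    return {
        "mean_valence": mean(valences),
        "mean_arousal": mean(arousals),
        "valence_span": max(valences) - min(valences),
        "coverage": len(scored) / len(tokens),
    }


def mean_valence_by_tag(sonnets, lexicon):
    """Average the text-level mean valence over all texts sharing a tag.

    `sonnets` is a list of (text, tags) pairs; the tags are hypothetical.
    """
    by_tag = defaultdict(list)
    for text, tags in sonnets:
        feats = affective_features(text, lexicon)
        if feats is None:
            continue
        for tag in tags:
            by_tag[tag].append(feats["mean_valence"])
    return {tag: mean(vals) for tag, vals in by_tag.items()}


if __name__ == "__main__":
    print(affective_features("Amor dulce, muerte y llanto", TOY_LEXICON))
    sample = [
        ("amor dulce", ["love"]),
        ("muerte llanto", ["grief"]),
        ("amor muerte", ["love", "grief"]),
    ]
    print(mean_valence_by_tag(sample, TOY_LEXICON))
```

In practice the lexicon would be replaced by published affective norms for Spanish, and the resulting features would feed a regression or classification model rather than a plain average; the sketch only shows the aggregation step.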
Related papers
- Sonnet or Not, Bot? Poetry Evaluation for Large Models and Datasets
Large language models (LLMs) can now generate and recognize poetry.
We develop a task to evaluate how well LLMs recognize one aspect of English-language poetry.
We show that state-of-the-art LLMs can successfully identify both common and uncommon fixed poetic forms.
(2024-06-27T05:36:53Z)
- A Computational Approach to Style in American Poetry
We develop a method to assess the style of American poems and to visualize a collection of poems in relation to one another.
Qualitative poetry criticism helped guide our development of metrics that analyze various orthographic, syntactic, and phonemic features.
Our method has potential applications to academic research of texts, to research of the intuitive personal response to poetry, and to making recommendations to readers based on their favorite poems.
(2023-10-13T18:49:14Z)
- PoetryDiffusion: Towards Joint Semantic and Metrical Manipulation in Poetry Generation
Controllable text generation is a challenging and meaningful field in natural language generation (NLG).
In this paper, we pioneer the use of the Diffusion model for generating sonnets and Chinese SongCi poetry.
Our model outperforms existing models in automatic evaluation of semantic, metrical, and overall performance as well as human evaluation.
(2023-06-14T11:57:31Z)
- Information-Restricted Neural Language Models Reveal Different Brain Regions' Sensitivity to Semantics, Syntax and Context
We trained a lexical language model, GloVe, and a supra-lexical language model, GPT-2, on a text corpus.
We then assessed to what extent these information-restricted models were able to predict the time-courses of fMRI signal of humans listening to naturalistic text.
Our analyses show that, while most brain regions involved in language are sensitive to both syntactic and semantic variables, the relative magnitudes of these effects vary a lot across these regions.
(2023-02-28T08:16:18Z)
- Textless Speech Emotion Conversion using Decomposed and Discrete Representations
We decompose speech into discrete and disentangled learned representations, consisting of content units, F0, speaker, and emotion.
First, we modify the speech content by translating the content units to a target emotion, and then predict the prosodic features based on these units.
Finally, the speech waveform is generated by feeding the predicted representations into a neural vocoder.
(2021-11-14T18:16:42Z)
- Syllabic Quantity Patterns as Rhythmic Features for Latin Authorship Attribution
We employ syllabic quantity as a base for deriving rhythmic features for the task of computational authorship attribution of Latin prose texts.
Our experiments, carried out on three different datasets, using two different machine learning methods, show that rhythmic features based on syllabic quantity are beneficial in discriminating among Latin prose authors.
(2021-10-27T06:25:31Z)
- Tortured phrases: A dubious writing style emerging in science. Evidence of critical issues affecting established journals
Probabilistic text generators have been used to produce fake scientific papers for more than a decade.
Complex AI-powered generation techniques produce texts indistinguishable from that of humans.
Some websites offer to rewrite texts for free, generating gobbledegook full of tortured phrases.
(2021-07-12T20:47:08Z)
- CCPM: A Chinese Classical Poetry Matching Dataset
We propose a novel task to assess a model's semantic understanding of poetry by poem matching.
This task requires the model to select one line of Chinese classical poetry among four candidates according to the modern Chinese translation of a line of poetry.
To construct this dataset, we first obtain a set of parallel data of Chinese classical poetry and modern Chinese translation.
(2021-06-03T16:49:03Z)
- Metrical Tagging in the Wild: Building and Annotating Poetry Corpora with Rhythmic Features
We provide large poetry corpora for English and German, and annotate prosodic features in smaller corpora to train corpus driven neural models.
We show that BiLSTM-CRF models with syllable embeddings outperform a CRF baseline and different BERT-based approaches.
(2021-02-17T16:38:57Z)
- Quasi Error-free Text Classification and Authorship Recognition in a large Corpus of English Literature based on a Novel Feature Set
We show that in the entire GLEC quasi error-free text classification and authorship recognition is possible with a method using the same set of five style and five content features.
Our data pave the way for many future computational and empirical studies of literature or experiments in reading psychology.
(2020-10-21T07:39:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.