Syllabification of the Divine Comedy
- URL: http://arxiv.org/abs/2010.13515v1
- Date: Mon, 26 Oct 2020 12:14:14 GMT
- Title: Syllabification of the Divine Comedy
- Authors: Andrea Asperti and Stefano Dal Bianco
- Abstract summary: We provide a syllabification algorithm for the Divine Comedy using techniques from probabilistic and constraint programming.
We particularly focus on the synalephe, addressed in terms of the "propensity" of a word to take part in a synalephe with adjacent words.
We jointly provide an online vocabulary containing, for each word, information about its syllabification, the location of the tonic accent, and the aforementioned synalephe propensity.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We provide a syllabification algorithm for the Divine Comedy using techniques
from probabilistic and constraint programming. We particularly focus on the
synalephe, addressed in terms of the "propensity" of a word to take part in a
synalephe with adjacent words. We also provide an online vocabulary
containing, for each word, information about its syllabification, the location
of the tonic accent, and the aforementioned synalephe propensity on the left
and right sides. The algorithm is intrinsically nondeterministic, producing
different possible syllabifications for each verse, each with its own likelihood;
metric constraints on the accents at the 10th, 4th and 6th syllables are
used to further reduce the solution space. The most likely syllabification is
then returned as output. We believe that this work could be a major milestone
for many different investigations. From the point of view of digital
humanities, it opens new perspectives on computer-assisted analysis of digital
sources, including automated detection of anomalous and problematic cases,
metric clustering and categorization of verses, and more foundational
investigations addressing, e.g., the phonetic roles of consonants and vowels.
From the point of view of text processing and deep learning, information about
syllabification and the location of accents opens a wide range of exciting
perspectives, from automatically learning the syllabification of
words and verses to improving generative models so that they are aware of metric
issues and more respectful of the expected musicality.
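The Python sketch below illustrates, under stated assumptions, how the pieces named in the abstract could fit together: a per-word vocabulary record (syllables, tonic accent, left and right synalephe propensity), enumeration of every synalephe choice at vowel-vowel word boundaries, a likelihood for each choice, and filtering by hendecasyllable accent constraints before picking the most likely reading. It is not the authors' implementation. The `Entry` record, the helper names, the rule that a boundary's merge probability is the product of the first word's right propensity and the second word's left propensity, the propensity values, and the exact metric test (last accent on the 10th position plus an accent on the 4th or 6th) are all hypothetical simplifications. Punctuation and other prosodic phenomena such as dieresis are ignored, and brute-force enumeration stands in for the constraint programming the paper mentions.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Optional, Set, Tuple

VOWELS = set("aeiouàèéìíòóùú")


@dataclass(frozen=True)
class Entry:
    """Hypothetical vocabulary record mirroring the fields the paper describes."""
    syllables: Tuple[str, ...]  # orthographic syllables of the word
    tonic: Optional[int]        # 0-based index of the stressed syllable (None = unstressed)
    left: float                 # assumed propensity to merge with the previous word
    right: float                # assumed propensity to merge with the next word


def boundaries(words: List[str], vocab: Dict[str, Entry]) -> List[Tuple[int, float]]:
    """Word boundaries where a synalephe is possible (vowel meets vowel), each with
    an assumed merge probability right(w_i) * left(w_{i+1})."""
    found = []
    for i in range(len(words) - 1):
        a, b = words[i], words[i + 1]
        if a[-1] in VOWELS and b[0] in VOWELS:
            found.append((i, vocab[a].right * vocab[b].left))
    return found


def stressed_positions(words: List[str], vocab: Dict[str, Entry], merged: Set[int]) -> List[int]:
    """1-based metrical positions of tonic accents; i in `merged` means the last
    syllable of words[i] and the first syllable of words[i + 1] share one position."""
    pos, stresses = 0, []
    for i, w in enumerate(words):
        entry = vocab[w]
        start = pos if (i - 1) in merged else pos + 1
        if entry.tonic is not None:
            stresses.append(start + entry.tonic)
        pos = start + len(entry.syllables) - 1
    return stresses


def is_hendecasyllable(stresses: List[int]) -> bool:
    """Assumed metric check: last accent on position 10, plus an accent on 4 or 6."""
    return bool(stresses) and max(stresses) == 10 and (4 in stresses or 6 in stresses)


def syllabify(verse: str, vocab: Dict[str, Entry]):
    """Enumerate every synalephe choice, weight it, keep metrically valid readings,
    and return (likelihood, merged boundaries, accent positions) of the best one."""
    words = verse.lower().split()
    options = boundaries(words, vocab)
    best = None
    for choice in product((False, True), repeat=len(options)):
        likelihood, merged = 1.0, set()
        for (i, p), take in zip(options, choice):
            likelihood *= p if take else 1.0 - p
            if take:
                merged.add(i)
        stresses = stressed_positions(words, vocab, merged)
        if is_hendecasyllable(stresses) and (best is None or likelihood > best[0]):
            best = (likelihood, sorted(merged), stresses)
    return best  # None if no reading satisfies the constraints


# Illustrative entries; the propensity values are invented, not taken from the paper.
VOCAB = {
    "mi": Entry(("mi",), None, 0.5, 0.5),
    "ritrovai": Entry(("ri", "tro", "vai"), 2, 0.5, 0.5),
    "per": Entry(("per",), None, 0.5, 0.5),
    "una": Entry(("u", "na"), 0, 0.5, 0.5),
    "selva": Entry(("sel", "va"), 0, 0.5, 0.8),
    "oscura": Entry(("o", "scu", "ra"), 1, 0.7, 0.5),
}

print(syllabify("mi ritrovai per una selva oscura", VOCAB))
```

On "mi ritrovai per una selva oscura" (Inferno I, 2) the only eligible boundary is selva/oscura. Keeping the two words apart would push the accent of "oscura" to the 11th position and fail the metric check, so the sketch returns the synalephe reading with a likelihood of about 0.56 and accents on positions 4, 6, 8 and 10. Brute force is affordable here because a verse has only a handful of eligible boundaries, but the number of readings doubles with each one.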
Related papers
- Automating Sound Change Prediction for Phylogenetic Inference: A Tukanoan Case Study
We train a neural network on sound change data to predict intermediate sound change steps between historical protoforms and their modern descendants.
In our best experiments on Tukanoan languages, this method produces trees with a Generalized Quartet Distance of 0.12 from a tree that used expert annotations.
(arXiv, 2024-02-02T17:20:16Z)
- Design and Implementation of a Tool for Extracting Uzbek Syllables
Syllabification is a versatile linguistic tool with applications in linguistic research, language technology, education, and various other fields.
We present a comprehensive approach to syllabification for the Uzbek language, including rule-based techniques and machine learning algorithms.
The results of our experiments show that both approaches achieved a high level of accuracy, exceeding 99%.
(arXiv, 2023-12-25T17:46:58Z)
- Quantifying the redundancy between prosody and text
We use large language models to estimate how much information is redundant between prosody and the words themselves.
We find a high degree of redundancy between the information carried by the words and prosodic information across several prosodic features.
Still, we observe that prosodic features cannot be fully predicted from text, suggesting that prosody carries information above and beyond the words.
(arXiv, 2023-11-28T21:15:24Z)
- MUST&P-SRL: Multi-lingual and Unified Syllabification in Text and Phonetic Domains for Speech Representation Learning
We present a methodology for linguistic feature extraction, focusing on automatically syllabifying words in multiple languages.
In both the textual and phonetic domains, our method focuses on the extraction of phonetic transcriptions from text, stress marks, and a unified automatic syllabification.
The system was built with open-source components and resources.
(arXiv, 2023-10-17T19:27:23Z)
- Revisiting Syllables in Language Modelling and their Application on Low-Resource Machine Translation
Syllables provide shorter sequences than characters, require less-specialised extraction rules than morphemes, and their segmentation is not affected by corpus size.
We first explore the potential of syllables for open-vocabulary language modelling in 21 languages.
We use rule-based syllabification methods for six languages and address the rest with hyphenation, which works as a syllabification proxy.
(arXiv, 2022-10-05T18:55:52Z)
- Short-Term Word-Learning in a Dynamically Changing Environment
We show how to supplement an end-to-end ASR system with a word/phrase memory and a mechanism to access this memory to recognize the words and phrases correctly.
We demonstrate significant improvements in the detection rate of new words with only a minor increase in false alarms.
(arXiv, 2022-03-29T10:05:39Z)
- Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information
We propose the speaker embedding-aware neural diarization (SEND) method, which predicts the power set encoded labels.
Our method achieves a lower diarization error rate than target-speaker voice activity detection.
(arXiv, 2021-11-28T12:51:04Z)
- Syllabic Quantity Patterns as Rhythmic Features for Latin Authorship Attribution
We employ syllabic quantity as a base for deriving rhythmic features for the task of computational authorship attribution of Latin prose texts.
Our experiments, carried out on three different datasets using two different machine learning methods, show that rhythmic features based on syllabic quantity are beneficial in discriminating among Latin prose authors.
(arXiv, 2021-10-27T06:25:31Z)
- Revisiting Neural Language Modelling with Syllables
We reconsider syllables for an open-vocabulary generation task in 20 languages.
We use rule-based syllabification methods for five languages and address the rest with a hyphenation tool.
With a comparable perplexity, we show that syllables outperform characters, annotated morphemes and unsupervised subwords.
(arXiv, 2020-10-24T11:44:41Z)
- Neural Syntactic Preordering for Controlled Paraphrase Generation
Our work uses syntactic transformations to softly "reorder" the source sentence and guide our neural paraphrasing model.
First, given an input sentence, we derive a set of feasible syntactic rearrangements using an encoder-decoder model.
Next, we use each proposed rearrangement to produce a sequence of position embeddings, which encourages our final encoder-decoder paraphrase model to attend to the source words in a particular order.
(arXiv, 2020-05-05T09:02:25Z)
- Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems
Out-of-vocabulary (OOV) words are a typical problem for any speech recognition system.
A popular approach to covering OOVs is to use subword units rather than words.
In this paper we explore existing methods for this solution at both the graph-construction and search levels.
(arXiv, 2020-03-19T21:24:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.