CCPM: A Chinese Classical Poetry Matching Dataset
- URL: http://arxiv.org/abs/2106.01979v1
- Date: Thu, 3 Jun 2021 16:49:03 GMT
- Title: CCPM: A Chinese Classical Poetry Matching Dataset
- Authors: Wenhao Li, Fanchao Qi, Maosong Sun, Xiaoyuan Yi, Jiarui Zhang
- Abstract summary: We propose a novel task to assess a model's semantic understanding of poetry by poem matching.
This task requires the model to select one line of Chinese classical poetry among four candidates according to the modern Chinese translation of a line of poetry.
To construct this dataset, we first obtain a set of parallel data of Chinese classical poetry and modern Chinese translation.
- Score: 50.90794811956129
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Poetry is one of the most important art forms of human languages. Recently
many studies have focused on incorporating some linguistic features of poetry,
such as style and sentiment, into its understanding or generation system.
However, there is no focus on understanding or evaluating the semantics of
poetry. Therefore, we propose a novel task to assess a model's semantic
understanding of poetry by poem matching. Specifically, this task requires the
model to select one line of Chinese classical poetry among four candidates
according to the modern Chinese translation of a line of poetry. To construct
this dataset, we first obtain a set of parallel data of Chinese classical
poetry and modern Chinese translation. Then we retrieve similar lines of poetry
with the lines in a poetry corpus as negative choices. We name the dataset
Chinese Classical Poetry Matching Dataset (CCPM) and release it at
https://github.com/THUNLP-AIPoet/CCPM. We hope this dataset can further enhance
the study on incorporating deep semantics into the understanding and generation
system of Chinese classical poetry. We also preliminarily run two variants of
BERT on this dataset as the baselines for this dataset.
Related papers
- Sonnet or Not, Bot? Poetry Evaluation for Large Models and Datasets [3.0040661953201475]
Large language models (LLMs) can now generate and recognize poetry.
We develop a task to evaluate how well LLMs recognize one aspect of English-language poetry.
We show that state-of-the-art LLMs can successfully identify both common and uncommon fixed poetic forms.
arXiv Detail & Related papers (2024-06-27T05:36:53Z) - A Computational Approach to Style in American Poetry [19.41186389974801]
We develop a method to assess the style of American poems and to visualize a collection of poems in relation to one another.
qualitative poetry criticism helped guide our development of metrics that analyze various orthographic, syntactic, and phonemic features.
Our method has potential applications to academic research of texts, to research of the intuitive personal response to poetry, and to making recommendations to readers based on their favorite poems.
arXiv Detail & Related papers (2023-10-13T18:49:14Z) - Ashaar: Automatic Analysis and Generation of Arabic Poetry Using Deep
Learning Approaches [7.021140304091526]
This paper introduces a framework called textitAshaar, which encompasses a collection of datasets and pre-trained models designed specifically for the analysis and generation of Arabic poetry.
The pipeline established within our proposed approach encompasses various aspects of poetry, such as meter, theme, and era classification.
As part of this endeavor, we provide four datasets: one for poetry generation, another for diacritization, and two for Arudi-style prediction.
arXiv Detail & Related papers (2023-07-12T15:07:16Z) - PoetryDiffusion: Towards Joint Semantic and Metrical Manipulation in
Poetry Generation [58.36105306993046]
Controllable text generation is a challenging and meaningful field in natural language generation (NLG)
In this paper, we pioneer the use of the Diffusion model for generating sonnets and Chinese SongCi poetry.
Our model outperforms existing models in automatic evaluation of semantic, metrical, and overall performance as well as human evaluation.
arXiv Detail & Related papers (2023-06-14T11:57:31Z) - A Method to Judge the Style of Classical Poetry Based on Pre-trained
Model [13.899056358137287]
This paper builds the most perfect data set of Chinese classical poetry at present, trains a BART-poem pre-trained model on this data set, and puts forward a generally applicable poetry style judgment method.
Experiments show that the judgment results of the tested poetry work are basically consistent with the conclusions given by critics of previous dynasties, verify some avant-garde judgments of Mr. Qian Zhongshu, and better solve the task of poetry style recognition in the Tang and Song dynasties.
arXiv Detail & Related papers (2022-11-09T03:11:15Z) - PoeLM: A Meter- and Rhyme-Controllable Language Model for Unsupervised
Poetry Generation [42.12348554537587]
Formal verse poetry imposes strict constraints on the meter and rhyme scheme of poems.
Most prior work on generating this type of poetry uses existing poems for supervision.
We propose an unsupervised approach to generate poems following any given meter and rhyme scheme.
arXiv Detail & Related papers (2022-05-24T17:09:55Z) - Don't Go Far Off: An Empirical Study on Neural Poetry Translation [13.194404923699782]
We present an empirical investigation for poetry translation along several dimensions.
We contribute a parallel dataset of poetry translations for several language pairs.
Our results show that multilingual fine-tuning on poetic text significantly outperforms multilingual fine-tuning on non-poetic text that is 35X larger in size.
arXiv Detail & Related papers (2021-09-07T10:00:44Z) - Lingxi: A Diversity-aware Chinese Modern Poetry Generation System [43.36560720793425]
Lingxi is a diversity-aware Chinese modern poetry generation system.
We propose nucleus sampling with randomized head (NS-RH) algorithm.
We find that even when a large portion of filtered vocabulary is randomized, it can actually generate fluent poetry.
arXiv Detail & Related papers (2021-08-27T03:33:28Z) - Generating Major Types of Chinese Classical Poetry in a Uniformed
Framework [88.57587722069239]
We propose a GPT-2 based framework for generating major types of Chinese classical poems.
Preliminary results show this enhanced model can generate Chinese classical poems of major types with high quality in both form and content.
arXiv Detail & Related papers (2020-03-13T14:16:25Z) - MixPoet: Diverse Poetry Generation via Learning Controllable Mixed
Latent Space [79.70053419040902]
We propose MixPoet, a novel model that absorbs multiple factors to create various styles and promote diversity.
Based on a semi-supervised variational autoencoder, our model disentangles the latent space into some subspaces, with each conditioned on one influence factor by adversarial training.
Experiment results on Chinese poetry demonstrate that MixPoet improves both diversity and quality against three state-of-the-art models.
arXiv Detail & Related papers (2020-03-13T03:31:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.