Writing Polishment with Simile: Task, Dataset and A Neural Approach
- URL: http://arxiv.org/abs/2012.08117v1
- Date: Tue, 15 Dec 2020 06:39:54 GMT
- Title: Writing Polishment with Simile: Task, Dataset and A Neural Approach
- Authors: Jiayi Zhang, Zhi Cui, Xiaoqiang Xia, Yalong Guo, Yanran Li, Chen Wei,
Jianwei Cui
- Abstract summary: We propose a new task of Writing Polishment with Simile (WPS) to investigate whether machines are able to polish texts with similes as human writers do.
Our model first locates where the simile should be interpolated, and then generates a location-specific simile.
We also release a large-scale Chinese Simile dataset containing 5 million similes with context.
- Score: 9.38000305423665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A simile is a figure of speech that directly makes a comparison, showing
similarities between two different things, e.g. "Reading papers can be dull
sometimes, like watching grass grow". Human writers often interpolate
appropriate similes into proper locations of the plain text to vivify their
writings. However, no existing work has explored neural simile
interpolation, including both locating and generation. In this paper, we
propose a new task of Writing Polishment with Simile (WPS) to investigate
whether machines are able to polish texts with similes as human writers do.
Accordingly, we design a two-staged Locate&Gen model based on the transformer
architecture. Our model first locates where the simile interpolation should
happen, and then generates a location-specific simile. We also release a
large-scale Chinese Simile (CS) dataset containing 5 million similes with
context. The experimental results demonstrate the feasibility of WPS task and
shed light on the future research directions towards better automatic text
polishment.
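The two-stage pipeline the abstract describes can be sketched in a few lines. This is a minimal, hypothetical illustration of the interface only: the real Locate&Gen model uses transformer encoders and decoders, whereas both stages here are stubbed with toy heuristics (the cue-word list and simile lookup table are invented for illustration).

```python
# Hypothetical sketch of a two-stage locate-then-generate pipeline.
# Stage 1 picks an insertion position; stage 2 produces a simile
# conditioned on that position. Both stages are toy stand-ins for
# the transformer components described in the paper.

from typing import List

def locate(tokens: List[str]) -> int:
    """Stage 1 (stub): return the index after which a simile is inserted.
    Toy heuristic: insert right after the first adjective-like cue word."""
    cues = {"dull", "bright", "slow", "fast"}
    for i, tok in enumerate(tokens):
        if tok.strip(",.").lower() in cues:
            return i + 1
    return len(tokens)  # fall back to the end of the sentence

def generate(tokens: List[str], pos: int) -> str:
    """Stage 2 (stub): produce a location-specific simile.
    A real system would decode with a transformer; we use a lookup."""
    anchor = tokens[pos - 1].strip(",.").lower() if pos > 0 else ""
    simile_bank = {"dull": "like watching grass grow"}
    return simile_bank.get(anchor, "like a breeze")

def polish(sentence: str) -> str:
    """Run both stages and splice the simile into the plain text."""
    tokens = sentence.split()
    pos = locate(tokens)
    simile = generate(tokens, pos)
    sep = "," if pos < len(tokens) else ""
    return " ".join(tokens[:pos] + [simile + sep] + tokens[pos:])

print(polish("Reading papers can be dull sometimes"))
# → "Reading papers can be dull like watching grass grow, sometimes"
```

The point of the split is that the insertion location is decided before decoding, so the generator can condition on the surrounding context of that specific position rather than rewriting the whole sentence.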
Related papers
- StoryAnalogy: Deriving Story-level Analogies from Large Language Models
to Unlock Analogical Understanding [72.38872974837462]
We evaluate the ability to identify and generate analogies by constructing a first-of-its-kind large-scale story-level analogy corpus.
StoryAnalogy contains 24K story pairs from diverse domains with human annotations on two similarities from the extended Structure-Mapping Theory.
We observe that the data in StoryAnalogy can improve the quality of analogy generation in large language models.
arXiv Detail & Related papers (2023-10-19T16:29:23Z) - I run as fast as a rabbit, can you? A Multilingual Simile Dialogue Dataset [26.42431190718335]
A simile is a figure of speech that compares two different things (called the tenor and the vehicle) via shared properties.
The current simile research usually focuses on similes in a triplet (tenor, property, vehicle) or a single sentence.
We propose a novel and high-quality multilingual simile dialogue (MSD) dataset to facilitate the study of complex simile phenomena.
arXiv Detail & Related papers (2023-06-09T05:04:13Z) - Learning to Imagine: Visually-Augmented Natural Language Generation [73.65760028876943]
We propose a method to make pre-trained language models (PLMs) learn to imagine for visually-augmented natural language generation.
We use a diffusion model to synthesize high-quality images conditioned on the input texts.
We conduct synthesis for each sentence rather than generate only one image for an entire paragraph.
arXiv Detail & Related papers (2023-05-26T13:59:45Z) - GENIUS: Sketch-based Language Model Pre-training via Extreme and
Selective Masking for Text Generation and Augmentation [76.7772833556714]
We introduce GENIUS: a conditional text generation model using sketches as input.
GENIUS is pre-trained on a large-scale textual corpus with a novel reconstruction from sketch objective.
We show that GENIUS can be used as a strong and ready-to-use data augmentation tool for various natural language processing (NLP) tasks.
arXiv Detail & Related papers (2022-11-18T16:39:45Z) - Can Pre-trained Language Models Interpret Similes as Smart as Human? [15.077252268027548]
We design a novel task named Simile Property Probing to let pre-trained language models infer the shared properties of similes.
Our empirical study shows that PLMs can infer similes' shared properties while still underperforming humans.
To bridge the gap with human performance, we additionally design a knowledge-enhanced training objective by incorporating the simile knowledge into PLMs.
arXiv Detail & Related papers (2022-03-16T07:57:34Z) - How much do language models copy from their training data? Evaluating
linguistic novelty in text generation using RAVEN [63.79300884115027]
Current language models can generate high-quality text.
Are they simply copying text they have seen before, or have they learned generalizable linguistic abstractions?
We introduce RAVEN, a suite of analyses for assessing the novelty of generated text.
arXiv Detail & Related papers (2021-11-18T04:07:09Z) - Aligning Cross-lingual Sentence Representations with Dual Momentum
Contrast [12.691501386854094]
We propose to align sentence representations from different languages into a unified embedding space, where semantic similarities can be computed with a simple dot product.
As the experimental results show, the sentence representations produced by our model achieve the new state-of-the-art on several tasks.
arXiv Detail & Related papers (2021-09-01T08:48:34Z) - Improving Generation and Evaluation of Visual Stories via Semantic
Consistency [72.00815192668193]
Given a series of natural language captions, an agent must generate a sequence of images that correspond to the captions.
Prior work has introduced recurrent generative models which outperform text-to-image synthesis models on this task.
We present a number of improvements to prior modeling approaches, including the addition of a dual learning framework.
arXiv Detail & Related papers (2021-05-20T20:42:42Z) - Generating similes effortlessly like a Pro: A Style Transfer Approach
for Simile Generation [65.22565071742528]
Figurative language such as similes goes beyond plain expressions to give readers new insights and inspiration.
Generating a simile requires a proper understanding of how to map properties effectively between two concepts.
We show how replacing literal sentences with similes from our best model in machine generated stories improves evocativeness and leads to better acceptance by human judges.
arXiv Detail & Related papers (2020-09-18T17:37:13Z) - Comparative Analysis of N-gram Text Representation on Igbo Text Document
Similarity [0.0]
Advances in information technology have encouraged the use of Igbo in the creation of online text such as resources and news articles.
The study adopts a Euclidean similarity measure to determine the similarities between Igbo text documents represented with two word-based n-gram text representation models (unigram and bigram).
arXiv Detail & Related papers (2020-04-01T12:24:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.