Pay Attention to Your Tone: Introducing a New Dataset for Polite
Language Rewrite
- URL: http://arxiv.org/abs/2212.10190v1
- Date: Tue, 20 Dec 2022 12:02:34 GMT
- Title: Pay Attention to Your Tone: Introducing a New Dataset for Polite
Language Rewrite
- Authors: Xun Wang, Tao Ge, Allen Mao, Yuki Li, Furu Wei, Si-Qing Chen
- Abstract summary: We introduce \textsc{PoliteRewrite} -- a dataset for polite language rewrite.
10K polite sentence rewrites annotated collaboratively by GPT-3.5 and human annotators.
100K high-quality polite sentence rewrites by GPT-3.5 without human review.
- Score: 81.83910117028464
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce \textsc{PoliteRewrite} -- a dataset for polite language
rewrite, a novel sentence rewrite task. Compared with previous text style
transfer tasks, which can mostly be addressed by slight token- or phrase-level
edits, polite language rewrite requires deep understanding and extensive
sentence-level edits of an offensive or impolite sentence to deliver the same
message euphemistically and politely. This makes the task more challenging --
not only for NLP models but also for human annotators, who must rewrite with
considerable effort. To reduce human effort and enable efficient annotation, we
propose a novel annotation paradigm in which human annotators and GPT-3.5
collaborate to annotate \textsc{PoliteRewrite}. The released dataset has 10K
polite sentence rewrites annotated collaboratively by GPT-3.5 and human
annotators, which can serve as a gold standard for training, validation, and
testing, and 100K high-quality polite sentence rewrites produced by GPT-3.5
without human review. We hope this work (the dataset (10K+100K) will be
released soon) will contribute to research on more challenging sentence rewrite
tasks and prompt further thought on resource annotation paradigms that leverage
large-scale pretrained models.
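The collaborative annotation paradigm described above can be sketched as follows. The paper does not publish its pipeline, so everything here is a hypothetical stand-in: `draft_rewrite` replaces the actual GPT-3.5 API call, and the `review_budget` split merely mirrors the idea of a 10K human-reviewed gold set alongside a larger unreviewed silver set.

```python
from dataclasses import dataclass

@dataclass
class RewritePair:
    source: str
    rewrite: str
    human_verified: bool

def draft_rewrite(sentence: str) -> str:
    # Stand-in for a GPT-3.5 draft; a real pipeline would query the model.
    table = {
        "Shut up and listen.": "Could you please give me a moment to explain?",
        "Your idea is stupid.": "I see it differently; could we revisit this idea?",
    }
    return table.get(sentence, sentence)

def annotate(sentences, review_budget):
    """Collaborative annotation sketch: the model drafts every rewrite;
    humans review only the first `review_budget` drafts, yielding a
    gold (verified) set and a silver (unreviewed) set."""
    gold, silver = [], []
    for i, s in enumerate(sentences):
        pair = RewritePair(s, draft_rewrite(s), human_verified=(i < review_budget))
        (gold if pair.human_verified else silver).append(pair)
    return gold, silver
```

The point of the split is cost: model drafting is cheap, so human effort is spent only on the subset that must be gold-standard quality.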
Related papers
- Adaptive Query Rewriting: Aligning Rewriters through Marginal Probability of Conversational Answers [66.55612528039894]
AdaQR is a framework for training query rewriting models with limited rewrite annotations from seed datasets and no passage labels at all.
A novel approach is proposed to assess the retriever's preference for rewrite candidates via the probability of answers conditioned on the conversational query.
arXiv Detail & Related papers (2024-06-16T16:09:05Z)
- RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting [11.306772273707253]
Large Language Models (LLMs) have demonstrated impressive capabilities in creative tasks such as storytelling and E-mail generation.
We develop new strategies for instruction tuning and reinforcement learning to better align LLMs for cross-sentence rewriting tasks.
OpenRewriteEval, a novel benchmark, covers a wide variety of rewriting types expressed through natural language instructions.
arXiv Detail & Related papers (2023-05-25T03:26:26Z)
- UPTON: Preventing Authorship Leakage from Public Text Release via Data Poisoning [17.956089294338984]
We present a novel solution, UPTON, that exploits black-box data poisoning methods to weaken the authorship features in training samples.
We present empirical validation showing that UPTON successfully downgrades the accuracy of AA models to an impractical level.
UPTON remains effective against AA models that are already trained on available clean writings of authors.
arXiv Detail & Related papers (2022-11-17T17:49:57Z)
- Read, Revise, Repeat: A System Demonstration for Human-in-the-loop Iterative Text Revision [11.495407637511878]
We present Read, Revise, Repeat (R3), a human-in-the-loop iterative text revision system.
R3 aims to achieve high-quality text revisions with minimal human effort by reading model-generated revisions and user feedback, revising documents, and repeating human-machine interactions.
arXiv Detail & Related papers (2022-04-07T18:33:10Z)
- Preventing Author Profiling through Zero-Shot Multilingual Back-Translation [15.871735427038386]
We propose a simple, zero-shot way to effectively lower the risk of author profiling through multilingual back-translation.
Results from both an automatic and a human evaluation show that our approach achieves the best overall performance.
We are able to lower the adversarial prediction of gender and race by up to 22% while retaining 95% of the original utility on downstream tasks.
arXiv Detail & Related papers (2021-09-19T14:36:22Z)
- Annotation Curricula to Implicitly Train Non-Expert Annotators [56.67768938052715]
Voluntary studies often require annotators to familiarize themselves with the task, its annotation scheme, and the data domain.
This can be overwhelming at the outset, mentally taxing, and can introduce errors into the resulting annotations.
We propose annotation curricula, a novel approach to implicitly train annotators.
arXiv Detail & Related papers (2021-06-04T09:48:28Z)
- Consecutive Decoding for Speech-to-text Translation [51.155661276936044]
COnSecutive Transcription and Translation (COSTT) is an integral approach for speech-to-text translation.
The key idea is to generate source transcript and target translation text with a single decoder.
Our method is verified on three mainstream datasets.
arXiv Detail & Related papers (2020-09-21T10:10:45Z)
- Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z)
- Politeness Transfer: A Tag and Generate Approach [167.9924201435888]
This paper introduces a new task of politeness transfer.
It involves converting non-polite sentences to polite sentences while preserving the meaning.
We design a tag and generate pipeline that identifies stylistic attributes and subsequently generates a sentence in the target style.
arXiv Detail & Related papers (2020-04-29T15:08:53Z)
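The tag-and-generate idea in the entry above can be illustrated with a toy sketch: first tag the spans that carry the impolite style, then generate replacements in the target style. The actual paper trains neural taggers and generators; the phrase lexicon and `[TAG]` placeholder here are purely illustrative stand-ins.

```python
# Illustrative-only lexicon mapping impolite phrases to polite ones.
IMPOLITE_TO_POLITE = {
    "give me": "could you please give me",
    "now": "when you get a chance",
}

def tag(sentence: str):
    """Mark stylistic source spans with a [TAG] placeholder, returning the
    style-neutral template and the list of tagged spans in order."""
    spans = []
    for phrase in IMPOLITE_TO_POLITE:
        if phrase in sentence:
            sentence = sentence.replace(phrase, "[TAG]", 1)
            spans.append(phrase)
    return sentence, spans

def generate(template: str, spans):
    """Fill each [TAG] slot with a target-style phrase, preserving the
    meaning-bearing content of the template."""
    for phrase in spans:
        template = template.replace("[TAG]", IMPOLITE_TO_POLITE[phrase], 1)
    return template
```

Separating tagging from generation is what lets the approach preserve meaning: content words pass through the template untouched while only the tagged stylistic spans are rewritten.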
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.