ParaAMR: A Large-Scale Syntactically Diverse Paraphrase Dataset by AMR
Back-Translation
- URL: http://arxiv.org/abs/2305.16585v1
- Date: Fri, 26 May 2023 02:27:33 GMT
- Title: ParaAMR: A Large-Scale Syntactically Diverse Paraphrase Dataset by AMR
Back-Translation
- Authors: Kuan-Hao Huang, Varun Iyer, I-Hung Hsu, Anoop Kumar, Kai-Wei Chang,
Aram Galstyan
- Abstract summary: ParaAMR is a large-scale syntactically diverse paraphrase dataset created by abstract meaning representation back-translation.
We show that ParaAMR can be used to improve on three NLP tasks: learning sentence embeddings, syntactically controlled paraphrase generation, and data augmentation for few-shot learning.
- Score: 59.91139600152296
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Paraphrase generation is a long-standing task in natural language processing
(NLP). Supervised paraphrase generation models, which rely on human-annotated
paraphrase pairs, are cost-inefficient and hard to scale up. On the other hand,
automatically annotated paraphrase pairs (e.g., by machine back-translation),
usually suffer from the lack of syntactic diversity -- the generated paraphrase
sentences are very similar to the source sentences in terms of syntax. In this
work, we present ParaAMR, a large-scale syntactically diverse paraphrase
dataset created by abstract meaning representation back-translation. Our
quantitative analysis, qualitative examples, and human evaluation demonstrate
that the paraphrases of ParaAMR are syntactically more diverse compared to
existing large-scale paraphrase datasets while preserving good semantic
similarity. In addition, we show that ParaAMR can be used to improve on three
NLP tasks: learning sentence embeddings, syntactically controlled paraphrase
generation, and data augmentation for few-shot learning. Our results thus
showcase the potential of ParaAMR for improving various NLP applications.
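To make the core technique concrete, the following is a minimal sketch of one AMR back-translation round trip, assuming the open-source amrlib package as the parser and generator (the paper does not prescribe this tooling):

```python
# Illustrative AMR back-translation round trip using the open-source
# amrlib package -- an assumed tool choice for this sketch, not the
# authors' published pipeline. Requires `pip install amrlib` plus
# downloading a pretrained parse (stog) and generate (gtos) model.
import amrlib

stog = amrlib.load_stog_model()   # sentence -> AMR graph
gtos = amrlib.load_gtos_model()   # AMR graph -> sentence

source = ["The chef who ran the kitchen prepared the meal."]

# Parse into linearized AMR (PENMAN notation). AMR abstracts away word
# order and surface syntax, keeping only predicate-argument structure.
graphs = stog.parse_sents(source)

# Generating text back from the graph need not reproduce the original
# syntax; ParaAMR further re-linearizes each graph from different nodes
# to obtain multiple, syntactically diverse paraphrases per sentence.
paraphrases, _ = gtos.generate(graphs)
print(paraphrases[0])
```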
Related papers
- A Quality-based Syntactic Template Retriever for
Syntactically-controlled Paraphrase Generation [67.98367574025797]
Existing syntactically-controlled paraphrase generation models perform promisingly with human-annotated or well-chosen syntactic templates.
The prohibitive cost makes it infeasible to manually design decent templates for every source sentence.
We propose a novel Quality-based Syntactic Template Retriever (QSTR) to retrieve templates based on the quality of the to-be-generated paraphrases.
arXiv Detail & Related papers (2023-10-20T03:55:39Z)
- Unsupervised Syntactically Controlled Paraphrase Generation with
Abstract Meaning Representations [59.10748929158525]
Abstract Meaning Representations (AMR) can greatly improve the performance of unsupervised syntactically controlled paraphrase generation.
Our proposed model, AMR-enhanced Paraphrase Generator (AMRPG), encodes the AMR graph and the constituency parse of the input sentence into two disentangled semantic and syntactic embeddings.
Experiments show that AMRPG generates more accurate syntactically controlled paraphrases, both quantitatively and qualitatively, compared to the existing unsupervised approaches.
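A minimal PyTorch sketch of this two-encoder idea, with toy GRU encoders and arbitrary dimensions (the actual AMRPG builds on pretrained components, so treat every name and shape here as an assumption):

```python
# Toy rendering of the disentangled two-encoder design: one encoder for
# the linearized AMR graph (semantics), one for the linearized
# constituency parse (syntax), a decoder conditioned on both.
import torch
import torch.nn as nn

class DisentangledParaphraser(nn.Module):
    def __init__(self, vocab_size=10000, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.sem_enc = nn.GRU(dim, dim, batch_first=True)  # AMR tokens
        self.syn_enc = nn.GRU(dim, dim, batch_first=True)  # parse tokens
        # Decoder initial state concatenates the two codes.
        self.dec = nn.GRU(dim, 2 * dim, batch_first=True)
        self.out = nn.Linear(2 * dim, vocab_size)

    def forward(self, amr_tokens, parse_tokens, tgt_tokens):
        _, h_sem = self.sem_enc(self.embed(amr_tokens))
        _, h_syn = self.syn_enc(self.embed(parse_tokens))
        h0 = torch.cat([h_sem, h_syn], dim=-1)   # (1, B, 2*dim)
        dec_out, _ = self.dec(self.embed(tgt_tokens), h0)
        return self.out(dec_out)                 # per-step vocab logits

model = DisentangledParaphraser()
logits = model(torch.randint(0, 10000, (2, 12)),
               torch.randint(0, 10000, (2, 20)),
               torch.randint(0, 10000, (2, 15)))
print(logits.shape)  # torch.Size([2, 15, 10000])
```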
arXiv Detail & Related papers (2022-11-02T04:58:38Z)
- Hierarchical Sketch Induction for Paraphrase Generation [79.87892048285819]
We introduce Hierarchical Refinement Quantized Variational Autoencoders (HRQ-VAE), a method for learning decompositions of dense encodings.
We use HRQ-VAE to encode the syntactic form of an input sentence as a path through the hierarchy, allowing us to more easily predict syntactic sketches at test time.
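Read "a path through the hierarchy" as residual quantization: each level's codebook quantizes whatever the coarser levels left unexplained. A hedged sketch with arbitrary depth and codebook sizes:

```python
# Hierarchical residual quantization, the core mechanism named by
# HRQ-VAE (depth and codebook sizes here are arbitrary illustrations).
import torch

def hrq_encode(z, codebooks):
    """Quantize encoding z level by level; each level encodes the
    residual left by coarser levels, yielding a path of code indices."""
    path, residual = [], z
    for cb in codebooks:                  # cb: (K, dim) codebook
        dists = torch.cdist(residual, cb) # (batch, K) distances
        idx = dists.argmin(dim=-1)        # nearest code per item
        path.append(idx)
        residual = residual - cb[idx]     # refine at the next level
    return path

dim, depth, K = 64, 3, 16
codebooks = [torch.randn(K, dim) for _ in range(depth)]
z = torch.randn(8, dim)                   # batch of dense encodings
print([idx.tolist() for idx in hrq_encode(z, codebooks)])
```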
arXiv Detail & Related papers (2022-03-07T15:28:36Z)
- Towards Document-Level Paraphrase Generation with Sentence Rewriting and
Reordering [88.08581016329398]
We propose CoRPG (Coherence Relationship guided Paraphrase Generation) for document-level paraphrase generation.
We use a graph GRU to encode the coherence relationship graph and obtain a coherence-aware representation for each sentence.
Our model can generate document-level paraphrases with greater diversity and better semantic preservation.
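As a rough sketch of what a graph GRU update can look like (illustrative names and a single-step update, not CoRPG's released code), each sentence state is refreshed by a GRU cell driven by a message aggregated from its neighbors in the coherence graph:

```python
# One graph-GRU update over a coherence-relationship graph: neighbor
# states are mean-pooled into a message that drives a GRU cell.
import torch
import torch.nn as nn

class GraphGRULayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.cell = nn.GRUCell(dim, dim)

    def forward(self, h, adj):
        # h: (N, dim) sentence states; adj: (N, N) coherence relations.
        msg = adj @ h / adj.sum(dim=-1, keepdim=True).clamp(min=1)
        return self.cell(msg, h)  # message as input, old state as hidden

n_sents, dim = 5, 128
h = torch.randn(n_sents, dim)                       # sentence encodings
adj = (torch.rand(n_sents, n_sents) > 0.5).float()  # coherence graph
coherence_aware = GraphGRULayer(dim)(h, adj)
print(coherence_aware.shape)  # torch.Size([5, 128])
```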
arXiv Detail & Related papers (2021-09-15T05:53:40Z)
- Phrase-BERT: Improved Phrase Embeddings from BERT with an Application to
Corpus Exploration [25.159601117722936]
We propose a contrastive fine-tuning objective that enables BERT to produce more powerful phrase embeddings.
Our approach relies on a dataset of diverse phrasal paraphrases, which is automatically generated using a paraphrase generation model.
As a case study, we show that Phrase-BERT embeddings can be easily integrated with a simple autoencoder to build a phrase-based neural topic model.
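A minimal sketch of such a contrastive objective with in-batch negatives, assuming the phrase embeddings come from a BERT-style encoder upstream (the exact Phrase-BERT loss may differ):

```python
# Contrastive objective over phrasal paraphrase pairs: each phrase
# should score highest with its own paraphrase; the other B-1
# paraphrases in the batch act as negatives.
import torch
import torch.nn.functional as F

def contrastive_loss(anchor_emb, paraphrase_emb, temperature=0.05):
    # anchor_emb, paraphrase_emb: (B, dim) embeddings of phrase pairs.
    a = F.normalize(anchor_emb, dim=-1)
    p = F.normalize(paraphrase_emb, dim=-1)
    logits = a @ p.t() / temperature       # (B, B) similarity matrix
    targets = torch.arange(a.size(0))      # diagonal entries = positives
    return F.cross_entropy(logits, targets)

loss = contrastive_loss(torch.randn(32, 768), torch.randn(32, 768))
print(float(loss))  # in practice this gradient fine-tunes the encoder
```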
arXiv Detail & Related papers (2021-09-13T20:31:57Z)
- Pushing Paraphrase Away from Original Sentence: A Multi-Round Paraphrase
Generation Approach [97.38622477085188]
We propose BTmPG (Back-Translation guided multi-round Paraphrase Generation) to improve the diversity of generated paraphrases.
We evaluate BTmPG on two benchmark datasets.
arXiv Detail & Related papers (2021-09-04T13:12:01Z)
- Generating Syntactically Controlled Paraphrases without Using Annotated
Parallel Pairs [37.808235216195484]
We show that it is possible to generate syntactically diverse paraphrases without the need for annotated paraphrase pairs.
We propose Syntactically controlled Paraphrase Generator (SynPG), an encoder-decoder based model that learns to disentangle the semantics and the syntax of a sentence.
arXiv Detail & Related papers (2021-01-26T06:13:52Z)
- Paraphrase Generation as Zero-Shot Multilingual Translation:
Disentangling Semantic Similarity from Lexical and Syntactic Diversity [11.564158965143418]
We introduce a simple paraphrase generation algorithm which discourages the production of n-grams that are present in the input.
Our approach enables paraphrase generation in many languages from a single multilingual NMT model.
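The n-gram-discouragement idea can be approximated with off-the-shelf constrained decoding, for example Hugging Face's bad_words_ids argument, which forbids listed token sequences during generation; the model below is a placeholder stand-in, and the paper applies its own decoding penalty rather than this exact mechanism:

```python
# Block every bigram of the input from appearing in the output, so the
# generator cannot copy the source verbatim. `t5-small` is a placeholder
# seq2seq model for illustration only.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

name = "t5-small"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

text = "the quick brown fox jumps over the lazy dog"
ids = tok(text, add_special_tokens=False).input_ids

# Every bigram occurring in the input becomes a forbidden sequence.
n = 2
input_ngrams = [ids[i:i + n] for i in range(len(ids) - n + 1)]

out = model.generate(
    **tok(text, return_tensors="pt"),
    bad_words_ids=input_ngrams,
    max_new_tokens=40,
)
print(tok.decode(out[0], skip_special_tokens=True))
```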
arXiv Detail & Related papers (2020-08-11T18:05:34Z)
- A Multi-cascaded Model with Data Augmentation for Enhanced Paraphrase
Detection in Short Texts [1.6758573326215689]
We present a data augmentation strategy and a multi-cascaded model for improved paraphrase detection in short texts.
Our model is both wide and deep and provides greater robustness across clean and noisy short texts.
arXiv Detail & Related papers (2019-12-27T12:10:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.