ConRPG: Paraphrase Generation using Contexts as Regularizer
- URL: http://arxiv.org/abs/2109.00363v1
- Date: Wed, 1 Sep 2021 12:57:30 GMT
- Title: ConRPG: Paraphrase Generation using Contexts as Regularizer
- Authors: Yuxian Meng, Xiang Ao, Qing He, Xiaofei Sun, Qinghong Han, Fei Wu,
Chun Fan and Jiwei Li
- Abstract summary: A long-standing issue with paraphrase generation is how to obtain reliable supervision signals.
We propose an unsupervised paradigm for paraphrase generation based on the assumption that the probabilities of generating two sentences with the same meaning given the same context should be the same.
We propose a pipelined system which consists of paraphrase candidate generation based on contextual language models, candidate filtering using scoring functions, and paraphrase model training based on the selected candidates.
- Score: 31.967883219986362
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A long-standing issue with paraphrase generation is how to obtain reliable
supervision signals. In this paper, we propose an unsupervised paradigm for
paraphrase generation based on the assumption that the probabilities of
generating two sentences with the same meaning given the same context should be
the same. Inspired by this fundamental idea, we propose a pipelined system
which consists of paraphrase candidate generation based on contextual language
models, candidate filtering using scoring functions, and paraphrase model
training based on the selected candidates. The proposed paradigm offers merits
over existing paraphrase generation methods: (1) using the context regularizer
on meanings, the model is able to generate massive amounts of high-quality
paraphrase pairs; and (2) using human-interpretable scoring functions to select
paraphrase pairs from candidates, the proposed framework provides a channel for
developers to intervene with the data generation process, leading to a more
controllable model. Experimental results across different tasks and datasets
demonstrate the effectiveness of the proposed model in both supervised and
unsupervised setups.
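To make the core assumption concrete, the sketch below scores two sentences by their probability under a pretrained language model conditioned on a shared context, and keeps a candidate only if its score stays close to the original's. The choice of GPT-2 and the filtering threshold are illustrative assumptions, not the paper's exact scoring functions.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def log_prob_given_context(context: str, sentence: str) -> float:
    """Average log-probability of `sentence` tokens conditioned on `context`."""
    ctx_ids = tokenizer.encode(context)
    sent_ids = tokenizer.encode(" " + sentence)
    input_ids = torch.tensor([ctx_ids + sent_ids])
    with torch.no_grad():
        logits = model(input_ids).logits
    # Logits at position i predict token i + 1, so slice so that only the
    # sentence tokens (not the context tokens) are scored.
    log_probs = torch.log_softmax(logits[0, len(ctx_ids) - 1 : -1], dim=-1)
    token_scores = log_probs[torch.arange(len(sent_ids)), torch.tensor(sent_ids)]
    return token_scores.mean().item()

context = "The meeting ran late, so"
original = "we ordered dinner to the office."
candidate = "we had food delivered to the office."

s_orig = log_prob_given_context(context, original)
s_cand = log_prob_given_context(context, candidate)
# Keep the candidate only if its contextual probability is close to the
# original's; the 0.5 threshold is purely illustrative.
print(s_orig, s_cand, abs(s_orig - s_cand) < 0.5)
```

In the paper's pipeline, pairs that survive such filtering would then serve as training data for a dedicated paraphrase generation model.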
Related papers
- One2set + Large Language Model: Best Partners for Keyphrase Generation [42.969689556605005]
Keyphrase generation (KPG) aims to automatically generate a collection of phrases representing the core concepts of a given document.
We introduce a generate-then-select framework decomposing KPG into two steps, where we adopt a one2set-based model as generator to produce candidates and then use an LLM as selector to select keyphrases from these candidates.
Our framework significantly surpasses state-of-the-art models, especially in absent keyphrase prediction.
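As a rough illustration of this generate-then-select decomposition, the sketch below wires a candidate generator to an LLM selector; `one2set_generate` and `call_llm` are hypothetical stand-ins, not the paper's code.

```python
from typing import List

def one2set_generate(document: str) -> List[str]:
    # Hypothetical stand-in for a one2set-style generator that proposes
    # keyphrase candidates (including possible absent keyphrases).
    return ["paraphrase generation", "keyphrase selection", "large language models"]

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for any instruction-following LLM; replace with a
    # real API or local model call.
    return "paraphrase generation, large language models"

def generate_then_select(document: str) -> List[str]:
    candidates = one2set_generate(document)
    prompt = (
        "Select the phrases that best capture the core concepts of the document.\n"
        f"Document: {document}\n"
        f"Candidates: {', '.join(candidates)}\n"
        "Answer with a comma-separated list."
    )
    return [p.strip() for p in call_llm(prompt).split(",")]

print(generate_then_select("A document about paraphrase generation..."))
```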
arXiv Detail & Related papers (2024-10-04T13:31:09Z)
- Unsupervised Syntactically Controlled Paraphrase Generation with Abstract Meaning Representations [59.10748929158525]
Abstract Meaning Representations (AMR) can greatly improve the performance of unsupervised syntactically controlled paraphrase generation.
Our proposed model, AMR-enhanced Paraphrase Generator (AMRPG), encodes the AMR graph and the constituency parse of the input sentence into two disentangled semantic and syntactic embeddings.
Experiments show that AMRPG generates more accurate syntactically controlled paraphrases, both quantitatively and qualitatively, compared to the existing unsupervised approaches.
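The disentangling idea can be sketched schematically: one encoder consumes AMR-derived semantic features, another consumes parse-derived syntactic features, and only the decoder combines them. All shapes below are illustrative, and the AMR/parse feature extraction itself is abstracted away.

```python
import torch
import torch.nn as nn

class DisentangledParaphraser(nn.Module):
    """Schematic two-encoder architecture: semantics and syntax are encoded
    separately and combined only in the decoder's initial state."""

    def __init__(self, vocab_size: int, dim: int = 256):
        super().__init__()
        self.semantic_encoder = nn.GRU(dim, dim, batch_first=True)   # AMR-graph features
        self.syntactic_encoder = nn.GRU(dim, dim, batch_first=True)  # parse features
        self.embed = nn.Embedding(vocab_size, dim)
        self.decoder = nn.GRU(dim, 2 * dim, batch_first=True)
        self.out = nn.Linear(2 * dim, vocab_size)

    def forward(self, amr_feats, parse_feats, target_ids):
        _, sem = self.semantic_encoder(amr_feats)     # (1, batch, dim)
        _, syn = self.syntactic_encoder(parse_feats)  # (1, batch, dim)
        h0 = torch.cat([sem, syn], dim=-1)            # disentangled embeddings combined
        dec_out, _ = self.decoder(self.embed(target_ids), h0)
        return self.out(dec_out)

model = DisentangledParaphraser(vocab_size=1000)
logits = model(torch.randn(2, 5, 256), torch.randn(2, 7, 256),
               torch.zeros(2, 6, dtype=torch.long))
print(logits.shape)  # (2, 6, 1000)
```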
arXiv Detail & Related papers (2022-11-02T04:58:38Z)
- Learning to Selectively Learn for Weakly-supervised Paraphrase Generation [81.65399115750054]
We propose a novel approach to generate high-quality paraphrases with weak supervision data.
Specifically, we tackle the weakly-supervised paraphrase generation problem by obtaining abundant weakly-labeled parallel sentences via retrieval-based pseudo paraphrase expansion.
We demonstrate that our approach achieves significant improvements over existing unsupervised approaches, and is even comparable in performance with supervised state-of-the-art methods.
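The retrieval-based expansion step can be approximated with off-the-shelf sentence embeddings: sentence pairs whose embeddings are sufficiently close are collected as weakly-labeled paraphrases. The model name and similarity threshold below are assumptions for illustration.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
corpus = [
    "How do I reset my password?",
    "What is the procedure to change my password?",
    "Where can I find the train schedule?",
    "How can I recover my account password?",
]
embeddings = model.encode(corpus, convert_to_tensor=True)
scores = util.cos_sim(embeddings, embeddings)

# Collect pairs above an illustrative similarity threshold as weak labels.
weak_pairs = []
for i in range(len(corpus)):
    for j in range(i + 1, len(corpus)):
        if scores[i][j] > 0.7:
            weak_pairs.append((corpus[i], corpus[j]))
print(weak_pairs)
```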
arXiv Detail & Related papers (2021-09-25T23:31:13Z)
- Towards Document-Level Paraphrase Generation with Sentence Rewriting and Reordering [88.08581016329398]
We propose CoRPG (Coherence Relationship guided Paraphrase Generation) for document-level paraphrase generation.
We use a graph GRU to encode the coherence relationship graph and obtain a coherence-aware representation for each sentence.
Our model can generate document paraphrases with greater diversity and better semantic preservation.
arXiv Detail & Related papers (2021-09-15T05:53:40Z)
- Paraphrase Generation as Unsupervised Machine Translation [30.99150547499427]
We propose a new paradigm for paraphrase generation by treating the task as unsupervised machine translation (UMT).
The proposed paradigm first splits a large unlabeled corpus into multiple clusters, and trains multiple UMT models using pairs of these clusters.
Then based on the paraphrase pairs produced by these UMT models, a unified surrogate model can be trained to serve as the final Seq2Seq model to generate paraphrases.
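A toy version of the first step, splitting an unlabeled corpus into clusters that can later be paired as pseudo "languages" for UMT training, might look like this; the TF-IDF features and cluster count are illustrative assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "How do I reset my password?",
    "What is the procedure to change my password?",
    "Where can I find the train schedule?",
    "When does the next train leave?",
    "How can I recover my account?",
    "Is the station open on Sundays?",
]
X = TfidfVectorizer().fit_transform(corpus)
labels = KMeans(n_clusters=2, random_state=0, n_init=10).fit_predict(X)
clusters = {k: [s for s, l in zip(corpus, labels) if l == k] for k in set(labels)}
# Each pair of clusters would train one UMT model (treating the two clusters
# as two "languages"); a unified Seq2Seq surrogate is then distilled from the
# paraphrase pairs those models produce.
print(clusters)
```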
arXiv Detail & Related papers (2021-09-07T09:08:58Z)
- Sentence Similarity Based on Contexts [31.135984064747607]
The proposed framework is based on the core idea that the meaning of a sentence should be defined by its contexts.
It is able to generate a high-quality, large-scale dataset with semantic similarity scores between sentence pairs in an unsupervised manner.
arXiv Detail & Related papers (2021-05-17T06:03:56Z)
- Probing Task-Oriented Dialogue Representation from Language Models [106.02947285212132]
This paper investigates pre-trained language models to find out which model intrinsically carries the most informative representation for task-oriented dialogue tasks.
We fine-tune a feed-forward layer as the classifier probe on top of a fixed pre-trained language model with annotated labels in a supervised way.
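That probing setup, a single trainable layer over a frozen pretrained encoder, can be sketched as follows; the choice of bert-base-uncased and the three-way label set are assumptions.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
for p in encoder.parameters():
    p.requires_grad = False  # keep the pretrained LM fixed; train only the probe

num_labels = 3  # e.g., a small set of dialogue-act classes (assumed)
probe = torch.nn.Linear(encoder.config.hidden_size, num_labels)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)

# One supervised training step on a toy example.
batch = tokenizer(["book a table for two tonight"], return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**batch).last_hidden_state[:, 0]  # [CLS] representation
loss = torch.nn.functional.cross_entropy(probe(hidden), torch.tensor([1]))
loss.backward()
optimizer.step()
print(loss.item())
```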
arXiv Detail & Related papers (2020-10-26T21:34:39Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
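Dynamic Blocking itself is defined in the paper; the sketch below only approximates its core move: after emitting a token that occurs in the source, temporarily block the token that immediately follows it in the source, pushing the output away from the source's surface form.

```python
from typing import Dict, List, Set

def build_block_map(source_ids: List[int]) -> Dict[int, Set[int]]:
    """For each source token, collect the tokens that immediately follow it."""
    block_map: Dict[int, Set[int]] = {}
    for cur, nxt in zip(source_ids, source_ids[1:]):
        block_map.setdefault(cur, set()).add(nxt)
    return block_map

def blocked_next_tokens(last_generated: int, block_map: Dict[int, Set[int]]) -> Set[int]:
    """Tokens to suppress at the next decoding step (set their logits to -inf)."""
    return block_map.get(last_generated, set())

# Toy usage with integer token ids: after emitting token 7 (which is followed
# by 8 in the source), token 8 would be blocked at the next step.
source = [5, 7, 8, 9]
bm = build_block_map(source)
print(blocked_next_tokens(7, bm))  # {8}
```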
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Exemplar-Controllable Paraphrasing and Translation using Bitext [57.92051459102902]
We adapt models from prior work to be able to learn solely from bilingual text (bitext).
Our single proposed model can perform four tasks: controlled paraphrase generation in both languages and controlled machine translation in both language directions.
arXiv Detail & Related papers (2020-10-12T17:02:50Z)
- Unsupervised Paraphrase Generation using Pre-trained Language Models [0.0]
OpenAI's GPT-2 is notable for its capability to generate fluent, well-formulated, grammatically consistent text.
We leverage this generation capability of GPT-2 to generate paraphrases without any supervision from labelled data.
Our experiments show that paraphrases generated with our model are of good quality, are diverse, and improve downstream task performance when used for data augmentation.
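A minimal way to elicit paraphrases from GPT-2 without labeled data is to frame the task in the prompt and sample several continuations, roughly in this paper's spirit; the prompt format and sampling hyperparameters are assumptions.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "Original: The weather is lovely today.\nParaphrase:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,          # nucleus sampling for diverse candidates
    top_p=0.9,
    max_new_tokens=20,
    num_return_sequences=3,
    pad_token_id=tokenizer.eos_token_id,
)
for out in outputs:
    # Strip the prompt tokens and print only the sampled continuation.
    print(tokenizer.decode(out[inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```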
arXiv Detail & Related papers (2020-06-09T19:40:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.