Improving Paraphrase Detection with the Adversarial Paraphrasing Task
- URL: http://arxiv.org/abs/2106.07691v1
- Date: Mon, 14 Jun 2021 18:15:20 GMT
- Authors: Animesh Nighojkar and John Licato
- Abstract summary: Paraphrase datasets currently in widespread use rely on a sense of paraphrase based on word overlap and syntax.
We introduce a new adversarial method of dataset creation for paraphrase identification: the Adversarial Paraphrasing Task (APT).
APT asks participants to generate semantically equivalent (in the sense of mutually implicative) but lexically and syntactically disparate paraphrases.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: If two sentences have the same meaning, it should follow that they are
equivalent in their inferential properties, i.e., each sentence should
textually entail the other. However, many paraphrase datasets currently in
widespread use rely on a sense of paraphrase based on word overlap and syntax.
Can we instead teach paraphrase detection models to identify paraphrases in a way
that draws on the inferential properties of the sentences, rather than over-relying
on the lexical and syntactic similarities of a sentence pair? We apply the adversarial paradigm to
this question, and introduce a new adversarial method of dataset creation for
paraphrase identification: the Adversarial Paraphrasing Task (APT), which asks
participants to generate semantically equivalent (in the sense of mutually
implicative) but lexically and syntactically disparate paraphrases. These
sentence pairs can then be used both to test paraphrase identification models
(whose accuracy on them is barely above chance) and to improve their performance. To
accelerate dataset generation, we explore automation of APT using T5, and show
that the resulting dataset also improves accuracy. We discuss implications for
paraphrase detection and release our dataset in the hope of making paraphrase
detection models better able to detect sentence-level meaning equivalence.
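The abstract's core complaint is that widely used datasets treat paraphrase as high word overlap. A minimal sketch of that overlap-based notion, here as token-level Jaccard similarity (an illustrative stand-in, not the paper's exact metric), shows why it fails on APT-style pairs: a mutually implicative pair can share almost no vocabulary, while a near-copy scores high.

```python
def jaccard_overlap(s1: str, s2: str) -> float:
    """Token-level Jaccard similarity: |A & B| / |A | B| over lowercased word sets."""
    a, b = set(s1.lower().split()), set(s2.lower().split())
    return len(a & b) / len(a | b)

# An APT-style pair: semantically equivalent (mutually implicative),
# but lexically and syntactically disparate.
apt_pair = ("The cat devoured the meal.",
            "The feline consumed its food.")

# A trivial pair: near-identical wording, easy for overlap-based detectors,
# even though one word change alters the content.
easy_pair = ("The cat devoured the meal.",
             "The cat devoured the food.")

print(jaccard_overlap(*apt_pair))   # low overlap despite equivalent meaning
print(jaccard_overlap(*easy_pair))  # high overlap
```

An overlap-based detector that thresholds on a score like this would reject the first pair and accept the second, which is exactly the failure mode APT's adversarially collected pairs are designed to expose.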
Related papers
- Spotting AI's Touch: Identifying LLM-Paraphrased Spans in Text [61.22649031769564]
We propose a novel framework: paraphrased text span detection (PTD).
PTD aims to identify paraphrased text spans within a text.
We construct a dedicated dataset, PASTED, for paraphrased text span detection.
arXiv Detail & Related papers (2024-05-21T11:22:27Z)
- Span-Aggregatable, Contextualized Word Embeddings for Effective Phrase Mining [0.22499166814992438]
We show that when target phrases reside inside noisy context, representing the full sentence with a single dense vector is not sufficient for effective phrase retrieval.
We show that this technique is much more effective for phrase mining, yet requires considerable compute to obtain useful span representations.
arXiv Detail & Related papers (2024-05-12T12:08:05Z)
- Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity [68.8204255655161]
We present the semantic notion of agentivity as a case study for probing such interactions.
This suggests LMs may potentially serve as more useful tools for linguistic annotation, theory testing, and discovery.
arXiv Detail & Related papers (2023-05-29T16:24:01Z)
- ParaAMR: A Large-Scale Syntactically Diverse Paraphrase Dataset by AMR Back-Translation [59.91139600152296]
ParaAMR is a large-scale syntactically diverse paraphrase dataset created by abstract meaning representation back-translation.
We show that ParaAMR can be used to improve on three NLP tasks: learning sentence embeddings, syntactically controlled paraphrase generation, and data augmentation for few-shot learning.
arXiv Detail & Related papers (2023-05-26T02:27:33Z)
- Using Paraphrases to Study Properties of Contextual Embeddings [46.84861591608146]
We use paraphrases as a unique source of data to analyze contextualized embeddings.
Because paraphrases naturally encode consistent word and phrase semantics, they provide a unique lens for investigating properties of embeddings.
We find that contextual embeddings effectively handle polysemous words, but give synonyms surprisingly different representations in many cases.
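The finding above, that synonyms can receive surprisingly different contextual representations, is typically quantified by cosine similarity between embedding vectors. A minimal sketch with toy, made-up 3-dimensional vectors (real contextual embeddings come from a trained model and have hundreds of dimensions) shows the comparison being made:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical toy "embeddings" for illustration only.
happy_in_ctx    = [0.90, 0.10, 0.20]  # "happy" in one context
happy_other_ctx = [0.85, 0.15, 0.25]  # "happy" in a similar context
glad_in_ctx     = [0.30, 0.80, 0.10]  # a synonym that lands far away

print(cosine(happy_in_ctx, happy_other_ctx))  # same word, similar contexts: high
print(cosine(happy_in_ctx, glad_in_ctx))      # synonyms: can be much lower
```

The paper's observation corresponds to the second similarity being unexpectedly low even when the two words are interchangeable in context.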
arXiv Detail & Related papers (2022-07-12T14:22:05Z)
- Semantic Search as Extractive Paraphrase Span Detection [0.8137055256093007]
We approach semantic search by framing the search task as paraphrase span detection.
On the Turku Paraphrase Corpus of 100,000 manually extracted Finnish paraphrase pairs, we find that our paraphrase span detection model outperforms two strong retrieval baselines.
We introduce a method for creating artificial paraphrase data through back-translation, suitable for languages where manually annotated paraphrase resources are not available.
arXiv Detail & Related papers (2021-12-09T13:16:42Z)
- Understanding Synonymous Referring Expressions via Contrastive Features [105.36814858748285]
We develop an end-to-end trainable framework to learn contrastive features on the image and object instance levels.
We conduct extensive experiments to evaluate the proposed algorithm on several benchmark datasets.
arXiv Detail & Related papers (2021-04-20T17:56:24Z)
- R$^2$-Net: Relation of Relation Learning Network for Sentence Semantic Matching [58.72111690643359]
We propose a Relation of Relation Learning Network (R2-Net) for sentence semantic matching.
We first employ BERT to encode the input sentences from a global perspective.
Then a CNN-based encoder is designed to capture keywords and phrase information from a local perspective.
To fully leverage labels for better relation information extraction, we introduce a self-supervised relation of relation classification task.
arXiv Detail & Related papers (2020-12-16T13:11:30Z)
- Delexicalized Paraphrase Generation [7.504832901086077]
We present a neural model for paraphrasing and train it to generate delexicalized sentences.
We achieve this by creating training data in which each input is paired with a number of reference paraphrases.
We show empirically that the generated paraphrases are of high quality, leading to an additional 1.29% exact match on live utterances.
arXiv Detail & Related papers (2020-12-04T18:28:30Z)
- Revisiting Paraphrase Question Generator using Pairwise Discriminator [25.449902612898594]
We propose a novel method for obtaining sentence-level embeddings.
The proposed method results in semantic embeddings and outperforms the state-of-the-art on the paraphrase generation and sentiment analysis tasks.
arXiv Detail & Related papers (2019-12-31T02:46:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.