Delexicalized Paraphrase Generation
- URL: http://arxiv.org/abs/2012.02763v1
- Date: Fri, 4 Dec 2020 18:28:30 GMT
- Title: Delexicalized Paraphrase Generation
- Authors: Boya Yu, Konstantine Arkoudas, Wael Hamza
- Abstract summary: We present a neural model for paraphrasing and train it to generate delexicalized sentences.
We achieve this by creating training data in which each input is paired with a number of reference paraphrases.
We show empirically that the generated paraphrases are of high quality, leading to an additional 1.29% exact match on live utterances.
- Score: 7.504832901086077
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a neural model for paraphrasing and train it to generate
delexicalized sentences. We achieve this by creating training data in which
each input is paired with a number of reference paraphrases. These sets of
reference paraphrases represent a weak type of semantic equivalence based on
annotated slots and intents. To understand semantics from different types of
slots, other than anonymizing slots, we apply convolutional neural networks
(CNN) prior to pooling on slot values and use pointers to locate slots in the
output. We show empirically that the generated paraphrases are of high quality,
leading to an additional 1.29% exact match on live utterances. We also show
that natural language understanding (NLU) tasks, such as intent classification
and named entity recognition, can benefit from data augmentation using
automatically generated paraphrases.
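As a rough illustration of the two ideas the abstract highlights - delexicalizing annotated slots and encoding slot values with a CNN followed by pooling - here is a minimal Python sketch. It is not the authors' code; the character-level encoding, layer sizes, and "@Slot" placeholder scheme are assumptions.

```python
# Minimal sketch (not the authors' code) of delexicalization plus a
# CNN-over-slot-values encoder; sizes and the "@Slot" scheme are assumptions.
import torch
import torch.nn as nn

class SlotValueEncoder(nn.Module):
    """Encode a slot's surface value with a character CNN and max-pooling."""
    def __init__(self, vocab_size=128, emb_dim=32, hidden=64, kernel=3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, hidden, kernel, padding=kernel // 2)

    def forward(self, char_ids):                  # (batch, chars)
        x = self.emb(char_ids).transpose(1, 2)    # (batch, emb_dim, chars)
        h = torch.relu(self.conv(x))              # (batch, hidden, chars)
        return h.max(dim=2).values                # (batch, hidden): pooled slot vector

def delexicalize(utterance, slots):
    """Replace annotated slot values with placeholders a decoder can point to."""
    for name, value in slots.items():
        utterance = utterance.replace(value, f"@{name}")
    return utterance

utt = "play thriller by michael jackson"
slots = {"SongName": "thriller", "ArtistName": "michael jackson"}
print(delexicalize(utt, slots))  # play @SongName by @ArtistName

encoder = SlotValueEncoder()
chars = torch.tensor([[ord(c) % 128 for c in "thriller"]])
print(encoder(chars).shape)      # torch.Size([1, 64])
```

In the paper's setup the pooled slot vectors supplement the anonymized placeholders, and pointers locate the slots in the generated paraphrase; the sketch only shows the encoding side.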
Related papers
- Neural paraphrasing by automatically crawled and aligned sentence pairs [11.95795974003684]
The main obstacle to neural-network-based paraphrasing is the lack of large datasets with aligned pairs of sentences and paraphrases.
We present a method for the automatic generation of large aligned corpora, based on the assumption that news and blog websites report the same events in different narrative styles.
We propose a similarity search procedure with linguistic constraints that, given a reference sentence, locates the most similar candidate paraphrases among millions of indexed sentences.
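The abstract does not give the search details, but the general recipe can be sketched as follows: rank indexed sentences by TF-IDF cosine similarity against the reference, then filter with a linguistic constraint. The length-ratio filter and the 0.3 threshold below are illustrative assumptions, not the paper's constraints.

```python
# Sketch: rank indexed sentences by TF-IDF cosine similarity to a reference,
# then keep candidates that pass a simple (illustrative) linguistic constraint.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

candidates = [
    "The president signed the bill into law on Friday.",
    "On Friday the bill was signed into law by the president.",
    "Stocks fell sharply after the announcement.",
]
reference = "The president signed the new bill on Friday."

vec = TfidfVectorizer().fit(candidates + [reference])
sims = cosine_similarity(vec.transform([reference]), vec.transform(candidates))[0]

def passes_constraints(ref, cand, max_len_ratio=1.5):
    """Toy linguistic constraint: comparable sentence lengths."""
    r, c = len(ref.split()), len(cand.split())
    return max(r, c) / min(r, c) <= max_len_ratio

ranked = sorted(zip(sims, candidates), reverse=True)
paraphrases = [cand for score, cand in ranked
               if score > 0.3 and passes_constraints(reference, cand)]
print(paraphrases)
```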
arXiv Detail & Related papers (2024-02-16T10:40:38Z)
- Continuously Learning New Words in Automatic Speech Recognition [56.972851337263755]
We propose a self-supervised continual learning approach to recognize new words.
We use a memory-enhanced Automatic Speech Recognition model from previous work.
We show that with this approach, we obtain increasing performance on the new words when they occur more frequently.
arXiv Detail & Related papers (2024-01-09T10:39:17Z)
- ParaAMR: A Large-Scale Syntactically Diverse Paraphrase Dataset by AMR Back-Translation [59.91139600152296]
ParaAMR is a large-scale syntactically diverse paraphrase dataset created by Abstract Meaning Representation (AMR) back-translation.
We show that ParaAMR can be used to improve on three NLP tasks: learning sentence embeddings, syntactically controlled paraphrase generation, and data augmentation for few-shot learning.
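A conceptual sketch of the AMR back-translation pipeline follows. parse_to_amr and generate_from_amr are hypothetical stubs standing in for trained AMR parsers and generators, and the toy graph is invented for illustration; ParaAMR's actual pipeline uses trained models.

```python
# Conceptual sketch of AMR back-translation for paraphrase diversity.
# parse_to_amr / generate_from_amr are hypothetical stubs, not real tools.

def parse_to_amr(sentence):
    """Hypothetical AMR parser: returns (root, edges) for a toy graph."""
    return ("give-01", [("give-01", ":ARG0", "boy"),
                        ("give-01", ":ARG1", "ball"),
                        ("give-01", ":ARG2", "girl")])

def linearize(root, edges):
    """Linearize the graph starting from a chosen focus node."""
    parts = [root] + [f"{rel} {tgt}" for src, rel, tgt in edges if src == root]
    return "(" + " ".join(parts) + ")"

def generate_from_amr(linearized):
    """Hypothetical AMR-to-text generator (here a canned lookup)."""
    return {"(give-01 :ARG0 boy :ARG1 ball :ARG2 girl)":
                "The boy gave the ball to the girl."}.get(linearized, "<generated text>")

root, edges = parse_to_amr("The boy gave the girl a ball.")
# Varying the focus node / linearization order is what yields
# syntactically diverse paraphrases after generation.
print(generate_from_amr(linearize(root, edges)))
```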
arXiv Detail & Related papers (2023-05-26T02:27:33Z)
- Short-Term Word-Learning in a Dynamically Changing Environment [63.025297637716534]
We show how to supplement an end-to-end ASR system with a word/phrase memory and a mechanism to access this memory to recognize the words and phrases correctly.
We demonstrate significant improvements in the detection rate of new words with only a minor increase in false alarms.
arXiv Detail & Related papers (2022-03-29T10:05:39Z)
- Learning Rich Representation of Keyphrases from Text [12.698835743464313]
We show how to learn task-specific language models aimed at learning rich representations of keyphrases from text documents.
In the discriminative setting, we introduce a new pre-training objective - Keyphrase Boundary Infilling with Replacement (KBIR).
In the generative setting, we introduce a new pre-training setup for BART - KeyBART - which reproduces the keyphrases related to the input text in the CatSeq format.
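As a sketch of what such training pairs might look like (an assumption based on the abstract, not the released code): a KBIR-style input hides keyphrase spans, while a CatSeq-style target concatenates the keyphrases with a separator. The ';' separator and mask token below are illustrative.

```python
# Build illustrative (input, target) pairs: mask keyphrase spans in the input,
# and emit the keyphrases as one CatSeq-style sequence as the target.
SEP = ";"  # illustrative separator; the actual token is model-specific

def catseq_target(keyphrases, sep=SEP):
    """Concatenate all keyphrases into one generation target."""
    return sep.join(keyphrases)

def mask_keyphrases(text, keyphrases, mask_token="<mask>"):
    """KBIR-style corruption sketch: hide keyphrase spans in the input."""
    for kp in keyphrases:
        text = text.replace(kp, mask_token)
    return text

doc = "We present a neural model for paraphrasing of delexicalized sentences."
kps = ["neural model", "paraphrasing", "delexicalized sentences"]
print(mask_keyphrases(doc, kps))  # We present a <mask> for <mask> of <mask>.
print(catseq_target(kps))         # neural model;paraphrasing;delexicalized sentences
```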
arXiv Detail & Related papers (2021-12-16T01:09:51Z)
- Improving Paraphrase Detection with the Adversarial Paraphrasing Task [0.0]
Paraphrasing datasets currently rely on a sense of paraphrase based on word overlap and syntax.
We introduce a new adversarial method of dataset creation for paraphrase identification: the Adversarial Paraphrasing Task (APT).
APT asks participants to generate semantically equivalent (in the sense of mutually implicative) but lexically and syntactically disparate paraphrases.
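One way to operationalize "mutually implicative" is to run an off-the-shelf NLI model in both directions. The sketch below assumes the public roberta-large-mnli checkpoint and is a validation heuristic, not the APT protocol itself, which relies on human participants and adversarial feedback.

```python
# Heuristic check for mutual implication: entailment must hold both ways.
# Assumes the public roberta-large-mnli checkpoint; not the APT protocol.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "roberta-large-mnli"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

def entails(premise, hypothesis):
    inputs = tok(premise, hypothesis, return_tensors="pt")
    with torch.no_grad():
        pred = model(**inputs).logits.argmax(dim=-1).item()
    return model.config.id2label[pred] == "ENTAILMENT"

a = "The committee approved the proposal unanimously."
b = "Every member of the committee voted in favor of the proposal."
print(entails(a, b) and entails(b, a))  # True only if mutually implicative
```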
arXiv Detail & Related papers (2021-06-14T18:15:20Z)
- Factorising Meaning and Form for Intent-Preserving Paraphrasing [59.13322531639124]
We propose a method for generating paraphrases of English questions that retain the original intent but use a different surface form.
Our model combines a careful choice of training objective with a principled information bottleneck.
We are able to generate paraphrases with a better tradeoff between semantic preservation and syntactic novelty compared to previous methods.
arXiv Detail & Related papers (2021-05-31T15:37:38Z)
- Linguistically-Enriched and Context-Aware Zero-shot Slot Filling [6.06746295810681]
Slot filling is one of the most important challenges in modern task-oriented dialog systems.
New domains (i.e., unseen in training) may emerge after deployment.
It is imperative that models seamlessly adapt and fill slots from both seen and unseen domains.
arXiv Detail & Related papers (2021-01-16T20:18:16Z)
- R$^2$-Net: Relation of Relation Learning Network for Sentence Semantic Matching [58.72111690643359]
We propose a Relation of Relation Learning Network (R2-Net) for sentence semantic matching.
We first employ BERT to encode the input sentences from a global perspective.
Then a CNN-based encoder is designed to capture keywords and phrase information from a local perspective.
To fully leverage labels for better relation information extraction, we introduce a self-supervised relation of relation classification task.
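A minimal sketch of combining the global BERT view with a local CNN view follows; the layer sizes and max-pooling choice are assumptions, and this is not the released R2-Net.

```python
# Sketch of R2-Net-style global + local encoding (sizes are assumptions).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
local_cnn = nn.Conv1d(768, 256, kernel_size=3, padding=1)

def encode(sentence):
    inputs = tok(sentence, return_tensors="pt")
    hidden = bert(**inputs).last_hidden_state               # (1, seq, 768) global view
    global_vec = hidden[:, 0]                                # [CLS] summary vector
    local = torch.relu(local_cnn(hidden.transpose(1, 2)))    # (1, 256, seq) local view
    local_vec = local.max(dim=2).values                      # pooled keyword/phrase features
    return torch.cat([global_vec, local_vec], dim=-1)        # (1, 1024)

print(encode("A neural model for paraphrasing.").shape)
```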
arXiv Detail & Related papers (2020-12-16T13:11:30Z)
- Keyphrase Extraction with Span-based Feature Representations [13.790461555410747]
Keyphrases are capable of providing semantic metadata characterizing documents.
Three approaches have been used for keyphrase extraction: (i) the traditional two-step ranking method, (ii) sequence labeling, and (iii) generation using neural networks.
In this paper, we propose a novel Span Keyphrase Extraction model that extracts span-based feature representations of keyphrases directly from all the content tokens.
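One plausible reading of span-based feature extraction (a sketch under assumptions, not the paper's model): enumerate candidate spans over the contextual token states and pool each span into a single vector that a scorer can rank.

```python
# Sketch: enumerate candidate spans and pool token states into span features.
# The shapes, mean-pooling, and toy scorer are assumptions for illustration.
import torch

seq_len, hidden = 8, 16
token_states = torch.randn(seq_len, hidden)   # contextual token representations
max_width = 3                                 # longest candidate keyphrase span

span_feats, spans = [], []
for start in range(seq_len):
    for width in range(1, max_width + 1):
        end = start + width
        if end > seq_len:
            break
        spans.append((start, end))
        span_feats.append(token_states[start:end].mean(dim=0))  # pooled span vector

span_feats = torch.stack(span_feats)          # (num_spans, hidden)
scores = span_feats @ torch.randn(hidden)     # toy scorer: dot with a weight vector
best = spans[scores.argmax().item()]
print(f"top-scoring candidate span: tokens {best[0]}..{best[1] - 1}")
```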
arXiv Detail & Related papers (2020-02-13T09:48:31Z)
- Lexical Sememe Prediction using Dictionary Definitions by Capturing Local Semantic Correspondence [94.79912471702782]
Sememes, defined as the minimum semantic units of human languages, have been proven useful in many NLP tasks.
We propose a Sememe Correspondence Pooling (SCorP) model, which captures the local semantic correspondence between dictionary definitions and sememes to predict sememes.
We evaluate our model and baseline methods on the well-known sememe knowledge base HowNet and find that our model achieves state-of-the-art performance.
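A toy version of correspondence pooling follows (dimensions and scoring are assumptions): score every definition-word/sememe pair, then max-pool over the definition axis so each candidate sememe keeps its best-matching word.

```python
# Toy correspondence pooling: match definition words against sememes,
# then max-pool over definition words (dimensions are assumptions).
import torch

num_def_words, num_sememes, dim = 6, 4, 16
def_word_vecs = torch.randn(num_def_words, dim)   # embedded definition words
sememe_vecs = torch.randn(num_sememes, dim)       # embedded candidate sememes

corr = def_word_vecs @ sememe_vecs.T              # (def_words, sememes) matching matrix
sememe_scores = corr.max(dim=0).values            # best-matching word per sememe
predicted = (torch.sigmoid(sememe_scores) > 0.5).nonzero().flatten()
print("predicted sememe indices:", predicted.tolist())
```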
arXiv Detail & Related papers (2020-01-16T17:30:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.