Neural paraphrasing by automatically crawled and aligned sentence pairs
- URL: http://arxiv.org/abs/2402.10558v1
- Date: Fri, 16 Feb 2024 10:40:38 GMT
- Title: Neural paraphrasing by automatically crawled and aligned sentence pairs
- Authors: Achille Globo and Antonio Trevisi and Andrea Zugarini and Leonardo
Rigutini and Marco Maggini and Stefano Melacci
- Abstract summary: The main obstacle toward neural-network-based paraphrasing is the lack of large datasets with aligned pairs of sentences and paraphrases.
We present a method for the automatic generation of large aligned corpora, based on the assumption that news and blog websites talk about the same events using different narrative styles.
We propose a similarity search procedure with linguistic constraints that, given a reference sentence, is able to locate the most similar candidate paraphrases from among millions of indexed sentences.
- Score: 11.95795974003684
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Paraphrasing is the task of rewriting an input text in other words,
without altering the meaning of the original content. Conversational systems
can exploit automatic paraphrasing to make the conversation more natural, e.g.,
talking about a certain topic using different paraphrases at different times.
Recently, the task of automatically generating paraphrases has been approached
in the context of Natural Language Generation (NLG). While many existing
systems are simply rule-based, the recent success of Deep Neural Networks in
several NLG tasks naturally suggests the possibility of exploiting such
networks for generating paraphrases. However, the main obstacle to
neural-network-based paraphrasing is the lack of large datasets of aligned
sentence-paraphrase pairs, which are needed to efficiently train the neural
models. In this paper we present a method for the automatic generation of
large aligned corpora, based on the assumption that news and blog websites
talk about the same events using different narrative styles. We propose a
similarity search procedure with linguistic constraints that, given a
reference sentence, is able to locate the most similar candidate paraphrases
from among millions of indexed sentences. The data generation process is
evaluated in the case of the Italian language, performing experiments using
pointer-based deep neural architectures.
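To make the retrieval step concrete, here is a minimal, self-contained sketch of a similarity search with linguistic constraints. The token-set Jaccard similarity, the thresholds, and the length constraint are illustrative assumptions; the paper's actual system indexes millions of crawled sentences, and its exact constraints are not reproduced here.

```python
import re

# Toy sketch: given a reference sentence, search an indexed pool for
# candidate paraphrases under linguistic constraints. The real system
# indexes millions of crawled sentences; the Jaccard similarity and the
# thresholds below are illustrative stand-ins.

def tokens(s):
    return re.findall(r"[a-z']+", s.lower())

def jaccard(a, b):
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def candidate_paraphrases(reference, pool, min_sim=0.3, max_sim=0.9,
                          max_len_ratio=1.5):
    """Return pool sentences that restate `reference` without copying it."""
    ref = tokens(reference)
    out = []
    for sent in pool:
        cand = tokens(sent)
        # Linguistic constraint: comparable sentence length.
        ratio = max(len(ref), len(cand)) / max(1, min(len(ref), len(cand)))
        if ratio > max_len_ratio:
            continue
        # Similar content, but below a duplication ceiling, so the
        # candidate is a rewording rather than a near-copy.
        sim = jaccard(ref, cand)
        if min_sim <= sim <= max_sim:
            out.append((sim, sent))
    return [s for _, s in sorted(out, reverse=True)]

pool = [
    "The president signed the new climate bill on Tuesday.",
    "On Tuesday the president approved fresh climate legislation.",
    "Stock markets fell sharply after the announcement.",
]
print(candidate_paraphrases("The president signed a climate bill Tuesday.", pool))
```

The key design point is the similarity ceiling: near-identical sentences are rejected, so retained candidates restate the event rather than duplicate it.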
Related papers
- Improved Contextual Recognition In Automatic Speech Recognition Systems
By Semantic Lattice Rescoring [4.819085609772069]
We propose a novel approach for enhancing contextual recognition within ASR systems via semantic lattice processing.
Our solution combines Hidden Markov Model-Gaussian Mixture Model (HMM-GMM) systems with Deep Neural Network (DNN) models for better accuracy.
We demonstrate the effectiveness of our proposed framework on the LibriSpeech dataset with empirical analyses.
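As a rough illustration of rescoring with a semantic signal, here is a toy n-best reranker, not the paper's lattice machinery; the interpolation weight and the domain-word scorer are assumptions.

```python
# Toy n-best reranking with a semantic signal, in the spirit of semantic
# lattice rescoring (the interpolation weight and the domain-word scorer
# are assumptions, not the paper's components).

def rescore(nbest, semantic_score, lam=0.5):
    """nbest: list of (hypothesis, asr_score); higher scores are better."""
    ranked = sorted(
        ((1 - lam) * asr + lam * semantic_score(hyp), hyp)
        for hyp, asr in nbest
    )
    return ranked[::-1]

# Hypothetical semantic scorer: fraction of in-domain words.
domain = {"weather", "forecast", "rain"}
score = lambda h: sum(w in domain for w in h.split()) / max(1, len(h.split()))

nbest = [("what is the whether", -2.1), ("what is the weather", -2.2)]
print(rescore(nbest, score))  # the semantically plausible hypothesis wins
```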
arXiv Detail & Related papers (2023-10-14T23:16:05Z) - Unsupervised Keyphrase Extraction via Interpretable Neural Networks [27.774524511005172]
Keyphrases that are most useful for predicting the topic of a text are considered its influential keyphrases.
INSPECT is a self-explaining neural framework for identifying such influential keyphrases.
We show that INSPECT achieves state-of-the-art results in unsupervised keyphrase extraction across four diverse datasets.
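A minimal occlusion-style sketch of the influence idea, using a bag-of-words topic scorer as a stand-in for the paper's self-explaining network (the scorer, document, and phrases are illustrative):

```python
# Occlusion-style sketch of phrase influence: a candidate phrase matters
# if removing it hurts the topic prediction. The bag-of-words scorer is a
# stand-in for INSPECT's self-explaining network; higher = more influential.

def topic_score(text, topic_words):
    words = text.lower().split()
    return sum(w in topic_words for w in words) / max(1, len(words))

def phrase_influence(text, phrase, topic_words):
    """Drop in topic score when `phrase` is occluded from `text`."""
    occluded = text.replace(phrase, "")
    return topic_score(text, topic_words) - topic_score(occluded, topic_words)

topic = {"neural", "network", "training"}
doc = "neural network training requires large datasets"
for cand in ["neural network", "large datasets"]:
    print(cand, round(phrase_influence(doc, cand, topic), 3))
```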
arXiv Detail & Related papers (2022-03-15T04:30:47Z) - Between words and characters: A Brief History of Open-Vocabulary
Modeling and Tokenization in NLP [22.772546707304766]
We show how hybrid word-character approaches, as well as subword approaches based on learned segmentation, have been proposed and evaluated.
We conclude that there is no silver-bullet solution for all applications, and there likely never will be.
arXiv Detail & Related papers (2021-12-20T13:04:18Z) - Towards Document-Level Paraphrase Generation with Sentence Rewriting and
Reordering [88.08581016329398]
We propose CoRPG (Coherence Relationship Guided Paraphrase Generation) for document-level paraphrase generation.
We use a graph GRU to encode the coherence relationship graph and obtain a coherence-aware representation for each sentence.
Our model can generate document paraphrases with greater diversity and semantic preservation.
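For intuition, here is one simplified message-passing step over a sentence coherence graph, standing in for the paper's graph GRU; the graph and the mixing rule are illustrative assumptions.

```python
import numpy as np

# One simplified message-passing step over a sentence coherence graph,
# standing in for CoRPG's graph GRU: each sentence vector is mixed with
# its coherence neighbors to become "coherence-aware". The graph and the
# mixing rule are illustrative assumptions.

def coherence_aware(sent_vecs, adjacency, alpha=0.5):
    """sent_vecs: (n, d); adjacency[i, j] = 1 if sentences i and j cohere."""
    deg = adjacency.sum(axis=1, keepdims=True).clip(min=1)
    neighbor_mean = adjacency @ sent_vecs / deg
    return alpha * sent_vecs + (1 - alpha) * neighbor_mean

vecs = np.eye(3)                       # toy sentence representations
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]])            # sentence 2 coheres with 1 and 3
print(coherence_aware(vecs, adj))
```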
arXiv Detail & Related papers (2021-09-15T05:53:40Z) - Phrase-BERT: Improved Phrase Embeddings from BERT with an Application to
Corpus Exploration [25.159601117722936]
We propose a contrastive fine-tuning objective that enables BERT to produce more powerful phrase embeddings.
Our approach relies on a dataset of diverse phrasal paraphrases, which is automatically generated using a paraphrase generation model.
As a case study, we show that Phrase-BERT embeddings can be easily integrated with a simple autoencoder to build a phrase-based neural topic model.
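A generic contrastive (InfoNCE-style) objective over phrase pairs, sketched for illustration; the paper's exact objective and choice of negatives may differ.

```python
import numpy as np

# Generic InfoNCE-style contrastive objective over phrase pairs: pull a
# phrase toward its paraphrase, push it from other in-batch phrases. The
# loss form and temperature are assumptions for illustration.

def info_nce(anchors, positives, temperature=0.1):
    """anchors, positives: (batch, dim) unit-normalized embeddings."""
    logits = anchors @ positives.T / temperature       # (batch, batch)
    # Diagonal entries correspond to the true paraphrase pairs.
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))

rng = np.random.default_rng(0)
a = rng.normal(size=(4, 8))
a /= np.linalg.norm(a, axis=1, keepdims=True)
p = a + 0.1 * rng.normal(size=(4, 8))                  # noisy "paraphrases"
p /= np.linalg.norm(p, axis=1, keepdims=True)
print("loss:", info_nce(a, p))
```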
arXiv Detail & Related papers (2021-09-13T20:31:57Z) - LadRa-Net: Locally-Aware Dynamic Re-read Attention Net for Sentence
Semantic Matching [66.65398852962177]
We develop a novel Dynamic Re-read Network (DRr-Net) for sentence semantic matching.
We extend DRr-Net to Locally-Aware Dynamic Re-read Attention Net (LadRa-Net).
Experiments on two popular sentence semantic matching tasks demonstrate that DRr-Net can significantly improve the performance of sentence semantic matching.
arXiv Detail & Related papers (2021-08-06T02:07:04Z) - UCPhrase: Unsupervised Context-aware Quality Phrase Tagging [63.86606855524567]
UCPhrase is a novel unsupervised context-aware quality phrase tagger.
We induce high-quality phrase spans as silver labels from consistently co-occurring word sequences.
We show that our design is superior to state-of-the-art pre-trained, unsupervised, and distantly supervised methods.
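A toy sketch of the silver-label idea, where word sequences that repeat across a document are kept as phrase spans; the n-gram length and frequency threshold are assumptions.

```python
from collections import Counter

# Sketch of silver labels: word sequences that consistently co-occur
# within a document are treated as quality phrase spans. The bigram
# window and the count threshold are simplifying assumptions.

def silver_phrases(sentences, n=2, min_count=2):
    """Return n-grams that repeat across a document's sentences."""
    counts = Counter()
    for sent in sentences:
        words = sent.lower().split()
        counts.update(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return [" ".join(g) for g, c in counts.items() if c >= min_count]

doc = [
    "gradient descent updates the model parameters",
    "we apply gradient descent with a small learning rate",
    "the learning rate controls the step size",
]
print(silver_phrases(doc))  # ['gradient descent', 'learning rate']
```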
arXiv Detail & Related papers (2021-05-28T19:44:24Z) - Contextual Biasing of Language Models for Speech Recognition in
Goal-Oriented Conversational Agents [11.193867567895353]
Goal-oriented conversational interfaces are designed to accomplish specific tasks.
We propose a new architecture that utilizes context embeddings derived from BERT on sample utterances provided during inference time.
Our experiments show a word error rate (WER) relative reduction of 7% over non-contextual utterance-level NLM rescorers on goal-oriented audio datasets.
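For clarity, a relative reduction is a fraction of the baseline WER, not an absolute difference; e.g., with a hypothetical 10% baseline:

```python
# A 7% *relative* WER reduction from a hypothetical baseline:
baseline_wer = 10.0                                  # baseline WER (%)
contextual_wer = round(baseline_wer * (1 - 0.07), 2)
print(f"{baseline_wer}% -> {contextual_wer}%")       # 10.0% -> 9.3%
```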
arXiv Detail & Related papers (2021-03-18T15:38:08Z) - Select, Extract and Generate: Neural Keyphrase Generation with
Layer-wise Coverage Attention [75.44523978180317]
We propose SEG-Net, a neural keyphrase generation model that is composed of two major components.
The experimental results on seven keyphrase generation benchmarks from scientific and web documents demonstrate that SEG-Net outperforms the state-of-the-art neural generative methods by a large margin.
arXiv Detail & Related papers (2020-08-04T18:00:07Z) - Neural Syntactic Preordering for Controlled Paraphrase Generation [57.5316011554622]
Our work uses syntactic transformations to softly "reorder" the source sentence and guide our neural paraphrasing model.
First, given an input sentence, we derive a set of feasible syntactic rearrangements using an encoder-decoder model.
Next, we use each proposed rearrangement to produce a sequence of position embeddings, which encourages our final encoder-decoder paraphrase model to attend to the source words in a particular order.
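A small sketch of mapping a proposed rearrangement to position embeddings; the sentence, permutation, and embedding table below are toy stand-ins.

```python
import numpy as np

# Sketch: a proposed syntactic rearrangement (a permutation of source
# positions) is mapped to a sequence of position embeddings that nudges
# the paraphrase model to attend to source words in the new order.

def reorder_position_embeddings(permutation, embed_table):
    """permutation[i] = source position to be attended i-th."""
    return embed_table[np.array(permutation)]

src = ["the", "cat", "sat", "on", "the", "mat"]
perm = [3, 4, 5, 2, 0, 1]                 # "on the mat sat the cat"
table = np.random.default_rng(0).normal(size=(len(src), 4))
print(reorder_position_embeddings(perm, table).shape)   # (6, 4)
```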
arXiv Detail & Related papers (2020-05-05T09:02:25Z) - PALM: Pre-training an Autoencoding&Autoregressive Language Model for
Context-conditioned Generation [92.7366819044397]
Self-supervised pre-training has emerged as a powerful technique for natural language understanding and generation.
This work presents PALM with a novel scheme that jointly pre-trains an autoencoding and autoregressive language model on a large unlabeled corpus.
Extensive experiments show that PALM achieves new state-of-the-art results on a variety of language generation benchmarks.
arXiv Detail & Related papers (2020-04-14T06:25:36Z)