A Multi-cascaded Model with Data Augmentation for Enhanced Paraphrase
Detection in Short Texts
- URL: http://arxiv.org/abs/1912.12068v1
- Date: Fri, 27 Dec 2019 12:10:10 GMT
- Title: A Multi-cascaded Model with Data Augmentation for Enhanced Paraphrase
Detection in Short Texts
- Authors: Muhammad Haroon Shakeel, Asim Karim, Imdadullah Khan
- Abstract summary: We present a data augmentation strategy and a multi-cascaded model for improved paraphrase detection in short texts.
Our model is both wide and deep and provides greater robustness across clean and noisy short texts.
- Score: 1.6758573326215689
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Paraphrase detection is an important task in text analytics with numerous
applications such as plagiarism detection, duplicate question identification,
and enhanced customer support helpdesks. Deep models have been proposed for
representing and classifying paraphrases. These models, however, require large
quantities of human-labeled data, which is expensive to obtain. In this work,
we present a data augmentation strategy and a multi-cascaded model for improved
paraphrase detection in short texts. Our data augmentation strategy considers
the notions of paraphrases and non-paraphrases as binary relations over the set
of texts. Subsequently, it uses graph theoretic concepts to efficiently
generate additional paraphrase and non-paraphrase pairs in a sound manner. Our
multi-cascaded model employs three supervised feature learners (cascades) based
on CNN and LSTM networks with and without soft-attention. The learned features,
together with hand-crafted linguistic features, are then forwarded to a
discriminator network for final classification. Our model is both wide and deep
and provides greater robustness across clean and noisy short texts. We evaluate
our approach on three benchmark datasets and show that it produces a comparable
or state-of-the-art performance on all three.
Related papers
- Adapting Dual-encoder Vision-language Models for Paraphrased Retrieval [55.90407811819347]
We consider the task of paraphrased text-to-image retrieval where a model aims to return similar results given a pair of paraphrased queries.
We train a dual-encoder model starting from a language model pretrained on a large text corpus.
Compared to public dual-encoder models such as CLIP and OpenCLIP, the model trained with our best adaptation strategy achieves a significantly higher ranking similarity for paraphrased queries.
arXiv Detail & Related papers (2024-05-06T06:30:17Z) - DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection [111.68263493302499]
We introduce DetCLIPv3, a high-performing detector that excels at both open-vocabulary object detection and hierarchical labels.
DetCLIPv3 is characterized by three core designs: 1) Versatile model architecture; 2) High information density data; and 3) Efficient training strategy.
DetCLIPv3 demonstrates superior open-vocabulary detection performance, outperforming GLIPv2, GroundingDINO, and DetCLIPv2 by 18.0/19.6/6.6 AP, respectively.
arXiv Detail & Related papers (2024-04-14T11:01:44Z) - Retrieval is Accurate Generation [99.24267226311157]
We introduce a novel method that selects context-aware phrases from a collection of supporting documents.
Our model achieves the best performance and the lowest latency among several retrieval-augmented baselines.
arXiv Detail & Related papers (2024-02-27T14:16:19Z) - ParaAMR: A Large-Scale Syntactically Diverse Paraphrase Dataset by AMR
Back-Translation [59.91139600152296]
ParaAMR is a large-scale syntactically diverse paraphrase dataset created by abstract meaning representation back-translation.
We show that ParaAMR can be used to improve on three NLP tasks: learning sentence embeddings, syntactically controlled paraphrase generation, and data augmentation for few-shot learning.
arXiv Detail & Related papers (2023-05-26T02:27:33Z) - Analyzing Vietnamese Legal Questions Using Deep Neural Networks with
Biaffine Classifiers [3.116035935327534]
We propose using deep neural networks to extract important information from Vietnamese legal questions.
Given a legal question in natural language, the goal is to extract all the segments that contain the needed information to answer the question.
arXiv Detail & Related papers (2023-04-27T18:19:24Z) - A Template-guided Hybrid Pointer Network for
Knowledge-basedTask-oriented Dialogue Systems [15.654119998970499]
We propose a template-guided hybrid pointer network for the knowledge-based task-oriented dialogue system.
We design a memory pointer network model with a gating mechanism to fully exploit the semantic correlation between the retrieved answers and the ground-truth response.
arXiv Detail & Related papers (2021-06-10T15:49:26Z) - Corpus-Based Paraphrase Detection Experiments and Review [0.0]
Paraphrase detection is important for a number of applications, including plagiarism detection, authorship attribution, question answering, text summarization, etc.
In this paper, we give a performance overview of various types of corpus-based models, especially deep learning (DL) models, with the task of paraphrase detection.
arXiv Detail & Related papers (2021-05-31T23:29:24Z) - Keyphrase Extraction with Dynamic Graph Convolutional Networks and
Diversified Inference [50.768682650658384]
Keyphrase extraction (KE) aims to summarize a set of phrases that accurately express a concept or a topic covered in a given document.
Recent Sequence-to-Sequence (Seq2Seq) based generative framework is widely used in KE task, and it has obtained competitive performance on various benchmarks.
In this paper, we propose to adopt the Dynamic Graph Convolutional Networks (DGCN) to solve the above two problems simultaneously.
arXiv Detail & Related papers (2020-10-24T08:11:23Z) - Exemplar-Controllable Paraphrasing and Translation using Bitext [57.92051459102902]
We adapt models from prior work to be able to learn solely from bilingual text (bitext)
Our single proposed model can perform four tasks: controlled paraphrase generation in both languages and controlled machine translation in both language directions.
arXiv Detail & Related papers (2020-10-12T17:02:50Z) - Keyphrase Extraction with Span-based Feature Representations [13.790461555410747]
Keyphrases are capable of providing semantic metadata characterizing documents.
Three approaches to address keyphrase extraction: (i) traditional two-step ranking method, (ii) sequence labeling and (iii) generation using neural networks.
In this paper, we propose a novelty Span Keyphrase Extraction model that extracts span-based feature representation of keyphrase directly from all the content tokens.
arXiv Detail & Related papers (2020-02-13T09:48:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.