PARAPHRASUS: A Comprehensive Benchmark for Evaluating Paraphrase Detection Models
- URL: http://arxiv.org/abs/2409.12060v2
- Date: Mon, 16 Dec 2024 10:09:23 GMT
- Title: PARAPHRASUS: A Comprehensive Benchmark for Evaluating Paraphrase Detection Models
- Authors: Andrianos Michail, Simon Clematide, Juri Opitz
- Abstract summary: PARAPHRASUS is a benchmark for multi-dimensional assessment, benchmarking and selection of paraphrase detection models.
We find that paraphrase detection models under our fine-grained evaluation lens exhibit trade-offs that cannot be captured through a single classification dataset.
- Score: 5.980822697955566
- Abstract: The task of determining whether two texts are paraphrases has long been a challenge in NLP. However, the prevailing notion of paraphrase is often quite simplistic, offering only a limited view of the vast spectrum of paraphrase phenomena. Indeed, we find that evaluating models on a paraphrase dataset can leave uncertainty about their true semantic understanding. To alleviate this, we create PARAPHRASUS, a benchmark designed for multi-dimensional assessment, benchmarking and selection of paraphrase detection models. We find that paraphrase detection models under our fine-grained evaluation lens exhibit trade-offs that cannot be captured through a single classification dataset. Furthermore, PARAPHRASUS allows prompt calibration for different use cases, tailoring LLMs to specific strictness levels. PARAPHRASUS includes 3 challenges spanning over 10 datasets, including 8 repurposed and 2 newly annotated; we release it along with a benchmarking library at https://github.com/impresso/paraphrasus
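To make the framing concrete, the sketch below scores a few sentence pairs with an off-the-shelf Hugging Face paraphrase classifier and computes accuracy, which is roughly what a single binary challenge in such a benchmark reduces to. It is a minimal illustration, not the released PARAPHRASUS library API; the model checkpoint (an MRPC fine-tune), the LABEL_1-means-paraphrase convention, and the toy pairs are assumptions.

```python
# Minimal sketch of a binary paraphrase-detection evaluation.
# NOTE: illustrative only; this is not the impresso/paraphrasus API.
# The checkpoint name and label mapping below are assumptions to verify.
from transformers import pipeline

# Hypothetical off-the-shelf paraphrase classifier (BERT fine-tuned on MRPC).
clf = pipeline("text-classification", model="textattack/bert-base-uncased-MRPC")

# Toy pairs: (sentence_1, sentence_2, gold label: is it a paraphrase?)
pairs = [
    ("The cat sat on the mat.", "A cat was sitting on the mat.", True),
    ("The cat sat on the mat.", "The dog barked at the mailman.", False),
]

correct = 0
for s1, s2, gold in pairs:
    # The text-classification pipeline accepts sentence pairs as a dict.
    out = clf({"text": s1, "text_pair": s2})[0]
    # Many MRPC checkpoints use LABEL_1 for "paraphrase"; check id2label for your model.
    pred = out["label"].endswith("1")
    correct += int(pred == gold)

print(f"accuracy = {correct / len(pairs):.2f}")
```

In the same spirit, swapping the classifier for an LLM prompted at different strictness levels would correspond to the prompt-calibration use case described in the abstract.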
Related papers
- Beyond Coarse-Grained Matching in Video-Text Retrieval [50.799697216533914]
We introduce a new approach for fine-grained evaluation.
Our approach can be applied to existing datasets by automatically generating hard negative test captions.
Experiments on our fine-grained evaluations demonstrate that this approach enhances a model's ability to understand fine-grained differences.
arXiv Detail & Related papers (2024-10-16T09:42:29Z)
- Cross-lingual paraphrase identification [0.0]
We train a bi-encoder model in a contrastive manner to detect hard paraphrases across multiple languages.
Our performance is comparable to state-of-the-art cross-encoders.
arXiv Detail & Related papers (2024-06-21T11:37:24Z)
- Adapting Dual-encoder Vision-language Models for Paraphrased Retrieval [55.90407811819347]
We consider the task of paraphrased text-to-image retrieval where a model aims to return similar results given a pair of paraphrased queries.
We train a dual-encoder model starting from a language model pretrained on a large text corpus.
Compared to public dual-encoder models such as CLIP and OpenCLIP, the model trained with our best adaptation strategy achieves a significantly higher ranking similarity for paraphrased queries.
arXiv Detail & Related papers (2024-05-06T06:30:17Z)
- Scalable Performance Analysis for Vision-Language Models [26.45624201546282]
Joint vision-language models have shown great performance over a diverse set of tasks.
Our paper introduces a more scalable solution that relies on already annotated benchmarks.
We confirm previous findings that CLIP behaves like a bag of words model and performs better with nouns and verbs.
arXiv Detail & Related papers (2023-05-30T06:40:08Z)
- SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks [88.4408774253634]
Spoken language understanding (SLU) tasks have been studied for many decades in the speech research community.
There are not nearly as many SLU task benchmarks, and many of the existing ones use data that is not freely available to all researchers.
Recent work has begun to introduce such benchmarks for several tasks.
arXiv Detail & Related papers (2022-12-20T18:39:59Z)
- Entity Disambiguation with Entity Definitions [50.01142092276296]
Local models have recently attained astounding performance in Entity Disambiguation (ED).
Previous works limited their studies to using, as the textual representation of each candidate, only its Wikipedia title.
In this paper, we address this limitation and investigate to what extent more expressive textual representations can mitigate it.
We report a new state of the art on 2 out of 6 benchmarks we consider and strongly improve the generalization capability over unseen patterns.
arXiv Detail & Related papers (2022-10-11T17:46:28Z)
- Corpus-Based Paraphrase Detection Experiments and Review [0.0]
Paraphrase detection is important for a number of applications, including plagiarism detection, authorship attribution, question answering, text summarization, etc.
In this paper, we give a performance overview of various types of corpus-based models, especially deep learning (DL) models, on the task of paraphrase detection.
arXiv Detail & Related papers (2021-05-31T23:29:24Z)
- A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation [50.55448707570669]
We propose a novel token-level, reference-free hallucination detection task and an associated annotated dataset named HaDes.
To create this dataset, we first perturb a large number of text segments extracted from English language Wikipedia, and then verify these with crowd-sourced annotations.
arXiv Detail & Related papers (2021-04-18T04:09:48Z)
- Exemplar-Controllable Paraphrasing and Translation using Bitext [57.92051459102902]
We adapt models from prior work to learn solely from bilingual text (bitext).
Our single proposed model can perform four tasks: controlled paraphrase generation in both languages and controlled machine translation in both language directions.
arXiv Detail & Related papers (2020-10-12T17:02:50Z)
- A Multi-cascaded Model with Data Augmentation for Enhanced Paraphrase Detection in Short Texts [1.6758573326215689]
We present a data augmentation strategy and a multi-cascaded model for improved paraphrase detection in short texts.
Our model is both wide and deep and provides greater robustness across clean and noisy short texts.
arXiv Detail & Related papers (2019-12-27T12:10:10Z)