Small but Mighty: New Benchmarks for Split and Rephrase
- URL: http://arxiv.org/abs/2009.08560v2
- Date: Sat, 12 Dec 2020 15:35:32 GMT
- Title: Small but Mighty: New Benchmarks for Split and Rephrase
- Authors: Li Zhang, Huaiyu Zhu, Siddhartha Brahma, Yunyao Li
- Abstract summary: Split and Rephrase is a text simplification task of rewriting a complex sentence into simpler ones.
We find that the widely used benchmark dataset universally contains easily exploitable syntactic cues.
We show that even a simple rule-based model can perform on par with the state-of-the-art model.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Split and Rephrase is a text simplification task of rewriting a complex
sentence into simpler ones. As a relatively new task, it is paramount to ensure
the soundness of its evaluation benchmark and metric. We find that the widely
used benchmark dataset universally contains easily exploitable syntactic cues
caused by its automatic generation process. Taking advantage of such cues, we
show that even a simple rule-based model can perform on par with the
state-of-the-art model. To remedy such limitations, we collect and release two
crowdsourced benchmark datasets. We not only make sure that they contain
significantly more diverse syntax, but also carefully control for their quality
according to a well-defined set of criteria. While no satisfactory automatic
metric exists, we apply fine-grained manual evaluation based on these criteria
using crowdsourcing, showing that our datasets better represent the task and
are significantly more challenging for the models.
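As an illustration of the kind of cue exploitation the abstract describes, below is a minimal sketch of a rule-based splitter. The cue list and the subject-copying heuristic are illustrative assumptions, not the rules used in the paper.

```python
# Minimal sketch of a rule-based Split and Rephrase baseline.
# The cue list and the subject-copying heuristic are illustrative
# assumptions, not the rules from the paper.

# Surface cues at which a complex sentence might be split in two.
SPLIT_CUES = [", and ", " and ", ", which ", ", who "]

def rule_based_split(sentence: str) -> list[str]:
    """Split a complex sentence at the first matching cue and turn
    each half into a standalone simple sentence."""
    for cue in SPLIT_CUES:
        idx = sentence.find(cue)
        if idx == -1:
            continue
        left = sentence[:idx].rstrip(" ,.")
        right = sentence[idx + len(cue):].rstrip(" .")
        if not left or not right:
            continue
        # Relative pronouns drop the subject, so crudely reuse the
        # first token of the left half as the new subject.
        if "which" in cue or "who" in cue:
            right = f"{left.split()[0]} {right}"
        return [left + ".", right[0].upper() + right[1:] + "."]
    return [sentence]  # no cue found: leave the sentence unchanged

print(rule_based_split(
    "Alan Turing was born in London and he studied at Cambridge."))
# -> ['Alan Turing was born in London.', 'He studied at Cambridge.']
```

On a benchmark whose complex sentences were produced by a handful of such templates, even a pattern matcher of this kind can score well, which is the failure mode the paper's crowdsourced datasets are designed to avoid.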
Related papers
- Analysing Zero-Shot Readability-Controlled Sentence Simplification [54.09069745799918]
We investigate how different types of contextual information affect a model's ability to generate sentences with the desired readability.
Results show that all tested models struggle to simplify sentences, due both to the models' limitations and to characteristics of the source sentences.
Our experiments also highlight the need for better automatic evaluation metrics tailored to readability-controlled text simplification (RCTS).
arXiv Detail & Related papers (2024-09-30T12:36:25Z)
- Evaluating Document Simplification: On the Importance of Separately Assessing Simplicity and Meaning Preservation [9.618393813409266]
This paper focuses on the evaluation of document-level text simplification.
We compare existing models using distinct metrics for meaning preservation and simplification.
We introduce a reference-less metric variant for simplicity, showing that models are mostly biased towards either simplification or meaning preservation.
arXiv Detail & Related papers (2024-04-04T08:04:24Z)
- Improving Text Embeddings with Large Language Models [59.930513259982725]
We introduce a novel and simple method for obtaining high-quality text embeddings using only synthetic data and less than 1k training steps.
We leverage proprietary LLMs to generate diverse synthetic data for hundreds of thousands of text embedding tasks across 93 languages.
Experiments demonstrate that our method achieves strong performance on highly competitive text embedding benchmarks without using any labeled data.
arXiv Detail & Related papers (2023-12-31T02:13:18Z)
- WeCheck: Strong Factual Consistency Checker via Weakly Supervised Learning [40.5830891229718]
We propose a weakly supervised framework that aggregates multiple resources to train a precise and efficient factual metric, namely WeCheck.
Comprehensive experiments on a variety of tasks demonstrate the strong performance of WeCheck, which achieves a 3.4% absolute improvement on average over previous state-of-the-art methods on the TRUE benchmark.
arXiv Detail & Related papers (2022-12-20T08:04:36Z)
- On the Limitations of Reference-Free Evaluations of Generated Text [64.81682222169113]
We show that reference-free metrics are inherently biased and limited in their ability to evaluate generated text.
We argue that they should not be used to measure progress on tasks like machine translation or summarization.
arXiv Detail & Related papers (2022-10-22T22:12:06Z)
- Finding Dataset Shortcuts with Grammar Induction [85.47127659108637]
We propose to use probabilistic grammars to characterize and discover shortcuts in NLP datasets.
Specifically, we use a context-free grammar to model patterns in sentence classification datasets and use a synchronous context-free grammar to model datasets involving sentence pairs.
The resulting grammars reveal interesting shortcut features in a number of datasets, including both simple and high-level features.
arXiv Detail & Related papers (2022-10-20T19:54:11Z)
- SMART: Sentences as Basic Units for Text Evaluation [48.5999587529085]
In this paper, we introduce a new metric called SMART to mitigate the limitations of existing text generation metrics.
We treat sentences as basic units of matching instead of tokens, and use a sentence matching function to soft-match candidate and reference sentences (see the sketch after this list).
Our results show that our proposed metric with a model-based matching function outperforms all competing metrics in system-level correlation.
arXiv Detail & Related papers (2022-08-01T17:58:05Z)
- Document-Level Text Simplification: Dataset, Criteria and Baseline [75.58761130635824]
We define and investigate a new task of document-level text simplification.
Based on Wikipedia dumps, we first construct a large-scale dataset named D-Wikipedia.
We propose a new automatic evaluation metric called D-SARI that is more suitable for the document-level simplification task.
arXiv Detail & Related papers (2021-10-11T08:15:31Z)
- Few-Shot Upsampling for Protest Size Detection [0.0]
"Upsampling" coarse document labels to fine-grained labels or spans is a common problem in social science research.
We provide a benchmark dataset and baselines on a socially impactful task.
We find that our rule-based model initially outperforms a zero-shot pre-trained transformer language model.
arXiv Detail & Related papers (2021-05-24T13:27:23Z)
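As referenced in the SMART entry above, here is a toy sketch of sentence-level soft matching. Unigram-overlap F1 stands in for the matching function and a greedy alignment for the aggregation; the actual metric also uses model-based matchers and a more careful aggregation scheme.

```python
# Toy illustration of sentence-level soft matching in the spirit of
# SMART; the F1 matcher and greedy alignment are simplifying
# assumptions, not the metric as published.

def token_f1(a: str, b: str) -> float:
    """Soft-match two sentences by unigram overlap F1."""
    ta, tb = a.lower().split(), b.lower().split()
    common = len(set(ta) & set(tb))
    if common == 0:
        return 0.0
    p, r = common / len(ta), common / len(tb)
    return 2 * p * r / (p + r)

def smart_like_score(candidate: list[str], reference: list[str]) -> float:
    """Greedily align each reference sentence to its best-matching
    candidate sentence, then average the match scores."""
    return sum(max(token_f1(ref, cand) for cand in candidate)
               for ref in reference) / len(reference)

cand = ["Alan Turing was born in London.", "He studied at Cambridge."]
ref = ["Alan Turing was born in London.", "Turing studied at Cambridge."]
print(round(smart_like_score(cand, ref), 3))  # -> 0.875
```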
This list is automatically generated from the titles and abstracts of the papers in this site.