Investigating Text Simplification Evaluation
- URL: http://arxiv.org/abs/2107.13662v1
- Date: Wed, 28 Jul 2021 22:49:32 GMT
- Title: Investigating Text Simplification Evaluation
- Authors: Laura Vásquez-Rodríguez, Matthew Shardlow, Piotr Przybyła, Sophia Ananiadou
- Abstract summary: Modern text simplification (TS) heavily relies on the availability of gold standard data to build machine learning models.
Existing studies show that parallel TS corpora contain inaccurate simplifications and incorrect alignments.
Evaluation is usually performed using metrics such as BLEU or SARI to compare system output to the gold standard.
- Score: 21.128143745540292
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern text simplification (TS) heavily relies on the availability of gold
standard data to build machine learning models. However, existing studies show
that parallel TS corpora contain inaccurate simplifications and incorrect
alignments. Additionally, evaluation is usually performed using metrics such
as BLEU or SARI to compare system output to the gold standard. A major
limitation is that these metrics do not match human judgements and that their
performance across different datasets and linguistic phenomena varies greatly.
Furthermore, our research shows that the test and training subsets of parallel
datasets differ significantly. In this work, we investigate existing TS
corpora, providing new insights that will motivate the improvement of existing
state-of-the-art TS evaluation methods. Our contributions include an analysis
of TS corpora based on the modifications applied during simplification and an
empirical study of TS model performance using better-distributed datasets.
We demonstrate that by improving the distribution of TS datasets, we can build
more robust TS models.
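For concreteness, the sketch below shows how the reference-based evaluation described in the abstract is typically computed, scoring a system output with BLEU and SARI. It is a minimal illustration with made-up sentences, assuming the sacrebleu and easse packages; the exact corpus_sari argument names may differ between easse versions.

```python
# Minimal sketch of reference-based TS evaluation with BLEU and SARI.
# Assumes the sacrebleu and easse packages are installed; the corpus_sari
# call follows easse's commonly documented helper, but argument names may
# vary between versions.
import sacrebleu
from easse.sari import corpus_sari

# Hypothetical source sentences, system simplifications, and one reference set.
orig_sents = ["The committee deliberated at length before reaching a verdict."]
sys_sents = ["The committee talked for a long time before deciding."]
refs_sents = [["The committee discussed for a long time before making a decision."]]

# BLEU compares the system output only against the references.
bleu = sacrebleu.corpus_bleu(sys_sents, refs_sents)

# SARI also consults the original sentences, rewarding words that are
# correctly kept, added, or deleted relative to both source and references.
sari = corpus_sari(orig_sents=orig_sents, sys_sents=sys_sents, refs_sents=refs_sents)

print(f"BLEU: {bleu.score:.2f}  SARI: {sari:.2f}")
```

Because SARI takes the original sentence into account, it is the more common choice for simplification, although, as noted above, neither metric correlates well with human judgements.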
Related papers
- Benchmarking Transcriptomics Foundation Models for Perturbation Analysis : one PCA still rules them all [1.507700065820919]
Recent advancements in transcriptomics sequencing provide new opportunities to uncover valuable insights.
No benchmark has been made to robustly evaluate the effectiveness of these rising models for perturbation analysis.
This article presents a novel biologically motivated evaluation framework and a hierarchy of perturbation analysis tasks.
arXiv Detail & Related papers (2024-10-17T18:27:51Z) - Can You Rely on Your Model Evaluation? Improving Model Evaluation with Synthetic Test Data [75.20035991513564]
We introduce 3S Testing, a deep generative modeling framework to facilitate model evaluation.
Our experiments demonstrate that 3S Testing outperforms traditional baselines.
These results raise the question of whether we need a paradigm shift away from limited real test data towards synthetic test data.
arXiv Detail & Related papers (2023-10-25T10:18:44Z) - BLESS: Benchmarking Large Language Models on Sentence Simplification [55.461555829492866]
We present BLESS, a performance benchmark of the most recent state-of-the-art large language models (LLMs) on the task of text simplification (TS).
We assess a total of 44 models, differing in size, architecture, pre-training methods, and accessibility, on three test sets from different domains (Wikipedia, news, and medical) under a few-shot setting.
Our evaluation indicates that the best LLMs, despite not being trained on TS, perform comparably with state-of-the-art TS baselines.
arXiv Detail & Related papers (2023-10-24T12:18:17Z) - Comparative Analysis of Transfer Learning in Deep Learning Text-to-Speech Models on a Few-Shot, Low-Resource, Customized Dataset [10.119929769316565]
This thesis is rooted in the pressing need to find TTS models that require less training time, fewer data samples, yet yield high-quality voice output.
The research evaluates the transfer learning capabilities of state-of-the-art TTS models through a thorough technical analysis.
It then conducts a hands-on experimental analysis to compare the models' performance on a constrained dataset.
arXiv Detail & Related papers (2023-10-08T03:08:25Z) - PartMix: Regularization Strategy to Learn Part Discovery for Visible-Infrared Person Re-identification [76.40417061480564]
We present a novel data augmentation technique, dubbed PartMix, for part-based Visible-Infrared person Re-IDentification (VI-ReID) models.
We synthesize the augmented samples by mixing the part descriptors across the modalities to improve the performance of part-based VI-ReID models.
arXiv Detail & Related papers (2023-04-04T05:21:23Z) - Cognitive Simplification Operations Improve Text Simplification [24.970301040693883]
We present a method for incorporating knowledge from the cognitive accessibility domain into a Text Simplification model.
We show that by adding this inductive bias to a TS-trained model, it is able to adapt better to Cognitive Simplification without ever seeing CS data.
arXiv Detail & Related papers (2022-11-16T10:51:03Z) - Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples.
We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models.
We also observe that the span selection task format, used for QA datasets like QAMR or SQuAD2.0, is effective in differentiating between strong and weak models.
arXiv Detail & Related papers (2021-06-01T22:33:53Z) - A comprehensive comparative evaluation and analysis of Distributional Semantic Models [61.41800660636555]
We perform a comprehensive evaluation of type distributional vectors, either produced by static DSMs or obtained by averaging the contextualized vectors generated by BERT.
The results show that the alleged superiority of predict-based models is more apparent than real, and certainly not ubiquitous.
We borrow from cognitive neuroscience the methodology of Representational Similarity Analysis (RSA) to inspect the semantic spaces generated by distributional models.
arXiv Detail & Related papers (2021-05-20T15:18:06Z) - Meta-learning framework with applications to zero-shot time-series forecasting [82.61728230984099]
This work provides positive evidence using a broad meta-learning framework.
Residual connections act as a meta-learning adaptation mechanism.
We show that it is viable to train a neural network on a source TS dataset and deploy it on a different target TS dataset without retraining.
arXiv Detail & Related papers (2020-02-07T16:39:43Z)