Swords: A Benchmark for Lexical Substitution with Improved Data Coverage and Quality
- URL: http://arxiv.org/abs/2106.04102v1
- Date: Tue, 8 Jun 2021 04:58:29 GMT
- Title: Swords: A Benchmark for Lexical Substitution with Improved Data Coverage and Quality
- Authors: Mina Lee, Chris Donahue, Alexander Iyabor, Robin Jia, Percy Liang
- Abstract summary: We release a new benchmark for lexical substitution, the task of finding appropriate substitutes for a target word in a context.
We use a context-free thesaurus to produce candidates and rely on human judgement to determine contextual appropriateness.
Compared to the previous largest benchmark, our Swords benchmark has 4.1x more substitutes per target word for the same level of quality, and its substitutes are 1.5x more appropriate (based on human judgement) for the same number of substitutes.
- Score: 126.55416118361495
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We release a new benchmark for lexical substitution, the task of finding
appropriate substitutes for a target word in a context. To assist humans with
writing, lexical substitution systems can suggest words that humans cannot
easily think of. However, existing benchmarks depend on human recall as the
only source of data, and therefore lack coverage of the substitutes that would
be most helpful to humans. Furthermore, annotators often provide substitutes of
low quality, which are not actually appropriate in the given context. We
collect higher-coverage and higher-quality data by framing lexical substitution
as a classification problem, guided by the intuition that it is easier for
humans to judge the appropriateness of candidate substitutes than conjure them
from memory. To this end, we use a context-free thesaurus to produce candidates
and rely on human judgement to determine contextual appropriateness. Compared
to the previous largest benchmark, our Swords benchmark has 4.1x more
substitutes per target word for the same level of quality, and its substitutes
are 1.5x more appropriate (based on human judgement) for the same number of
substitutes.
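The candidate-then-judge framing above lends itself to a short sketch. The toy thesaurus, annotator functions, and acceptance threshold below are illustrative placeholders, not the authors' actual collection pipeline or data format:

```python
# A minimal sketch of framing lexical substitution as classification:
# candidates come from a context-free thesaurus, and humans only judge
# appropriateness in context rather than recalling substitutes from memory.

def thesaurus_candidates(target_word):
    """Stand-in for a context-free thesaurus lookup (hypothetical data)."""
    toy_thesaurus = {
        "happy": ["glad", "content", "fortunate", "merry"],
    }
    return toy_thesaurus.get(target_word, [])

def collect_judgments(context, target_word, candidate, annotators):
    """Each annotator answers a yes/no question: is the candidate an
    appropriate substitute for the target word in this context?"""
    return [judge(context, target_word, candidate) for judge in annotators]

def label_substitutes(context, target_word, annotators, threshold=0.5):
    """Score each thesaurus candidate by the fraction of positive judgments
    and keep those above a threshold -- classification, not free recall."""
    labeled = {}
    for candidate in thesaurus_candidates(target_word):
        votes = collect_judgments(context, target_word, candidate, annotators)
        score = sum(votes) / len(votes)
        if score >= threshold:
            labeled[candidate] = score
    return labeled

# Toy annotators that reject "fortunate" as inappropriate in this context.
annotators = [lambda ctx, t, c: c != "fortunate"] * 3
print(label_substitutes("She was happy with the result.", "happy", annotators))
```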
Related papers
- ProLex: A Benchmark for Language Proficiency-oriented Lexical Substitution [16.204890291443252]
We propose a new task, language proficiency-oriented lexical substitution.
We also introduce ProLex, a novel benchmark designed to assess systems' ability to generate appropriate substitutes.
We show that our best model, a Llama2-13B model fine-tuned with task-specific synthetic data, outperforms ChatGPT by an average of 3.2% in F-score.
arXiv Detail & Related papers (2024-01-21T00:58:31Z)
- Quality and Quantity of Machine Translation References for Automatic Metrics [4.824118883700288]
Higher-quality references lead to better metric correlations with humans at the segment level.
References from vendors of different quality levels can be mixed together to improve metric success.
These findings can be used by evaluators of shared tasks when references need to be created under a certain budget.
arXiv Detail & Related papers (2024-01-02T16:51:17Z)
- SpellMapper: A non-autoregressive neural spellchecker for ASR customization with candidate retrieval based on n-gram mappings [76.87664008338317]
Contextual spelling correction models are an alternative to shallow fusion to improve automatic speech recognition.
We propose a novel algorithm for candidate retrieval based on misspelled n-gram mappings.
Experiments on Spoken Wikipedia show a 21.4% word error rate improvement compared to a baseline ASR system.
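As a rough illustration of candidate retrieval by n-gram overlap (a simplification: SpellMapper learns mappings from misspelled n-grams, whereas this sketch uses raw character n-grams):

```python
# Simplified candidate retrieval for spell correction: index custom phrases
# by character n-grams, then retrieve the phrases sharing the most n-grams
# with a (possibly misrecognized) span from the ASR hypothesis.
from collections import defaultdict

def char_ngrams(text, n=3):
    text = f"#{text}#"  # boundary markers
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def build_index(custom_phrases, n=3):
    index = defaultdict(set)
    for phrase in custom_phrases:
        for gram in char_ngrams(phrase, n):
            index[gram].add(phrase)
    return index

def retrieve_candidates(span, index, n=3, top_k=3):
    scores = defaultdict(int)
    for gram in char_ngrams(span, n):
        for phrase in index.get(gram, ()):
            scores[phrase] += 1
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

index = build_index(["tchaikovsky", "shostakovich", "prokofiev"])
print(retrieve_candidates("chaikovski", index))  # misrecognized ASR span
```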
arXiv Detail & Related papers (2023-06-04T10:00:12Z)
- Lex2Sent: A bagging approach to unsupervised sentiment analysis [0.628122931748758]
In this paper, we propose an alternative approach to classifying texts: Lex2Sent.
To classify texts, we train embedding models to determine the distances between document embeddings and the embeddings of a suitable lexicon.
We show that our model outperforms lexica and provides a basis for a high-performing few-shot fine-tuning approach in the task of binary sentiment analysis.
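A minimal sketch of the lexicon-distance idea, assuming toy word vectors in place of the trained embedding models (and omitting the bagging step the paper's name refers to):

```python
# Classify a document by which sentiment lexicon's centroid its embedding
# is closer to. Vectors and lexica below are placeholders for illustration.
import math

TOY_VECTORS = {
    "great": (0.9, 0.1), "love": (0.8, 0.2), "awful": (0.1, 0.9),
    "hate": (0.2, 0.8), "movie": (0.5, 0.5),
}
POSITIVE_LEXICON = ["great", "love"]
NEGATIVE_LEXICON = ["awful", "hate"]

def embed(words):
    """Average the vectors of the known words (a stand-in for Doc2Vec)."""
    vecs = [TOY_VECTORS[w] for w in words if w in TOY_VECTORS]
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))

def classify(document_words):
    doc = embed(document_words)
    d_pos = math.dist(doc, embed(POSITIVE_LEXICON))
    d_neg = math.dist(doc, embed(NEGATIVE_LEXICON))
    return "positive" if d_pos < d_neg else "negative"

print(classify("love this great movie".split()))  # -> positive
```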
arXiv Detail & Related papers (2022-09-26T20:49:18Z)
- Unsupervised Lexical Substitution with Decontextualised Embeddings [48.00929769805882]
We propose a new unsupervised method for lexical substitution using pre-trained language models.
Our method retrieves substitutes based on the similarity of contextualised and decontextualised word embeddings.
We conduct experiments in English and Italian, and show that our method substantially outperforms strong baselines.
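A sketch of ranking substitutes by the similarity of contextualised and decontextualised embeddings follows; the model choice, layer, and pooling here are assumptions rather than the paper's exact recipe:

```python
# Rank candidate substitutes by cosine similarity between the target's
# in-context embedding and each candidate's embedding computed without
# any sentential context.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed_in_context(sentence, word):
    """Mean of the word's subtoken vectors inside the full sentence."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    word_ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    for i in range(len(ids) - len(word_ids) + 1):
        if ids[i:i + len(word_ids)] == word_ids:
            return hidden[i:i + len(word_ids)].mean(dim=0)
    raise ValueError(f"{word!r} not found in sentence")

def embed_decontextualised(word):
    """Embed the candidate on its own, with no surrounding context."""
    return embed_in_context(word, word)

def rank_substitutes(sentence, target, candidates):
    t = embed_in_context(sentence, target)
    sims = {c: torch.cosine_similarity(t, embed_decontextualised(c), dim=0).item()
            for c in candidates}
    return sorted(sims.items(), key=lambda kv: -kv[1])

print(rank_substitutes("The bright student solved it.", "bright",
                       ["clever", "shiny", "smart"]))
```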
arXiv Detail & Related papers (2022-09-17T03:51:47Z)
- Rethink about the Word-level Quality Estimation for Machine Translation from Human Judgement [57.72846454929923]
We create a benchmark dataset, HJQE, in which expert translators directly annotate poorly translated words.
We propose two tag-correcting strategies, namely a tag refinement strategy and a tree-based annotation strategy, to make the TER-based artificial QE corpus closer to HJQE.
The results show our proposed dataset is more consistent with human judgement and also confirm the effectiveness of the proposed tag correcting strategies.
arXiv Detail & Related papers (2022-09-13T02:37:12Z)
- Semantic-Preserving Adversarial Text Attacks [85.32186121859321]
We propose a Bigram and Unigram based adaptive Semantic Preservation Optimization (BU-SPO) method to examine the vulnerability of deep models.
Compared with existing methods, our method achieves the highest attack success and semantic preservation rates while changing the smallest number of words.
arXiv Detail & Related papers (2021-08-23T09:05:18Z)
- LexSubCon: Integrating Knowledge from Lexical Resources into Contextual Embeddings for Lexical Substitution [76.615287796753]
We introduce LexSubCon, an end-to-end lexical substitution framework based on contextual embedding models.
This is achieved by combining contextual information with knowledge from structured lexical resources.
Our experiments show that LexSubCon outperforms previous state-of-the-art methods on LS07 and CoInCo benchmark datasets.
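A minimal sketch of the score-combination idea, with placeholder component scorers and weights standing in for LexSubCon's actual contextual and lexical-resource features:

```python
# Combine a contextual-model signal with a structured lexical-resource
# signal into one ranking score. All scores and the 0.7/0.3 mix below
# are hypothetical placeholders for illustration only.

def contextual_score(sentence, target, candidate):
    """Stand-in for a contextual-embedding score (e.g. from a masked LM)."""
    toy = {"clever": 0.9, "shiny": 0.4, "smart": 0.8}
    return toy.get(candidate, 0.0)

def lexical_resource_score(target, candidate):
    """Stand-in for a structured-resource score (e.g. WordNet relatedness)."""
    toy = {("bright", "clever"): 0.8, ("bright", "shiny"): 0.9,
           ("bright", "smart"): 0.7}
    return toy.get((target, candidate), 0.0)

def combined_score(sentence, target, candidate, alpha=0.7):
    return (alpha * contextual_score(sentence, target, candidate)
            + (1 - alpha) * lexical_resource_score(target, candidate))

candidates = ["clever", "shiny", "smart"]
ranked = sorted(candidates,
                key=lambda c: combined_score("The bright student.", "bright", c),
                reverse=True)
print(ranked)  # contextual and lexical signals jointly rank substitutes
```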
arXiv Detail & Related papers (2021-07-11T21:25:56Z)
- Human-Paraphrased References Improve Neural Machine Translation [33.86920777067357]
We show that tuning to paraphrased references produces a system that is significantly better according to human judgment.
Our work confirms the finding that paraphrased references yield metric scores that correlate better with human judgment.
arXiv Detail & Related papers (2020-10-20T13:14:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.