ProLex: A Benchmark for Language Proficiency-oriented Lexical Substitution
- URL: http://arxiv.org/abs/2401.11356v3
- Date: Sun, 2 Jun 2024 21:05:36 GMT
- Title: ProLex: A Benchmark for Language Proficiency-oriented Lexical Substitution
- Authors: Xuanming Zhang, Zixun Chen, Zhou Yu
- Abstract summary: We propose a new task, language proficiency-oriented lexical substitution.
We also introduce ProLex, a novel benchmark designed to assess systems' ability to generate not only appropriate substitutes but also substitutes that demonstrate better language proficiency.
We show that our best model, a Llama2-13B model fine-tuned with task-specific synthetic data, outperforms ChatGPT by an average of 3.2% in F-score.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Lexical Substitution discovers appropriate substitutes for a given target word in a context sentence. However, the task fails to consider substitutes that are of equal or higher proficiency than the target, an aspect that could be beneficial for language learners looking to improve their writing. To bridge this gap, we propose a new task, language proficiency-oriented lexical substitution. We also introduce ProLex, a novel benchmark designed to assess systems' ability to generate not only appropriate substitutes but also substitutes that demonstrate better language proficiency. Besides the benchmark, we propose models that can automatically perform the new task. We show that our best model, a Llama2-13B model fine-tuned with task-specific synthetic data, outperforms ChatGPT by an average of 3.2% in F-score and achieves comparable results with GPT-4 on ProLex.
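To make the task concrete, here is a minimal, hypothetical sketch: generate in-context substitutes with an off-the-shelf masked LM and keep only candidates at or above the target word's proficiency level. The CEFR lookup table and helper names below are invented for illustration; the paper's actual systems are fine-tuned Llama2 models, not this MLM baseline.
```python
# Illustrative sketch only: an MLM-based stand-in for proficiency-oriented
# lexical substitution. CEFR_LEVEL is a toy lookup table invented for this
# example; ProLex's own models are fine-tuned Llama2, not this baseline.
from transformers import pipeline

# Toy proficiency table (CEFR: A1 lowest .. C2 highest), invented for illustration.
CEFR_ORDER = {"A1": 1, "A2": 2, "B1": 3, "B2": 4, "C1": 5, "C2": 6}
CEFR_LEVEL = {"big": "A1", "large": "A2", "sizable": "B2", "substantial": "C1"}

def proficiency_substitutes(sentence: str, target: str, top_k: int = 20):
    """Return MLM substitutes whose CEFR level is >= the target word's level."""
    unmasker = pipeline("fill-mask", model="bert-base-uncased")
    masked = sentence.replace(target, unmasker.tokenizer.mask_token, 1)
    target_rank = CEFR_ORDER[CEFR_LEVEL.get(target, "A1")]
    keep = []
    for pred in unmasker(masked, top_k=top_k):
        word = pred["token_str"].strip()
        level = CEFR_LEVEL.get(word)
        if word != target and level and CEFR_ORDER[level] >= target_rank:
            keep.append((word, level, round(pred["score"], 4)))
    return keep

print(proficiency_substitutes("The project had a big impact on the team.", "big"))
```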
Related papers
- Retrieval is Accurate Generation
We introduce a novel method that selects context-aware phrases from a collection of supporting documents.
Our model achieves the best performance and the lowest latency among several retrieval-augmented baselines.
arXiv Detail & Related papers (2024-02-27T14:16:19Z)
- Unsupervised Lexical Simplification with Context Augmentation
Given a target word and its context, our method generates substitutes based on the target context and additional contexts sampled from monolingual data.
We conduct experiments in English, Portuguese, and Spanish on the TSAR-2022 shared task, and show that our model substantially outperforms other unsupervised systems across all languages.
arXiv Detail & Related papers (2023-11-01T05:48:05Z)
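A rough sketch of the context-augmentation idea above, assuming an MLM-based substitute generator: gather extra sentences containing the target word from monolingual data, score substitutes in every context, and aggregate. The toy corpus and all function names are illustrative, not the paper's implementation.
```python
# Hedged sketch of context augmentation: score substitutes in the original
# context plus extra monolingual contexts containing the same target word,
# then aggregate. The toy corpus and aggregation choice are illustrative only.
from collections import Counter
from transformers import pipeline

MONOLINGUAL_CORPUS = [  # stand-in for real monolingual data
    "The instructions were too difficult for the students.",
    "It is difficult to say who will win.",
]

def augmented_substitutes(sentence: str, target: str, top_k: int = 10):
    unmasker = pipeline("fill-mask", model="bert-base-uncased")
    contexts = [sentence] + [s for s in MONOLINGUAL_CORPUS if target in s]
    scores = Counter()
    for ctx in contexts:
        masked = ctx.replace(target, unmasker.tokenizer.mask_token, 1)
        for pred in unmasker(masked, top_k=top_k):
            word = pred["token_str"].strip()
            if word != target:
                scores[word] += pred["score"]  # sum scores across contexts
    return scores.most_common(top_k)

print(augmented_substitutes("This word is difficult to replace.", "difficult"))
```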
- Prefer to Classify: Improving Text Classifiers via Auxiliary Preference Learning
In this paper, we investigate task-specific preferences between pairs of input texts as a new alternative way for such auxiliary data annotation.
We propose a novel multi-task learning framework, called prefer-to-classify (P2C), which can enjoy the cooperative effect of learning both the given classification task and the auxiliary preferences.
arXiv Detail & Related papers (2023-06-08T04:04:47Z)
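The summary above does not spell out the P2C objective, so the following is only a generic shape for such multi-task setups (λ, s(·), and the pairwise term are assumptions, not the paper's definitions): the classification loss plus a preference term that rewards scoring the preferred text higher.
```latex
% Hedged sketch: generic classification + pairwise-preference objective.
% \lambda, s(\cdot), and the Bradley-Terry-style term are assumptions.
\mathcal{L}_{\mathrm{P2C}} = \mathcal{L}_{\mathrm{cls}} + \lambda\,\mathcal{L}_{\mathrm{pref}},
\qquad
\mathcal{L}_{\mathrm{pref}} = -\log \sigma\bigl(s(x_i) - s(x_j)\bigr)
\quad \text{for } x_i \succ x_j
```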
- ProsAudit, a prosodic benchmark for self-supervised speech models
ProsAudit is a benchmark to assess structural prosodic knowledge in self-supervised learning (SSL) speech models.
It consists of two subtasks, their corresponding metrics, and an evaluation dataset.
arXiv Detail & Related papers (2023-02-23T14:30:23Z)
- Always Keep your Target in Mind: Studying Semantics and Improving Performance of Neural Lexical Substitution
We present a large-scale comparative study of lexical substitution methods employing both old and most recent language models.
We show that the already competitive results achieved by SOTA LMs/MLMs can be substantially improved if information about the target word is injected properly.
arXiv Detail & Related papers (2022-06-07T16:16:19Z)
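One simple member of the target-injection family this paper studies (hedged: the pattern below is illustrative and not necessarily the paper's exact recipe) keeps the target word visible next to the mask instead of hiding it:
```python
# Hedged illustration of target injection: keep the target word visible
# next to the mask ("T (or even [MASK])") so the MLM conditions on it,
# instead of masking the target outright. The pattern choice is an assumption.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
mask = unmasker.tokenizer.mask_token

sentence, target = "She gave a brilliant speech.", "brilliant"

plain = sentence.replace(target, mask, 1)                             # target hidden
injected = sentence.replace(target, f"{target} (or even {mask})", 1)  # target visible

for name, text in [("plain", plain), ("injected", injected)]:
    preds = [p["token_str"].strip() for p in unmasker(text, top_k=5)]
    print(name, preds)
```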
- Swords: A Benchmark for Lexical Substitution with Improved Data Coverage and Quality
We release a new benchmark for lexical substitution, the task of finding appropriate substitutes for a target word in a context.
We use a context-free thesaurus to produce candidates and rely on human judgement to determine contextual appropriateness.
Compared to the previous largest benchmark, our Swords benchmark has 4.1x more substitutes per target word for the same level of quality, and its substitutes are 1.5x more appropriate (based on human judgement) for the same number of substitutes.
arXiv Detail & Related papers (2021-06-08T04:58:29Z)
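As a loose illustration of the thesaurus-based candidate step (Swords' actual thesaurus choice and the human filtering stage are not reproduced here), context-free candidates can be pulled from WordNet:
```python
# Illustrative only: context-free candidate generation from WordNet,
# loosely mirroring Swords' thesaurus-based candidate step. The actual
# benchmark used a different thesaurus plus human raters for filtering.
import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

def thesaurus_candidates(word: str) -> set[str]:
    """Collect synonyms of `word` across all WordNet senses."""
    cands = set()
    for synset in wn.synsets(word):
        for lemma in synset.lemmas():
            name = lemma.name().replace("_", " ")
            if name.lower() != word.lower():
                cands.add(name)
    return cands

print(sorted(thesaurus_candidates("happy")))
```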
- Towards Multi-Sense Cross-Lingual Alignment of Contextual Embeddings
We propose a novel framework to align contextual embeddings at the sense level by leveraging cross-lingual signal from bilingual dictionaries only.
We operationalize our framework by first proposing a novel sense-aware cross entropy loss to model word senses explicitly.
We then propose a sense alignment objective on top of the sense-aware cross entropy loss for cross-lingual model pretraining, and pretrain cross-lingual models for several language pairs.
arXiv Detail & Related papers (2021-03-11T04:55:35Z)
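The exact sense-aware cross entropy loss is defined in the paper; as a hedged sketch of the usual shape of such a loss, a softmax over the word's sense inventory S(w), with contextual embedding h_w and gold sense s*:
```latex
% Hedged sketch: generic sense-level softmax cross entropy.
% S(w), \mathbf{s}^{*}, and \mathbf{h}_w are assumed notation, not the paper's.
\mathcal{L}_{\mathrm{sense}}
  = -\log \frac{\exp\!\bigl(\mathbf{h}_w^{\top}\mathbf{s}^{*}\bigr)}
               {\sum_{\mathbf{s}\in S(w)} \exp\!\bigl(\mathbf{h}_w^{\top}\mathbf{s}\bigr)}
```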
- A Comparative Study of Lexical Substitution Approaches based on Neural Language Models
We present a large-scale comparative study of popular neural language and masked language models.
We show that the already competitive results achieved by SOTA LMs/MLMs can be further improved if information about the target word is injected properly.
arXiv Detail & Related papers (2020-05-29T18:43:22Z)