Exploiting the English Vocabulary Profile for L2 word-level vocabulary assessment with LLMs
- URL: http://arxiv.org/abs/2506.02758v1
- Date: Tue, 03 Jun 2025 11:23:57 GMT
- Title: Exploiting the English Vocabulary Profile for L2 word-level vocabulary assessment with LLMs
- Authors: Stefano Bannò, Kate Knill, Mark Gales
- Abstract summary: This paper introduces a novel approach to enable fine-grained vocabulary evaluation. The scheme combines large language models (LLMs) with the English Vocabulary Profile (EVP). The EVP is a standard lexical resource that enables in-context vocabulary use to be linked with proficiency level.
- Score: 2.201161230389126
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vocabulary use is a fundamental aspect of second language (L2) proficiency. To date, its assessment by automated systems has typically examined the context-independent, or part-of-speech (PoS) related use of words. This paper introduces a novel approach to enable fine-grained vocabulary evaluation exploiting the precise use of words within a sentence. The scheme combines large language models (LLMs) with the English Vocabulary Profile (EVP). The EVP is a standard lexical resource that enables in-context vocabulary use to be linked with proficiency level. We evaluate the ability of LLMs to assign proficiency levels to individual words as they appear in L2 learner writing, addressing key challenges such as polysemy, contextual variation, and multi-word expressions. We compare LLMs to a PoS-based baseline. LLMs appear to exploit additional semantic information that yields improved performance. We also explore correlations between word-level proficiency and essay-level proficiency. Finally, the approach is applied to examine the consistency of the EVP proficiency levels. Results show that LLMs are well-suited for the task of vocabulary assessment.
Related papers
- Evaluation of LLMs in Medical Text Summarization: The Role of Vocabulary Adaptation in High OOV Settings [26.442558912559658]
Large Language Models (LLMs) have recently achieved great success in medical text summarization simply by using in-context learning. We show that LLMs exhibit a significant performance drop for data points with a high concentration of out-of-vocabulary words or with high novelty. Vocabulary adaptation is an intuitive solution to this vocabulary mismatch issue.
arXiv Detail & Related papers (2025-05-27T14:23:03Z)
- Prompt and circumstance: A word-by-word LLM prompting approach to interlinear glossing for low-resource languages [6.4977738682502295]
We investigate the effectiveness of a retrieval-based LLM prompting approach to glossing, applied to the seven languages from the SIGMORPHON 2023 shared task. Our system beats the BERT-based shared task baseline for every language in the morpheme-level score category. In a case study on Tsez, we ask the LLM to automatically create and follow linguistic instructions, reducing errors on a confusing grammatical feature.
arXiv Detail & Related papers (2025-02-13T21:23:16Z)
- Recording for Eyes, Not Echoing to Ears: Contextualized Spoken-to-Written Conversion of ASR Transcripts [19.02690795530784]
We propose a Contextualized Spoken-to-Written conversion (CoS2W) task to address ASR and grammar errors. This task naturally matches the in-context learning capabilities of Large Language Models (LLMs).
arXiv Detail & Related papers (2024-08-19T03:53:48Z)
- PhonologyBench: Evaluating Phonological Skills of Large Language Models [57.80997670335227]
Phonology, the study of speech's structure and pronunciation rules, is a critical yet often overlooked component in Large Language Model (LLM) research.
We present PhonologyBench, a novel benchmark consisting of three diagnostic tasks designed to explicitly test the phonological skills of LLMs.
We observe a significant gap of 17% and 45% on Rhyme Word Generation and Syllable counting, respectively, when compared to humans.
arXiv Detail & Related papers (2024-04-03T04:53:14Z)
- FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition [56.76951887823882]
Large language models (LLMs) are primarily evaluated by overall performance on various text understanding and generation tasks.
We present FAC$^2$E, a framework for Fine-grAined and Cognition-grounded LLMs' Capability Evaluation.
arXiv Detail & Related papers (2024-02-29T21:05:37Z)
- When LLMs Meet Cunning Texts: A Fallacy Understanding Benchmark for Large Language Models [59.84769254832941]
We propose a FaLlacy Understanding Benchmark (FLUB) containing cunning texts that are easy for humans to understand but difficult for models to grasp.
Specifically, the cunning texts that FLUB focuses on mainly consist of the tricky, humorous, and misleading texts collected from the real internet environment.
Based on FLUB, we investigate the performance of multiple representative and advanced LLMs.
arXiv Detail & Related papers (2024-02-16T22:12:53Z)
- Self-Augmented In-Context Learning for Unsupervised Word Translation [23.495503962839337]
Large language models (LLMs) demonstrate strong word translation or bilingual lexicon induction (BLI) capabilities in few-shot setups.
We propose self-augmented in-context learning (SAIL) for unsupervised BLI.
Our method shows substantial gains over zero-shot prompting of LLMs on two established BLI benchmarks.
arXiv Detail & Related papers (2024-02-15T15:43:05Z)
- Think from Words(TFW): Initiating Human-Like Cognition in Large Language Models Through Think from Words for Japanese Text-level Classification [0.0]
"Think from Words" (TFW) initiates the comprehension process at the word level and then extends it to encompass the entire text.
"TFW with Extra word-level information" (TFW Extra) augments comprehension with additional word-level data.
Our findings shed light on the impact of various word-level information types on LLMs' text comprehension.
arXiv Detail & Related papers (2023-12-06T12:34:46Z)
- Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models [105.39236338147715]
The paper is inspired by the popular language game "Who is Spy".
We develop DEEP to evaluate LLMs' expression and disguising abilities.
We then introduce SpyGame, an interactive multi-agent framework.
arXiv Detail & Related papers (2023-10-31T14:37:42Z)
- Establishing Vocabulary Tests as a Benchmark for Evaluating Large Language Models [2.7013338932521416]
We advocate for the revival of vocabulary tests as a valuable tool for assessing Large Language Models (LLMs) performance.
We evaluate seven LLMs using two vocabulary test formats across two languages and uncover surprising gaps in their lexical knowledge.
arXiv Detail & Related papers (2023-10-23T08:45:12Z)
- Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools.
Longer conversations manifest the comprehensive grasp of language models in terms of their proficiency in understanding questions.
Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z)
- Translate to Disambiguate: Zero-shot Multilingual Word Sense Disambiguation with Pretrained Language Models [67.19567060894563]
Pretrained Language Models (PLMs) learn rich cross-lingual knowledge and can be finetuned to perform well on diverse tasks.
We present a new study investigating how well PLMs capture cross-lingual word sense with Contextual Word-Level Translation (C-WLT).
We find that as the model size increases, PLMs encode more cross-lingual word sense knowledge and better use context to improve WLT performance.
arXiv Detail & Related papers (2023-04-26T19:55:52Z)