Related papers: Zero-Shot Grammar Competency Estimation Using Large Language Model Generated Pseudo Labels

Zero-Shot Grammar Competency Estimation Using Large Language Model Generated Pseudo Labels

URL: http://arxiv.org/abs/2511.13152v1
Date: Mon, 17 Nov 2025 09:00:26 GMT
Title: Zero-Shot Grammar Competency Estimation Using Large Language Model Generated Pseudo Labels
Authors: Sourya Dipta Das, Shubham Kumar, Kuldeep Yadav,
Abstract summary: We propose a zero-shot grammar competency estimation framework that leverages unlabeled data and Large Language Models (LLMs) without relying on manual labels.<n>During training, we employ LLM-generated predictions on unlabeled data by using grammar competency-based prompts.<n>We show that the choice of LLM for pseudo-label generation critically affects model performance and that the ratio of clean-to-noisy samples during training strongly influences stability and accuracy.
Score: 6.254549196597175
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Grammar competency estimation is essential for assessing linguistic proficiency in both written and spoken language; however, the spoken modality presents additional challenges due to its spontaneous, unstructured, and disfluent nature. Developing accurate grammar scoring models further requires extensive expert annotation, making large-scale data creation impractical. To address these limitations, we propose a zero-shot grammar competency estimation framework that leverages unlabeled data and Large Language Models (LLMs) without relying on manual labels. During training, we employ LLM-generated predictions on unlabeled data by using grammar competency rubric-based prompts. These predictions, treated as pseudo labels, are utilized to train a transformer-based model through a novel training framework designed to handle label noise effectively. We show that the choice of LLM for pseudo-label generation critically affects model performance and that the ratio of clean-to-noisy samples during training strongly influences stability and accuracy. Finally, a qualitative analysis of error intensity and score prediction confirms the robustness and interpretability of our approach. Experimental results demonstrate the efficacy of our approach in estimating grammar competency scores with high accuracy, paving the way for scalable, low-resource grammar assessment systems.

Related papers

Low-Resource English-Tigrinya MT: Leveraging Multilingual Models, Custom Tokenizers, and Clean Evaluation Benchmarks [6.177998679139308]
Despite advances in Neural Machine Translation (NMT), low-resource languages like Tigrinya remain underserved.<n>This paper investigates transfer learning techniques using multilingual pretrained models to enhance translation quality for morphologically rich, low-resource languages.
arXiv Detail & Related papers (2025-09-24T15:02:57Z)
Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method [108.56493934296687]
We introduce a divergence-based calibration method, inspired by the divergence-from-randomness concept, to calibrate token probabilities for pretraining data detection.<n>We have developed a Chinese-language benchmark, PatentMIA, to assess the performance of detection approaches for LLMs on Chinese text.
arXiv Detail & Related papers (2024-09-23T07:55:35Z)
Harnessing the Intrinsic Knowledge of Pretrained Language Models for Challenging Text Classification Settings [5.257719744958367]
This thesis explores three challenging settings in text classification by leveraging the intrinsic knowledge of pretrained language models (PLMs) We develop models that utilize features based on contextualized word representations from PLMs, achieving performance that rivals or surpasses human accuracy. Lastly, we tackle the sensitivity of large language models to in-context learning prompts by selecting effective demonstrations.
arXiv Detail & Related papers (2024-08-28T09:07:30Z)
LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback [71.95402654982095]
We propose Math-Minos, a natural language feedback-enhanced verifier. Our experiments reveal that a small set of natural language feedback can significantly boost the performance of the verifier.
arXiv Detail & Related papers (2024-06-20T06:42:27Z)
Self-training Large Language Models through Knowledge Detection [26.831873737733737]
Large language models (LLMs) often necessitate extensive labeled datasets and training compute to achieve impressive performance across downstream tasks. This paper explores a self-training paradigm, where the LLM autonomously curates its own labels and selectively trains on unknown data samples. Empirical evaluations demonstrate significant improvements in reducing hallucination in generation across multiple subjects.
arXiv Detail & Related papers (2024-06-17T07:25:09Z)
Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve the model alignment of different task scenarios. We implement UAL in a simple fashion -- adaptively setting the label smoothing value of training according to the uncertainty of individual samples. Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
Bring Your Own Data! Self-Supervised Evaluation for Large Language Models [52.15056231665816]
We propose a framework for self-supervised evaluation of Large Language Models (LLMs) We demonstrate self-supervised evaluation strategies for measuring closed-book knowledge, toxicity, and long-range context dependence. We find strong correlations between self-supervised and human-supervised evaluations.
arXiv Detail & Related papers (2023-06-23T17:59:09Z)
Entailment as Robust Self-Learner [14.86757876218415]
We design a prompting strategy that formulates a number of different NLU tasks as contextual entailment. We propose the Simple Pseudo-Label Editing (SimPLE) algorithm for better pseudo-labeling quality in self-training.
arXiv Detail & Related papers (2023-05-26T18:41:23Z)
Uncertainty-aware Self-training for Low-resource Neural Sequence Labeling [29.744621356187764]
This paper presents SeqUST, a novel uncertain-aware self-training framework for Neural sequence labeling (NSL) We incorporate Monte Carlo (MC) dropout in Bayesian neural network (BNN) to perform uncertainty estimation at the token level and then select reliable language tokens from unlabeled data. A well-designed masked sequence labeling task with a noise-robust loss supports robust training, which aims to suppress the problem of noisy pseudo labels.
arXiv Detail & Related papers (2023-02-17T02:40:04Z)
SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised Learning [101.86916775218403]
This paper revisits the popular pseudo-labeling methods via a unified sample weighting formulation. We propose SoftMatch to overcome the trade-off by maintaining both high quantity and high quality of pseudo-labels during training. In experiments, SoftMatch shows substantial improvements across a wide variety of benchmarks, including image, text, and imbalanced classification.
arXiv Detail & Related papers (2023-01-26T03:53:25Z)
Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR) Specifically, we propose to inject the standard Gaussian noise and regularize hidden representations of the fine-tuned model. We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
NoiER: An Approach for Training more Reliable Fine-TunedDownstream Task Models [54.184609286094044]
We propose noise entropy regularisation (NoiER) as an efficient learning paradigm that solves the problem without auxiliary models and additional data. The proposed approach improved traditional OOD detection evaluation metrics by 55% on average compared to the original fine-tuned models.
arXiv Detail & Related papers (2021-08-29T06:58:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.