Leveraging Prompt-Tuning for Bengali Grammatical Error Explanation Using Large Language Models
- URL: http://arxiv.org/abs/2504.05642v1
- Date: Tue, 08 Apr 2025 03:38:01 GMT
- Title: Leveraging Prompt-Tuning for Bengali Grammatical Error Explanation Using Large Language Models
- Authors: Subhankar Maity, Aniket Deroy
- Abstract summary: We propose a novel three-step prompt-tuning method for Bengali Grammatical Error Explanation (BGEE). Our approach involves identifying and categorizing grammatical errors in Bengali sentences, generating corrected versions of the sentences, and providing natural language explanations for each identified error. We evaluate the performance of our BGEE system using both automated evaluation metrics and human evaluation conducted by experienced Bengali language experts.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel three-step prompt-tuning method for Bengali Grammatical Error Explanation (BGEE) using state-of-the-art large language models (LLMs) such as GPT-4, GPT-3.5 Turbo, and Llama-2-70b. Our approach involves identifying and categorizing grammatical errors in Bengali sentences, generating corrected versions of the sentences, and providing natural language explanations for each identified error. We evaluate the performance of our BGEE system using both automated evaluation metrics and human evaluation conducted by experienced Bengali language experts. Our proposed prompt-tuning approach shows that GPT-4, the best-performing LLM, surpasses the baseline model on automated evaluation metrics, with a 5.26% improvement in F1 score and a 6.95% improvement in exact match. Furthermore, compared to the previous baseline, GPT-4 shows a 25.51% decrease in wrong error types and a 26.27% decrease in wrong error explanations. However, the results still lag behind the human baseline.
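In outline, the three-step method chains three LLM calls: error identification and categorization, sentence correction, and per-error explanation. The sketch below is a minimal illustration assuming the OpenAI Python SDK; the prompt wording and helper names are our own placeholders, not the authors' actual templates.

```python
# Minimal sketch of the three-step BGEE prompting pipeline. Prompt
# texts and helper names are illustrative assumptions, not the paper's
# exact templates. Requires the OpenAI Python SDK (openai>=1.0).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(prompt: str, model: str = "gpt-4") -> str:
    """Send a single-turn prompt to a chat model and return its reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def explain_bengali_errors(sentence: str) -> dict:
    # Step 1: identify and categorize grammatical errors.
    errors = ask(
        "List every grammatical error in this Bengali sentence and "
        f"assign each an error type:\n{sentence}"
    )
    # Step 2: generate a corrected version of the sentence.
    corrected = ask(f"Rewrite this Bengali sentence with all errors fixed:\n{sentence}")
    # Step 3: explain each identified error in natural language.
    explanations = ask(
        "Explain, in one sentence per error, why each error is wrong:\n"
        f"Sentence: {sentence}\nErrors: {errors}\nCorrection: {corrected}"
    )
    return {"errors": errors, "corrected": corrected, "explanations": explanations}
```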
Related papers
- Grammatical Error Correction for Low-Resource Languages: The Case of Zarma [8.40484790921164]
Grammatical error correction aims to improve the quality and readability of texts. We present a study on GEC for Zarma, spoken by over five million people in West Africa. We compare three approaches: rule-based methods, machine translation (MT) models, and large language models.
arXiv Detail & Related papers (2024-10-20T23:51:36Z)
- Improving Autoformalization using Type Checking [15.58948808529849]
We analyze both current autoformalization methods and the processes used to evaluate them, focusing specifically on the Lean 4 theorem proving language. We demonstrate that scaling type-check filtering with self-consistency techniques on top of existing methods significantly improves performance, achieving absolute accuracy gains of up to +18.4% on ProofNet. We also release new benchmarks: a new research-level mathematics dataset RLM25, a corrected ProofNet, and ProofNetVerif with labeled correct and incorrect autoformalization pairs for evaluating metrics.
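Type-check filtering with self-consistency reduces to: sample many candidate formalizations, discard those the compiler rejects, and majority-vote over the survivors. A hedged sketch, with the LLM sampler and Lean 4 compile check passed in as callables since the paper's exact tooling is not reproduced here:

```python
from collections import Counter
from typing import Callable, Optional

def autoformalize(
    statement: str,
    sample: Callable[[str], str],        # LLM sampler (assumed interface)
    type_checks: Callable[[str], bool],  # Lean 4 compile check (assumed interface)
    n_samples: int = 16,
) -> Optional[str]:
    candidates = [sample(statement) for _ in range(n_samples)]
    # Filter: keep only candidates the type checker accepts.
    well_typed = [c for c in candidates if type_checks(c)]
    if not well_typed:
        return None
    # Self-consistency: return the most frequently generated survivor.
    return Counter(well_typed).most_common(1)[0][0]
```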
arXiv Detail & Related papers (2024-06-11T13:01:50Z)
- How Ready Are Generative Pre-trained Large Language Models for Explaining Bengali Grammatical Errors? [0.4857223913212445]
Grammatical error correction (GEC) tools, powered by advanced generative artificial intelligence (AI), competently correct linguistic inaccuracies in user input.
However, they often fall short in providing essential natural language explanations.
In low-resource languages such as Bengali, grammatical error explanation (GEE) systems should not only correct sentences but also provide explanations for errors.
arXiv Detail & Related papers (2024-05-27T15:56:45Z)
- Rapidly Developing High-quality Instruction Data and Evaluation Benchmark for Large Language Models with Minimal Human Effort: A Case Study on Japanese [36.3163608701382]
We propose an efficient self-instruct method based on GPT-4.
We first translate a small amount of English instructions into Japanese and post-edit them to obtain native-level quality.
GPT-4 then utilizes them as demonstrations to automatically generate Japanese instruction data.
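The self-instruct loop here is simple: a handful of translated, post-edited Japanese instructions serve as few-shot demonstrations, and GPT-4 is prompted to extend the list. A sketch under that reading; the `complete` callable and the prompt text are placeholders, not the paper's actual pipeline:

```python
from typing import Callable

def generate_japanese_instructions(
    seed_instructions: list[str],    # hand-translated, post-edited seeds
    complete: Callable[[str], str],  # e.g. a GPT-4 chat call (assumed)
    n_new: int = 5,
) -> str:
    demos = "\n".join(f"- {s}" for s in seed_instructions)
    prompt = (
        "Below are example Japanese task instructions:\n"
        f"{demos}\n"
        f"Write {n_new} new, diverse Japanese task instructions in the same style."
    )
    return complete(prompt)
```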
arXiv Detail & Related papers (2024-03-06T13:17:07Z)
- A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models [50.86686630756207]
Research shows that grammatical mistakes in a sentence can be corrected by translating it to another language and back.
Current generative models for Automatic Program Repair (APR) are pre-trained on source code and fine-tuned for repair.
This paper proposes bypassing the fine-tuning step and using Round-Trip Translation (RTT): translation of code from one programming language to another programming or natural language, and back.
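RTT is easy to state as two translation calls: code to natural language, then natural language back to code, with the regenerated code serving as the repair candidate. A hedged sketch; `translate` stands in for any code-capable LLM call, and the prompts are illustrative:

```python
from typing import Callable

def rtt_repair(buggy_code: str, translate: Callable[[str], str]) -> str:
    # Forward pass: describe the (buggy) code in natural language.
    summary = translate(f"Describe precisely what this code does:\n{buggy_code}")
    # Backward pass: regenerate code from the description alone; the
    # hope is that the bug does not survive the round trip.
    return translate(f"Write code that does the following:\n{summary}")
```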
arXiv Detail & Related papers (2024-01-15T22:36:31Z)
- GEE! Grammar Error Explanation with Large Language Models [64.16199533560017]
We propose the task of grammar error explanation, where a system needs to provide one-sentence explanations for each grammatical error in a pair of erroneous and corrected sentences.
We analyze the capability of GPT-4 in grammar error explanation, and find that it only produces explanations for 60.2% of the errors using one-shot prompting.
We develop a two-step pipeline that leverages fine-tuned and prompted large language models to perform structured atomic token edit extraction.
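The two-step pipeline can be approximated as: align the erroneous and corrected sentences into atomic token edits, then prompt an LLM to explain each edit. Below, difflib stands in for the paper's fine-tuned edit-extraction model, and `ask` is an assumed one-shot LLM helper:

```python
import difflib
from typing import Callable

def atomic_edits(source: str, target: str) -> list[tuple[str, str, str]]:
    """Return (operation, source_span, target_span) token-level edits."""
    sm = difflib.SequenceMatcher(a=source.split(), b=target.split())
    return [
        (op, " ".join(sm.a[i1:i2]), " ".join(sm.b[j1:j2]))
        for op, i1, i2, j1, j2 in sm.get_opcodes()
        if op != "equal"  # keep only spans that actually changed
    ]

def explain_edits(source: str, target: str, ask: Callable[[str], str]) -> str:
    edits = atomic_edits(source, target)
    return ask(
        "Give a one-sentence grammatical explanation for each edit:\n"
        + "\n".join(f"{op}: '{a}' -> '{b}'" for op, a, b in edits)
    )
```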
arXiv Detail & Related papers (2023-11-16T02:45:47Z)
- GrammarGPT: Exploring Open-Source LLMs for Native Chinese Grammatical Error Correction with Supervised Fine-Tuning [46.75740002185691]
We introduce GrammarGPT, an open-source Large Language Model, to explore its potential for native Chinese grammatical error correction.
For grammatical errors with clues, we propose a method to guide ChatGPT to generate ungrammatical sentences by providing those clues.
For grammatical errors without clues, we collected ungrammatical sentences from publicly available websites and manually corrected them.
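The clue-guided generation step amounts to prompting a model with the clue word and the target error type. A minimal sketch under that assumption; the prompt and `complete` callable are illustrative, not GrammarGPT's actual pipeline:

```python
from typing import Callable

def generate_ungrammatical(
    clue: str,                       # word that often signals the error
    error_type: str,                 # target grammatical error category
    complete: Callable[[str], str],  # e.g. a ChatGPT call (assumed)
) -> str:
    return complete(
        f"Write a Chinese sentence that misuses the word '{clue}' so that "
        f"it exhibits a '{error_type}' grammatical error."
    )
```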
arXiv Detail & Related papers (2023-07-26T02:45:38Z)
- Analyzing the Performance of GPT-3.5 and GPT-4 in Grammatical Error Correction [28.58384091374763]
GPT-3 and GPT-4 models are powerful, achieving high performance on a variety of Natural Language Processing tasks.
We perform experiments testing the capabilities of a GPT-3.5 model (text-davinci-003) and a GPT-4 model (gpt-4-0314) on major GEC benchmarks.
We report the performance of our best prompt on the BEA-2019 and JFLEG datasets, finding that the GPT models can perform well in a sentence-level revision setting.
arXiv Detail & Related papers (2023-03-25T03:08:49Z)
- Few-shot Learning with Multilingual Language Models [66.49496434282564]
We train multilingual autoregressive language models on a balanced corpus covering a diverse set of languages.
Our largest model sets new state of the art in few-shot learning in more than 20 representative languages.
We present a detailed analysis of where the model succeeds and fails, showing in particular that it enables cross-lingual in-context learning.
arXiv Detail & Related papers (2021-12-20T16:52:35Z)
- Understanding by Understanding Not: Modeling Negation in Language Models [81.21351681735973]
Negation is a core construction in natural language.
We propose to augment the language modeling objective with an unlikelihood objective that is based on negated generic sentences.
We reduce the mean top-1 error rate to 4% on the negated LAMA dataset.
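The unlikelihood objective penalizes the model for assigning high probability to continuations that a negation has made false. A toy illustration of the per-token penalty term, with made-up probabilities standing in for model outputs:

```python
import math

def unlikelihood_loss(token_probs: list[float]) -> float:
    """-sum(log(1 - p)) over tokens the model should NOT predict."""
    return -sum(math.log(1.0 - p) for p in token_probs)

# The penalty grows as the model puts more mass on the negated continuation.
print(unlikelihood_loss([0.9, 0.5, 0.1]))  # toy probabilities
```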
arXiv Detail & Related papers (2021-05-07T21:58:35Z)
- Improving the Efficiency of Grammatical Error Correction with Erroneous Span Detection and Correction [106.63733511672721]
We propose a novel language-independent approach to improve the efficiency of Grammatical Error Correction (GEC) by dividing the task into two subtasks: Erroneous Span Detection (ESD) and Erroneous Span Correction (ESC).
ESD identifies grammatically incorrect text spans with an efficient sequence tagging model. ESC leverages a seq2seq model to take the sentence with annotated erroneous spans as input and only outputs the corrected text for these spans.
Experiments show our approach performs comparably to conventional seq2seq approaches in both English and Chinese GEC benchmarks with less than 50% time cost for inference.
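The division of labor is straightforward to express in code: a tagger marks bad spans, and a generator rewrites only those spans. A sketch with both models as callables (the paper uses a sequence tagging model and a seq2seq model; the per-span correction calls here approximate its batched span correction):

```python
from typing import Callable

def correct_sentence(
    sentence: str,
    detect_spans: Callable[[str], list[tuple[int, int]]],  # ESD stand-in
    correct_span: Callable[[str, tuple[int, int]], str],   # ESC stand-in
) -> str:
    out, last = [], 0
    # Spans are assumed to be sorted, non-overlapping character ranges.
    for start, end in detect_spans(sentence):
        out.append(sentence[last:start])                   # copy clean text
        out.append(correct_span(sentence, (start, end)))   # rewrite bad span
        last = end
    out.append(sentence[last:])
    return "".join(out)
```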
arXiv Detail & Related papers (2020-10-07T08:29:11Z)
- TuringAdvice: A Generative and Dynamic Evaluation of Language Use [90.3029315711237]
We propose TuringAdvice, a new challenge task and dataset for language understanding models.
Given a written situation that a real person is currently facing, a model must generate helpful advice in natural language.
Empirical results show that today's models struggle at TuringAdvice.
arXiv Detail & Related papers (2020-04-07T18:00:03Z)