Explain in Plain Language Questions with Indic Languages: Drawbacks, Affordances, and Opportunities
- URL: http://arxiv.org/abs/2409.20297v1
- Date: Mon, 30 Sep 2024 13:56:29 GMT
- Title: Explain in Plain Language Questions with Indic Languages: Drawbacks, Affordances, and Opportunities
- Authors: David H. Smith IV, Viraj Kumar, Paul Denny
- Abstract summary: We evaluate the efficacy of a recently introduced approach called Code Generation Based Grading (CGBG) in enabling language-agnostic ``Explain in Plain Language'' (EiPL) activities.
- Score: 1.9121661610146587
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Background: Introductory computer science courses use ``Explain in Plain English'' (EiPE) activities to develop and assess students' code comprehension skills, but creating effective autograders for these questions is challenging and limited to English. This is a particular challenge in linguistically diverse countries like India, where students may have limited proficiency in English. Methods: We evaluate the efficacy of a recently introduced approach called Code Generation Based Grading (CGBG) in enabling language-agnostic ``Explain in Plain Language'' (EiPL) activities. Here, students' EiPL responses generate code that is tested for functional equivalence to the original code being described. Objectives: We initially evaluate the correctness of code generated from correct EiPL responses provided in 10 of India's most commonly spoken languages. To evaluate the effectiveness of the approach in practice, we assess student success and perceptions of EiPL questions in an NPTEL (National Programme on Technology Enhanced Learning) course. Results: We find promising results for the correctness of code generated from translations of correct EiPL responses, with most languages achieving a correctness rate of 75% or higher. However, in practice, many students preferred to respond in English due to greater familiarity with English as a technical language, difficulties writing in their native language, and perceptions of the grader being less capable of generating code from prompts in their mother tongue.
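To make the grading mechanism concrete, the sketch below shows the general shape of a CGBG autograder: an LLM turns the student's plain-language description into code, and only the code's behavior is compared against the reference solution, never the prose itself. The `generate_code` helper and all names are hypothetical placeholders standing in for an LLM pipeline, not the authors' implementation.

```python
# Minimal sketch of Code Generation Based Grading (CGBG), assuming a
# hypothetical `generate_code` helper in place of an actual LLM-backed grader.

def generate_code(description: str) -> str:
    """Placeholder for an LLM call that turns a student's plain-language
    description (in any language) into Python source defining a function
    named `candidate`. Canned here so the sketch runs end to end."""
    return (
        "def candidate(xs):\n"
        "    return sum(x for x in xs if x % 2 == 0)\n"
    )

def reference_solution(xs):
    """The instructor's original function that the student is describing."""
    return sum(x for x in xs if x % 2 == 0)

def grade_eipl_response(student_description: str, test_inputs) -> bool:
    """Mark the response correct only if the generated code is functionally
    equivalent to the reference on every test input."""
    namespace = {}
    exec(generate_code(student_description), namespace)
    candidate = namespace["candidate"]
    return all(candidate(xs) == reference_solution(xs) for xs in test_inputs)

tests = [[1, 2, 3, 4], [], [5, 7], [0, 2, 4]]
print(grade_eipl_response("Return the sum of the even numbers in a list", tests))
```

Because only functional equivalence is checked, the same grader works regardless of the language the description is written in, which is what makes the approach language-agnostic.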
Related papers
- From English-Centric to Effective Bilingual: LLMs with Custom Tokenizers for Underrepresented Languages [0.5706164516481158]
We propose a model-agnostic cost-effective approach to developing bilingual base large language models (LLMs) to support English and any target language.
We performed experiments with three languages, each using a non-Latin script - Ukrainian, Arabic, and Georgian.
arXiv Detail & Related papers (2024-10-24T15:20:54Z)
- LLM-as-a-Judge & Reward Model: What They Can and Cannot Do [2.2469442203227863]
We conduct a comprehensive analysis of automated evaluators, reporting several key findings on their behavior.
We discover that English evaluation capabilities significantly influence language-specific evaluation capabilities, enabling evaluators trained in English to easily transfer their skills to other languages.
We find that state-of-the-art evaluators struggle with challenging prompts, in either English or Korean, underscoring their limitations in assessing or generating complex reasoning questions.
arXiv Detail & Related papers (2024-09-17T14:40:02Z)
- Explaining Code with a Purpose: An Integrated Approach for Developing Code Comprehension and Prompting Skills [4.776920192249936]
We propose using an LLM to generate code based on students' responses to EiPE questions.
We report student success in creating effective prompts for solving EiPE questions.
arXiv Detail & Related papers (2024-03-10T00:23:08Z)
- Code Generation Based Grading: Evaluating an Auto-grading Mechanism for "Explain-in-Plain-English" Questions [0.0]
"Code Generation Based Grading" (CGBG) achieves moderate agreement with human graders.
CGBG achieves moderate agreement with human graders with respect to low-level and line-by-line descriptions of code.
arXiv Detail & Related papers (2023-11-25T02:45:00Z)
- UKP-SQuARE: An Interactive Tool for Teaching Question Answering [61.93372227117229]
The exponential growth of question answering (QA) has made it an indispensable topic in any Natural Language Processing (NLP) course.
We introduce UKP-SQuARE as a platform for QA education.
Students can run, compare, and analyze various QA models from different perspectives.
arXiv Detail & Related papers (2023-05-31T11:29:04Z)
- ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large Language Models in Multilingual Learning [70.57126720079971]
Large language models (LLMs) have emerged as some of the most important breakthroughs in natural language processing (NLP).
This paper evaluates ChatGPT on 7 different tasks, covering 37 diverse languages with high, medium, low, and extremely low resources.
Compared to the performance of previous models, our extensive experimental results show that ChatGPT performs worse across a range of NLP tasks and languages.
arXiv Detail & Related papers (2023-04-12T05:08:52Z)
- LEVER: Learning to Verify Language-to-Code Generation with Execution [64.36459105535]
We propose LEVER, a simple approach to improve language-to-code generation by learning to verify the generated programs with their execution results.
Specifically, we train verifiers to determine whether a program sampled from the LLMs is correct or not based on the natural language input, the program itself and its execution results.
LEVER consistently improves over the base code LLMs (4.6% to 10.9% with code-davinci) and achieves new state-of-the-art results on all of them; a minimal sketch of this verify-and-rerank idea appears after this list.
arXiv Detail & Related papers (2023-02-16T18:23:22Z)
- Python Code Generation by Asking Clarification Questions [57.63906360576212]
In this work, we introduce a novel and more realistic setup for this task.
We hypothesize that the under-specification of a natural language description can be resolved by asking clarification questions.
We collect and introduce a new dataset named CodeClarQA, containing pairs of natural language descriptions and code together with synthetic clarification questions and answers.
arXiv Detail & Related papers (2022-12-19T22:08:36Z)
- MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages [76.93265104421559]
We benchmark code generation from natural language commands extending beyond English.
We annotated a total of 896 NL-code pairs in three languages: Spanish, Japanese, and Russian.
While the difficulties vary across these three languages, all systems lag significantly behind their English counterparts.
arXiv Detail & Related papers (2022-03-16T04:21:50Z)
- The Effectiveness of Intermediate-Task Training for Code-Switched Natural Language Understanding [15.54831836850549]
We propose the use of bilingual intermediate pretraining as a reliable technique to derive performance gains on three different NLP tasks using code-switched text.
We achieve substantial absolute improvements of 7.87%, 20.15%, and 10.99% on the mean accuracies and F1 scores over previous state-of-the-art systems.
We show consistent performance gains on four different code-switched language-pairs (Hindi-English, Spanish-English, Tamil-English and Malayalam-English) for SA.
arXiv Detail & Related papers (2021-07-21T08:10:59Z)
- X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models [103.75890012041366]
Language models (LMs) have proven surprisingly successful at capturing factual knowledge.
However, studies on LMs' factual representation ability have almost invariably been performed on English.
We create a benchmark of cloze-style probes for 23 typologically diverse languages.
arXiv Detail & Related papers (2020-10-13T05:29:56Z)
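Closing out the list, here is a minimal sketch of the verify-then-rerank idea from the LEVER entry above: sample several candidate programs, execute them, and combine each candidate's generation log-probability with a verifier's probability of correctness. The `verifier_score` heuristic below is a hypothetical stand-in for LEVER's trained verifier, and the candidate programs and scores are illustrative only.

```python
# Minimal sketch of LEVER-style verified reranking (see the LEVER entry above).
# Assumption: `verifier_score` stands in for LEVER's learned verifier, which
# scores (NL input, program, execution result) triples; here it is a crude
# hypothetical heuristic so the example runs end to end.
import math

def execute(program: str):
    """Run a candidate program and return its `result` variable, or an
    error string if execution fails."""
    namespace = {}
    try:
        exec(program, namespace)
        return namespace.get("result")
    except Exception as err:
        return f"error: {err}"

def verifier_score(nl_input: str, program: str, result) -> float:
    """Hypothetical verifier: estimates P(correct | input, program, result).
    This toy version only distrusts programs that crashed."""
    crashed = isinstance(result, str) and result.startswith("error")
    return 0.05 if crashed else 0.9

def rerank(nl_input: str, candidates):
    """Score each (program, generator log-prob) pair by adding the log of
    the verifier probability, mirroring LEVER's joint reranking score."""
    best_score, best = -math.inf, None
    for program, gen_logprob in candidates:
        result = execute(program)
        score = gen_logprob + math.log(verifier_score(nl_input, program, result))
        if score > best_score:
            best_score, best = score, (program, result)
    return best

candidates = [
    ("result = sum(range(1, 11))", -0.8),  # correct: result == 55
    ("result = sum(range(10))", -1.1),     # off by one: result == 45
    ("result = 1 / 0", -0.5),              # likeliest sample, but it crashes
]
print(rerank("sum the integers from 1 to 10", candidates))
```

Note how the third candidate has the highest generation probability but is demoted once its execution result is fed to the verifier; this is the core benefit the paper reports from verifying with execution.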