The Behavior of Large Language Models When Prompted to Generate Code Explanations
- URL: http://arxiv.org/abs/2311.01490v2
- Date: Thu, 9 Nov 2023 20:56:15 GMT
- Title: The Behavior of Large Language Models When Prompted to Generate Code Explanations
- Authors: Priti Oli, Rabin Banjade, Jeevan Chapagain, Vasile Rus
- Abstract summary: This paper systematically investigates the generation of code explanations by Large Language Models (LLMs).
Our findings reveal significant variations in the nature of code explanations produced by LLMs, influenced by factors such as the wording of the prompt.
A consistent pattern emerges for Java and Python, where explanations exhibit a Flesch-Kincaid readability level of approximately grades 7-8.
- Score: 0.3293989832773954
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper systematically investigates the generation of code explanations by
Large Language Models (LLMs) for code examples commonly encountered in
introductory programming courses. Our findings reveal significant variations in
the nature of code explanations produced by LLMs, influenced by factors such as
the wording of the prompt, the specific code examples under consideration, the
programming language involved, the temperature parameter, and the version of
the LLM. However, a consistent pattern emerges for Java and Python, where
explanations exhibit a Flesch-Kincaid readability level of approximately grades
7-8 and a consistent lexical density, indicating the proportion of meaningful
words relative to the total explanation size. Additionally, the generated
explanations consistently achieve high scores for correctness, but lower scores
on three other metrics: completeness, conciseness, and specificity.
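The Flesch-Kincaid grade level and lexical density cited above are standard text metrics. As a rough illustration of how they are computed (a generic sketch, not the authors' evaluation code: the regex-based syllable counter and the minimal stopword list are simplifying assumptions), a minimal Python version looks like this:

```python
import re

# Minimal stopword list used only for this lexical-density sketch
# (an assumption, not the paper's actual word list).
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "or",
             "in", "on", "it", "this", "that", "with", "for", "each", "over"}

def count_syllables(word):
    # Crude heuristic: one syllable per run of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    # FK grade = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words) - 15.59)

def lexical_density(text):
    # Proportion of words that carry content (i.e. are not stopwords).
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    content = [w for w in words if w not in STOPWORDS]
    return len(content) / len(words)

explanation = ("The loop iterates over the list. "
               "Each element is printed to the console.")
print(round(flesch_kincaid_grade(explanation), 1))  # roughly middle-school level
print(round(lexical_density(explanation), 2))
```

A grade of 7-8 from the first function corresponds to text readable by a typical 12- to 14-year-old, which is the level the paper reports for LLM-generated Java and Python explanations.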
Related papers
- Source Code Summarization in the Era of Large Language Models [23.715005053430957]
Large language models (LLMs) have led to a great boost in the performance of code-related tasks.
In this paper, we undertake a systematic and comprehensive study on code summarization in the era of LLMs.
arXiv Detail & Related papers (2024-07-09T05:48:42Z)
- Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs [65.2379940117181]
We introduce code prompting, a chain of prompts that transforms a natural language problem into code.
We find that code prompting exhibits a high-performance boost for multiple LLMs.
Our analysis of GPT 3.5 reveals that the code formatting of the input problem is essential for performance improvement.
arXiv Detail & Related papers (2024-01-18T15:32:24Z)
- If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents [81.60906807941188]
Large language models (LLMs) are trained on a combination of natural language and formal language (code)
Code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity.
arXiv Detail & Related papers (2024-01-01T16:51:20Z)
- Testing LLMs on Code Generation with Varying Levels of Prompt Specificity [0.0]
Large language models (LLMs) have demonstrated unparalleled prowess in mimicking human-like text generation and processing.
The potential to transform natural language prompts into executable code promises a major shift in software development practices.
arXiv Detail & Related papers (2023-11-10T23:41:41Z)
- Language Agnostic Code Embeddings [61.84835551549612]
We focus on the cross-lingual capabilities of code embeddings across different programming languages.
Code embeddings comprise two distinct components: one deeply tied to the nuances and syntax of a specific language, and the other remaining agnostic to these details.
We show that when we isolate and eliminate this language-specific component, we witness significant improvements in downstream code retrieval tasks.
arXiv Detail & Related papers (2023-10-25T17:34:52Z)
- Exploring Large Language Models for Code Explanation [3.2570216147409514]
Large Language Models (LLMs) have made remarkable strides in Natural Language Processing.
This study specifically delves into the task of generating natural-language summaries for code snippets, using various LLMs.
arXiv Detail & Related papers (2023-10-25T14:38:40Z)
- Benchmarking and Explaining Large Language Model-based Code Generation: A Causality-Centric Approach [12.214585409361126]
Large language model (LLM)-based code generation relies on complex and powerful black-box models.
We propose a novel causal graph-based representation of the prompt and the generated code.
We illustrate the insights that our framework can provide by studying over 3 popular LLMs with over 12 prompt adjustment strategies.
arXiv Detail & Related papers (2023-10-10T14:56:26Z)
- Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z)
- Comparing Code Explanations Created by Students and Large Language Models [4.526618922750769]
Reasoning about code and explaining its purpose are fundamental skills for computer scientists.
The ability to describe at a high-level of abstraction how code will behave over all possible inputs correlates strongly with code writing skills.
Existing pedagogical approaches that scaffold the ability to explain code, such as producing code explanations on demand, do not currently scale well to large classrooms.
arXiv Detail & Related papers (2023-04-08T06:52:54Z)
- Python Code Generation by Asking Clarification Questions [57.63906360576212]
In this work, we introduce a novel and more realistic setup for this task.
We hypothesize that the under-specification of a natural language description can be resolved by asking clarification questions.
We collect and introduce a new dataset named CodeClarQA, containing pairs of natural language descriptions and code, along with synthetic clarification questions and answers.
arXiv Detail & Related papers (2022-12-19T22:08:36Z)
- Language Models of Code are Few-Shot Commonsense Learners [106.1531522893209]
Given a natural language input, the goal is to generate a graph such as an event graph or a reasoning graph.
Existing approaches serialize the output graph as a flat list of nodes and edges.
We show that when we instead frame structured commonsense reasoning tasks as code generation tasks, pre-trained LMs of code are better structured commonsense reasoners than LMs of natural language.
arXiv Detail & Related papers (2022-10-13T16:09:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.