Hints Help Finding and Fixing Bugs Differently in Python and Text-based Program Representations
- URL: http://arxiv.org/abs/2412.12471v1
- Date: Tue, 17 Dec 2024 02:11:53 GMT
- Title: Hints Help Finding and Fixing Bugs Differently in Python and Text-based Program Representations
- Authors: Ruchit Rawal, Victor-Alexandru Pădurean, Sven Apel, Adish Singla, Mariya Toneva
- Abstract summary: We find that the program representation has a significant influence on the users' accuracy at finding and fixing bugs.
Different hints help differently depending on the program representation and the user's understanding of the algorithmic task.
These findings have implications for designing next-generation programming tools that provide personalized support to users.
- Score: 28.829745991874816
- Abstract: With the recent advances in AI programming assistants such as GitHub Copilot, programming is not limited to classical programming languages anymore--programming tasks can also be expressed and solved by end-users in natural text. Despite the availability of this new programming modality, users still face difficulties with algorithmic understanding and program debugging. One promising approach to support end-users is to provide hints to help them find and fix bugs while forming and improving their programming capabilities. While it is plausible that hints can help, it is unclear which type of hint is helpful and how this depends on program representations (classic source code or a textual representation) and the user's capability of understanding the algorithmic task. To understand the role of hints in this space, we conduct a large-scale crowd-sourced study involving 753 participants investigating the effect of three types of hints (test cases, conceptual, and detailed), across two program representations (Python and text-based), and two groups of users (with clear understanding or confusion about the algorithmic task). We find that the program representation (Python vs. text) has a significant influence on the users' accuracy at finding and fixing bugs. Surprisingly, users are more accurate at finding and fixing bugs when they see the program in natural text. Hints are generally helpful in improving accuracy, but different hints help differently depending on the program representation and the user's understanding of the algorithmic task. These findings have implications for designing next-generation programming tools that provide personalized support to users, for example, by adapting the programming modality and providing hints with respect to the user's skill level and understanding.
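As a hypothetical illustration of the kind of task such a study involves (this example is not taken from the paper's materials), a participant might see a short buggy Python program together with a test-case hint that exposes the failure:

```python
# Hypothetical debugging task (illustrative only, not from the study's materials).
# Intended behaviour: return the largest value in a non-empty list.
def find_max(values):
    largest = 0                # BUG: fails for lists containing only negative numbers
    for v in values:
        if v > largest:
            largest = v
    return largest

# A "test case" hint might show an input on which the program misbehaves:
print(find_max([-5, -2, -9]))  # prints 0, but the expected output is -2
```

In this illustration, a conceptual hint would instead point to the faulty initialization at a high level, while a detailed hint would spell out the fix (e.g., initializing `largest` to the first element of the list).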
Related papers
- DiSciPLE: Learning Interpretable Programs for Scientific Visual Discovery [61.02102713094486]
Good interpretation is important in scientific reasoning, as it allows for better decision-making.
This paper introduces an automatic way of obtaining such interpretable-by-design models, by learning programs that interleave neural networks.
We propose DiSciPLE, an evolutionary algorithm that leverages the common sense and prior knowledge of large language models (LLMs) to create Python programs explaining visual data.
arXiv Detail & Related papers (2025-02-14T10:26:14Z)
- VDebugger: Harnessing Execution Feedback for Debugging Visual Programs [103.61860743476933]
We introduce VDebugger, a critic-refiner framework trained to localize and debug visual programs by tracking execution step by step.
VDebugger identifies and corrects program errors by leveraging detailed execution feedback, improving interpretability and accuracy.
Evaluations on six datasets demonstrate VDebugger's effectiveness, showing performance improvements of up to 3.2% in downstream task accuracy.
arXiv Detail & Related papers (2024-06-19T11:09:16Z)
- GuardRails: Automated Suggestions for Clarifying Ambiguous Purpose Statements [0.0]
Before implementing a function, programmers are encouraged to write a purpose statement, i.e., a short, natural-language explanation of what the function computes.
A purpose statement may be ambiguous, i.e., it may fail to specify the intended behaviour when two or more inequivalent computations are plausible on certain inputs.
We propose a novel approach that suggests such inputs using Large Language Models (LLMs).
We create an open-source implementation of our approach as an extension to Visual Studio Code for the Python programming language.
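As a hypothetical illustration of such an ambiguity (not an example from the paper), consider the purpose statement "return the middle element of the list": two plausible implementations disagree on even-length inputs, so an input like [1, 2, 3, 4] is exactly the kind of clarifying example that could be suggested.

```python
# Hypothetical ambiguous purpose statement: "return the middle element of the list".
# Two plausible but inequivalent implementations (illustrative only):

def middle_lower(xs):
    """Middle element, rounding the index down for even-length lists."""
    return xs[(len(xs) - 1) // 2]

def middle_upper(xs):
    """Middle element, rounding the index up for even-length lists."""
    return xs[len(xs) // 2]

# Both agree on odd-length inputs but diverge on even-length ones,
# so [1, 2, 3, 4] exposes the ambiguity in the purpose statement.
print(middle_lower([1, 2, 3, 4]), middle_upper([1, 2, 3, 4]))  # 2 3
```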
arXiv Detail & Related papers (2023-12-13T14:56:42Z)
- Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning [84.12154024070024]
We propose natural language embedded programs (NLEP) as a unifying framework for addressing math/symbolic reasoning, natural language understanding, and instruction following tasks.
Our approach prompts a language model to generate full Python programs that define functions over data structures which contain natural language representations of structured knowledge.
A Python interpreter then executes the generated code and prints the output.
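As a rough sketch of that idea (a hypothetical program, not one generated in the paper), a model might produce a Python program that stores natural-language knowledge in a data structure, defines a function over it, and prints the answer when executed:

```python
# Hypothetical NLEP-style program (illustrative only).
# Structured knowledge is stored with natural-language fields,
# a function reasons over it, and the interpreter prints the result.

countries = [
    {"name": "France", "capital": "Paris", "fact": "Paris is the capital of France."},
    {"name": "Japan", "capital": "Tokyo", "fact": "Tokyo is the capital of Japan."},
]

def answer_question(question: str) -> str:
    """Return the capital of any country mentioned in the question."""
    for entry in countries:
        if entry["name"].lower() in question.lower():
            return f'The capital of {entry["name"]} is {entry["capital"]}.'
    return "I don't know."

print(answer_question("What is the capital of Japan?"))
```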
arXiv Detail & Related papers (2023-09-19T17:54:21Z)
- Understanding Programs by Exploiting (Fuzzing) Test Cases [26.8259045248779]
We propose to incorporate the relationship between inputs and possible outputs/behaviors into learning, for achieving a deeper semantic understanding of programs.
To obtain inputs that are representative enough to trigger the execution of most parts of the code, we resort to fuzz testing and propose fuzz tuning.
The effectiveness of the proposed method is verified on two program understanding tasks, code clone detection and code classification, where it outperforms the current state of the art by large margins.
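As a minimal, hypothetical illustration of the underlying idea (random inputs used to expose a program's input-output behaviour; the paper's actual fuzz-tuning pipeline is more involved), one could collect input-output pairs like this:

```python
import random

def program_under_study(xs: list[int]) -> int:
    """Toy program whose behaviour we want to characterise: sum of positive values."""
    return sum(x for x in xs if x > 0)

def fuzz_io_pairs(num_cases: int = 5, seed: int = 0) -> list[tuple[list[int], int]]:
    """Generate random inputs and record the program's outputs on them."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(num_cases):
        xs = [rng.randint(-10, 10) for _ in range(rng.randint(1, 6))]
        pairs.append((xs, program_under_study(xs)))
    return pairs

# The resulting (input, output) pairs could serve as extra signal when learning
# a semantic representation of the program.
print(fuzz_io_pairs())
```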
arXiv Detail & Related papers (2023-05-23T01:51:46Z)
- Generation Probabilities Are Not Enough: Uncertainty Highlighting in AI Code Completions [54.55334589363247]
We study whether conveying information about uncertainty enables programmers to more quickly and accurately produce code.
We find that highlighting tokens with the highest predicted likelihood of being edited leads to faster task completion and more targeted edits.
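As a toy sketch of that interaction design (with made-up numbers; the paper's edit-likelihood estimates come from a model, not hand-set values), tokens whose predicted probability of being edited exceeds a threshold could be flagged for the programmer:

```python
# Hypothetical per-token edit likelihoods for a code completion (illustrative only).
completion = [
    ("result", 0.05),
    ("=", 0.02),
    ("sorted", 0.40),        # the model suspects the programmer may change this call
    ("(items,", 0.10),
    ("reverse=True)", 0.55),
]

EDIT_THRESHOLD = 0.3  # assumed threshold; a real system would tune or learn this

def tokens_to_highlight(tokens_with_likelihood, threshold=EDIT_THRESHOLD):
    """Return tokens whose predicted edit likelihood warrants highlighting."""
    return [tok for tok, p in tokens_with_likelihood if p >= threshold]

print(tokens_to_highlight(completion))  # ['sorted', 'reverse=True)']
```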
arXiv Detail & Related papers (2023-02-14T18:43:34Z)
- Giving Feedback on Interactive Student Programs with Meta-Exploration [74.5597783609281]
Developing interactive software, such as websites or games, is a particularly engaging way to learn computer science.
Standard approaches require instructors to manually grade student-implemented interactive programs.
Online platforms that serve millions, like Code.org, are unable to provide any feedback on assignments for implementing interactive programs.
arXiv Detail & Related papers (2022-11-16T10:00:23Z)
- Piloting Copilot and Codex: Hot Temperature, Cold Prompts, or Black Magic? [5.714553194279462]
We investigate the various input parameters of two language models and conduct a study to understand whether varying these input parameters has a significant impact on the quality of the generated programs.
Our results show that varying the input parameters can significantly improve the performance of language models.
arXiv Detail & Related papers (2022-10-26T13:28:14Z)
- Using Large Language Models to Enhance Programming Error Messages [5.903720638984496]
Large language models can be used to create useful enhancements to programming error messages.
We discuss the benefits and downsides of large language models and highlight future streams of research for enhancing programming error messages.
arXiv Detail & Related papers (2022-10-20T23:17:26Z)
- What is it like to program with artificial intelligence? [10.343988028594612]
Large language models can generate code to solve a variety of problems expressed in natural language.
This technology has already been commercialised in at least one widely-used programming editor extension: GitHub Copilot.
We explore how programming with large language models (LLM-assisted programming) is similar to, and differs from, prior conceptualisations of programmer assistance.
arXiv Detail & Related papers (2022-08-12T10:48:46Z)
- ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification.
A meta-learner adapts to give feedback to student code on a new programming question from just a few examples by instructors.
Our approach was successfully deployed to provide feedback on 16,000 student exam solutions in a programming course offered by a tier-1 university.
arXiv Detail & Related papers (2021-07-23T22:41:28Z)