Related papers: Conversing with Copilot: Exploring Prompt Engineering for Solving CS1 Problems Using Natural Language

Conversing with Copilot: Exploring Prompt Engineering for Solving CS1 Problems Using Natural Language

URL: http://arxiv.org/abs/2210.15157v1
Date: Thu, 27 Oct 2022 03:48:24 GMT
Title: Conversing with Copilot: Exploring Prompt Engineering for Solving CS1 Problems Using Natural Language
Authors: Paul Denny and Viraj Kumar and Nasser Giacaman
Abstract summary: GitHub Copilot is an artificial intelligence model for automatically generating source code from natural language problem descriptions. Since June 2022, Copilot has officially been available for free to all students as a plug-in to development environments like Visual Studio Code.
Score: 3.155277175705079
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: GitHub Copilot is an artificial intelligence model for automatically generating source code from natural language problem descriptions. Since June 2022, Copilot has officially been available for free to all students as a plug-in to development environments like Visual Studio Code. Prior work exploring OpenAI Codex, the underlying model that powers Copilot, has shown it performs well on typical CS1 problems thus raising concerns about the impact it will have on how introductory programming courses are taught. However, little is known about the types of problems for which Copilot does not perform well, or about the natural language interactions that a student might have with Copilot when resolving errors. We explore these questions by evaluating the performance of Copilot on a publicly available dataset of 166 programming problems. We find that it successfully solves around half of these problems on its very first attempt, and that it solves 60\% of the remaining problems using only natural language changes to the problem description. We argue that this type of prompt engineering, which we believe will become a standard interaction between human and Copilot when it initially fails, is a potentially useful learning activity that promotes computational thinking skills, and is likely to change the nature of code writing skill development.

Related papers

Fundamental Risks in the Current Deployment of General-Purpose AI Models: What Have We (Not) Learnt From Cybersecurity? [60.629883024152576]
Large Language Models (LLMs) have seen rapid deployment in a wide range of use cases. OpenAIs Altera are just a few examples of increased autonomy, data access, and execution capabilities. These methods come with a range of cybersecurity challenges.
arXiv Detail & Related papers (2024-12-19T14:44:41Z)
Exploring the Effect of Multiple Natural Languages on Code Suggestion Using GitHub Copilot [46.822148186169144]
GitHub Copilot is an AI-enabled tool that automates program synthesis. Recent studies have extensively examined Copilot's capabilities in various programming tasks. However, little is known about the effect of different natural languages on code suggestion.
arXiv Detail & Related papers (2024-02-02T14:30:02Z)
Exploring the Problems, their Causes and Solutions of AI Pair Programming: A Study on GitHub and Stack Overflow [6.724815667295355]
GitHub Copilot, the AI programmer pair, utilize machine learning models trained on a large corpus of code snippets to generate code suggestions. Despite its popularity in software development, there is limited empirical evidence on the actual experiences of practitioners who work with Copilot. We collected data from 473 GitHub issues, 706 GitHub discussions, and 142 Stack Overflow posts.
arXiv Detail & Related papers (2023-11-02T06:24:38Z)
SWE-bench: Can Language Models Resolve Real-World GitHub Issues? [80.52201658231895]
SWE-bench is an evaluation framework consisting of $2,294$ software engineering problems drawn from real GitHub issues and corresponding pull requests across $12$ popular Python repositories. We show that both state-of-the-art proprietary models and our fine-tuned model SWE-Llama can resolve only the simplest issues.
arXiv Detail & Related papers (2023-10-10T16:47:29Z)
Demystifying Practices, Challenges and Expected Features of Using GitHub Copilot [3.655281304961642]
We conducted an empirical study by collecting and analyzing the data from Stack Overflow (SO) and GitHub Discussions. We identified the programming languages, technologies used with Copilot, functions implemented, benefits, limitations, and challenges when using Copilot. Our results suggest that using Copilot is like a double-edged sword, which requires developers to carefully consider various aspects when deciding whether or not to use it.
arXiv Detail & Related papers (2023-09-11T16:39:37Z)
Giving Feedback on Interactive Student Programs with Meta-Exploration [74.5597783609281]
Developing interactive software, such as websites or games, is a particularly engaging way to learn computer science. Standard approaches require instructors to manually grade student-implemented interactive programs. Online platforms that serve millions, like Code.org, are unable to provide any feedback on assignments for implementing interactive programs.
arXiv Detail & Related papers (2022-11-16T10:00:23Z)
Human-guided Collaborative Problem Solving: A Natural Language based Framework [74.27063862727849]
Our framework consists of three components -- a natural language engine that parses the language utterances to a formal representation and vice-versa. We illustrate the ability of this framework to address the key challenges of collaborative problem solving by demonstrating it on a collaborative building task in a Minecraft-based blocksworld domain.
arXiv Detail & Related papers (2022-07-19T21:52:37Z)
GitHub Copilot AI pair programmer: Asset or Liability? [14.572381978575182]
We study the capabilities of Copilot in two different programming tasks. We compare Copilot's proposed solutions with those of human programmers on a set of programming tasks. The results show that Copilot is capable of providing solutions for almost all fundamental algorithmic problems.
arXiv Detail & Related papers (2022-06-30T15:00:03Z)
Competition-Level Code Generation with AlphaCode [74.87216298566942]
We introduce AlphaCode, a system for code generation that can create novel solutions to problems that require deeper reasoning. In simulated evaluations on recent programming competitions on the Codeforces platform, AlphaCode achieved on average a ranking of top 54.3%.
arXiv Detail & Related papers (2022-02-08T23:16:31Z)
An Empirical Cybersecurity Evaluation of GitHub Copilot's Code Contributions [8.285068188878578]
GitHub Copilot is a language model trained over open-source GitHub code. Code often contains bugs - and so, it is certain that the language model will have learned from exploitable, buggy code. This raises concerns on the security of Copilot's code contributions.
arXiv Detail & Related papers (2021-08-20T17:30:33Z)
Measuring Coding Challenge Competence With APPS [54.22600767666257]
We introduce APPS, a benchmark for code generation. Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges. Recent models such as GPT-Neo can pass approximately 15% of the test cases of introductory problems.
arXiv Detail & Related papers (2021-05-20T17:58:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.