Assessing AI-Based Code Assistants in Method Generation Tasks
- URL: http://arxiv.org/abs/2402.09022v1
- Date: Wed, 14 Feb 2024 08:52:45 GMT
- Title: Assessing AI-Based Code Assistants in Method Generation Tasks
- Authors: Vincenzo Corso, Leonardo Mariani, Daniela Micucci and Oliviero Riganelli
- Abstract summary: This study compares four AI-based code assistants, GitHub Copilot, Tabnine, ChatGPT, and Google Bard, in method generation tasks.
Results show that code assistants are useful, with complementary capabilities, although they rarely generate ready-to-use correct code.
- Score: 5.32539007352208
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: AI-based code assistants are increasingly popular as a means to enhance
productivity and improve code quality. This study compares four AI-based code
assistants, GitHub Copilot, Tabnine, ChatGPT, and Google Bard, in method
generation tasks, assessing their ability to produce accurate, correct, and
efficient code. Results show that code assistants are useful, with
complementary capabilities, although they rarely generate ready-to-use correct
code.
Related papers
- Using AI-Based Coding Assistants in Practice: State of Affairs, Perceptions, and Ways Forward [9.177785129949]
We carried out a large-scale survey on how AI assistants are used.
We collected opinions of 481 programmers on five broad activities.
Our results show that usage of AI assistants varies depending on activity and stage.
arXiv Detail & Related papers (2024-06-11T23:10:43Z)
- How Far Have We Gone in Stripped Binary Code Understanding Using Large Language Models [51.527805834378974]
We propose a benchmark to evaluate the effectiveness of Large Language Models (LLMs) in binary code understanding.
Our evaluations reveal that existing LLMs can understand binary code to a certain extent, thereby improving the efficiency of binary code analysis.
arXiv Detail & Related papers (2024-04-15T14:44:08Z)
- Genetic Auto-prompt Learning for Pre-trained Code Intelligence Language Models [54.58108387797138]
We investigate the effectiveness of prompt learning in code intelligence tasks.
Existing automatic prompt design methods are of very limited applicability to code intelligence tasks.
We propose Genetic Auto Prompt (GenAP) which utilizes an elaborate genetic algorithm to automatically design prompts.
arXiv Detail & Related papers (2024-03-20T13:37:00Z)
- Whodunit: Classifying Code as Human Authored or GPT-4 Generated -- A case study on CodeChef problems [0.13124513975412253]
We use code stylometry and machine learning to distinguish between GPT-4 generated and human-authored code.
Our dataset comprises human-authored solutions from CodeChef and AI-authored solutions generated by GPT-4.
Our study shows that code stylometry is a promising approach for distinguishing between GPT-4 generated code and human-authored code.
arXiv Detail & Related papers (2024-03-06T19:51:26Z)
- Generating Java Methods: An Empirical Assessment of Four AI-Based Code Assistants [5.32539007352208]
We assess the effectiveness of four popular AI-based code assistants, namely GitHub Copilot, Tabnine, ChatGPT, and Google Bard.
Results show that Copilot is often more accurate than other techniques, yet none of the assistants is completely subsumed by the rest of the approaches.
arXiv Detail & Related papers (2024-02-13T12:59:20Z)
- AI for Low-Code for AI [8.379047663193422]
LowCoder is the first low-code tool for developing AI pipelines that supports both a visual programming interface and an AI-powered natural language interface.
We task 20 developers with varying levels of AI expertise with implementing four ML pipelines using LowCoder.
We find that LowCoder is especially useful for discoverability: using LowCoder_NL, participants discovered new operators in 75% of the tasks.
arXiv Detail & Related papers (2023-05-31T16:44:03Z)
- Chatbots As Fluent Polyglots: Revisiting Breakthrough Code Snippets [0.0]
The research applies AI-driven code assistants to analyze a selection of influential computer code that has shaped modern technology.
The original contribution of this study was to examine half of the most significant code advances in the last 50 years.
arXiv Detail & Related papers (2023-01-05T23:17:17Z)
- RMBench: Benchmarking Deep Reinforcement Learning for Robotic Manipulator Control [47.61691569074207]
Reinforcement learning is applied to solve actual complex tasks from high-dimensional, sensory inputs.
Recent progress benefits from deep learning for raw sensory signal representation.
We present RMBench, the first benchmark for robotic manipulations.
arXiv Detail & Related papers (2022-10-20T13:34:26Z)
- Enhancing Semantic Code Search with Multimodal Contrastive Learning and Soft Data Augmentation [50.14232079160476]
We propose a new approach with multimodal contrastive learning and soft data augmentation for code search.
We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages.
arXiv Detail & Related papers (2022-04-07T08:49:27Z)
- ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval.
We evaluate our approach on the code completion task in the Python and Java programming languages, achieving state-of-the-art performance on the CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z)
- Measuring Coding Challenge Competence With APPS [54.22600767666257]
We introduce APPS, a benchmark for code generation.
Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges.
Recent models such as GPT-Neo can pass approximately 15% of the test cases of introductory problems.
arXiv Detail & Related papers (2021-05-20T17:58:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.