Advancing GenAI Assisted Programming--A Comparative Study on Prompt
Efficiency and Code Quality Between GPT-4 and GLM-4
- URL: http://arxiv.org/abs/2402.12782v1
- Date: Tue, 20 Feb 2024 07:47:39 GMT
- Title: Advancing GenAI Assisted Programming--A Comparative Study on Prompt
Efficiency and Code Quality Between GPT-4 and GLM-4
- Authors: Angus Yang, Zehan Li, and Jie Li
- Abstract summary: This study explores the best practices for utilizing GenAI as a programming tool.
By evaluating prompting strategies at different levels of complexity, we identify that the simplest and most straightforward prompting strategy yields the best code generation results.
Our results reveal that while GPT-4 marginally outperforms GLM-4, the difference is minimal for average users.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study aims to explore the best practices for utilizing GenAI as a
programming tool, through a comparative analysis between GPT-4 and GLM-4. By
evaluating prompting strategies at different levels of complexity, we identify
that the simplest and most straightforward prompting strategy yields the best
code generation results. Additionally, adding a CoT-like preliminary confirmation
step would further increase the success rate. Our results reveal that while
GPT-4 marginally outperforms GLM-4, the difference is minimal for average
users. In our simplified evaluation model, we see a remarkable 30 to 100-fold
increase in code generation efficiency over traditional coding norms. Our GenAI
Coding Workshop highlights the effectiveness and accessibility of the prompting
methodology developed in this study. We observe that GenAI-assisted coding is
likely to trigger a paradigm shift in the programming landscape, one that
requires developers to take on new roles centered on supervising and guiding
GenAI, and to focus more on setting high-level objectives and engaging in
innovation.
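The CoT-like preliminary confirmation step described above can be sketched as a two-round prompt flow. The snippet below is a hypothetical illustration, not the paper's actual protocol: `call_llm` is a stand-in for a real chat-completion API (e.g. GPT-4 or GLM-4), and all prompt wording is assumed.

```python
# Two-step prompting: (1) ask the model to restate the requirements
# (a CoT-like preliminary confirmation), (2) ask it to generate the code.
def call_llm(messages):
    """Placeholder for a real LLM chat API; returns canned strings."""
    last = messages[-1]["content"].lower()
    if "restate" in last:
        return "Understood: you want a function that reverses a string."
    return "def solution(s):\n    return s[::-1]"

def generate_with_confirmation(task):
    messages = [{"role": "user",
                 "content": task + "\n\nFirst, restate the requirements "
                                   "in your own words."}]
    confirmation = call_llm(messages)      # step 1: preliminary confirmation
    messages += [{"role": "assistant", "content": confirmation},
                 {"role": "user", "content": "Confirmed. Now write the code."}]
    return call_llm(messages)              # step 2: code generation

generated = generate_with_confirmation("Write a function that reverses a string.")
```

In practice `call_llm` would wrap an actual API client, and the confirmation could be reviewed by the user (or checked automatically) before the second request is sent.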
Related papers
- A case study on the transformative potential of AI in software engineering on LeetCode and ChatGPT [0.0]
This study compares the software quality of Python programs produced by LeetCode users with that of programs generated by GPT-4o.
The findings indicate that GPT-4o does not present a considerable impediment to code quality, understandability, or runtime when generating code on a limited scale.
arXiv Detail & Related papers (2025-01-07T09:15:25Z)
- Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark [62.58869921806019]
We propose a task decomposition evaluation framework based on GPT-4o to automatically construct a new training dataset.
We design innovative training strategies to effectively distill GPT-4o's evaluation capabilities into a 7B open-source MLLM, MiniCPM-V-2.6.
Experimental results demonstrate that our distilled open-source MLLM significantly outperforms the current state-of-the-art GPT-4o-base baseline.
arXiv Detail & Related papers (2024-11-23T08:06:06Z)
- Strategic Optimization and Challenges of Large Language Models in Object-Oriented Programming [0.0]
This research focuses on method-level code generation within the Object-Oriented Programming (OOP) framework.
We devised experiments that varied the extent of contextual information in the prompts.
Our findings indicate that prompts enriched with method invocation details yield the highest cost-effectiveness.
arXiv Detail & Related papers (2024-08-27T07:44:16Z)
- RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs [60.38044044203333]
Large language models (LLMs) typically utilize the top-k contexts from a retriever in retrieval-augmented generation (RAG)
We propose a novel instruction fine-tuning framework RankRAG, which instruction-tunes a single LLM for the dual purpose of context ranking and answer generation in RAG.
For generation, we compare our model with many strong baselines, including GPT-4-0613, GPT-4-turbo-2024-0409, and ChatQA-1.5, an open-sourced model with the state-of-the-art performance on RAG benchmarks.
arXiv Detail & Related papers (2024-07-02T17:59:17Z)
- Predicting Learning Performance with Large Language Models: A Study in Adult Literacy [18.48602704139462]
This study investigates the application of advanced AI models, including Large Language Models (LLMs), for predicting learning performance in adult literacy programs in ITSs.
We evaluate the predictive capabilities of GPT-4 versus traditional machine learning methods in predicting learning performance through five-fold cross-validation techniques.
arXiv Detail & Related papers (2024-03-04T08:14:07Z)
- Comparing large language models and human programmers for generating programming code [0.0]
GPT-4 substantially outperforms other large language models, including Gemini Ultra and Claude 2.
In most LeetCode and GeeksforGeeks coding contests evaluated in this study, GPT-4 employing the optimal prompt strategy outperforms 85 percent of human participants.
arXiv Detail & Related papers (2024-03-01T14:43:06Z)
- Benchmarking GPT-4 on Algorithmic Problems: A Systematic Evaluation of Prompting Strategies [47.129504708849446]
Large Language Models (LLMs) have revolutionized the field of Natural Language Processing.
However, LLMs lack systematic generalization, i.e. the ability to extrapolate learned statistical regularities outside the training distribution.
In this work, we offer a systematic benchmarking of GPT-4, one of the most advanced LLMs available.
arXiv Detail & Related papers (2024-02-27T10:44:52Z)
- Investigate-Consolidate-Exploit: A General Strategy for Inter-Task Agent Self-Evolution [92.84441068115517]
Investigate-Consolidate-Exploit (ICE) is a novel strategy for enhancing the adaptability and flexibility of AI agents.
ICE promotes the transfer of knowledge between tasks for genuine self-evolution.
Our experiments on the XAgent framework demonstrate ICE's effectiveness, reducing API calls by as much as 80%.
arXiv Detail & Related papers (2024-01-25T07:47:49Z)
- Prompt Engineering or Fine Tuning: An Empirical Assessment of Large Language Models in Automated Software Engineering Tasks [8.223311621898983]
GPT-4 with conversational prompts showed drastic improvement compared to GPT-4 with automatic prompting strategies.
Fully automated prompt engineering with no human in the loop requires further study and improvement.
arXiv Detail & Related papers (2023-10-11T00:21:00Z)
- A Reinforcement Learning-assisted Genetic Programming Algorithm for Team Formation Problem Considering Person-Job Matching [70.28786574064694]
A reinforcement learning-assisted genetic programming algorithm (RL-GP) is proposed to enhance the quality of solutions.
The hyper-heuristic rules obtained through efficient learning can be utilized as decision-making aids when forming project teams.
arXiv Detail & Related papers (2023-04-08T14:32:12Z)
- GPT-4 Technical Report [116.90398195245983]
GPT-4 is a large-scale, multimodal model which can accept image and text inputs and produce text outputs.
It exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers.
arXiv Detail & Related papers (2023-03-15T17:15:04Z)
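Several of the studies above (e.g. the adult-literacy paper) compare predictors using five-fold cross-validation. As a reference for that protocol, here is a minimal, self-contained sketch in plain Python; the toy data and the majority-class "model" are illustrative placeholders, not anything taken from the papers.

```python
# Minimal five-fold cross-validation sketch (illustrative data and model).
def k_fold_indices(n, k=5):
    """Split indices 0..n-1 into k contiguous folds."""
    fold_size, folds = n // k, []
    for i in range(k):
        start = i * fold_size
        end = start + fold_size if i < k - 1 else n
        folds.append(list(range(start, end)))
    return folds

def cross_validate(X, y, fit, predict, k=5):
    """Return the mean accuracy over k held-out folds."""
    folds, scores = k_fold_indices(len(X), k), []
    for test_idx in folds:
        train_idx = [i for i in range(len(X)) if i not in test_idx]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [predict(model, X[i]) for i in test_idx]
        acc = sum(p == y[i] for p, i in zip(preds, test_idx)) / len(test_idx)
        scores.append(acc)
    return sum(scores) / k

# Toy example: a majority-class baseline on binary labels.
X = list(range(10))
y = [0] * 7 + [1] * 3
fit = lambda Xs, ys: max(set(ys), key=ys.count)   # "model" = majority label
predict = lambda model, x: model
mean_acc = cross_validate(X, y, fit, predict)
```

In the papers above, `fit`/`predict` would instead wrap GPT-4 prompting or a traditional ML estimator, and accuracy would be replaced by the study's own metric.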
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.