Related papers: ChatGPT vs. DeepSeek: A Comparative Study on AI-Based Code Generation

ChatGPT vs. DeepSeek: A Comparative Study on AI-Based Code Generation

URL: http://arxiv.org/abs/2502.18467v1
Date: Thu, 30 Jan 2025 16:14:48 GMT
Title: ChatGPT vs. DeepSeek: A Comparative Study on AI-Based Code Generation
Authors: Md Motaleb Hossen Manik,
Abstract summary: This research compares ChatGPT and DeepSeek for Python code generation using online judge coding challenges.<n>It evaluates correctness (online judge verdicts, up to three attempts), code quality (Pylint/Flake8), and efficiency (execution time/memory usage)<n>DeepSeek demonstrated higher correctness, particularly on algorithmic tasks, often achieving 'Accepted' on the first attempt.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Background: AI-powered code generation, fueled by Large Language Models (LLMs), is revolutionizing software development. Models like OpenAI's Codex and GPT-4, alongside DeepSeek, leverage vast code and natural language datasets. However, ensuring code quality, correctness, and managing complex tasks remains challenging, necessitating thorough evaluation. Methodology: This research compares ChatGPT (version o1) and DeepSeek (version R1) for Python code generation using online judge coding challenges. It evaluates correctness (online judge verdicts, up to three attempts), code quality (Pylint/Flake8), and efficiency (execution time/memory usage). Results: DeepSeek demonstrated higher correctness, particularly on algorithmic tasks, often achieving 'Accepted' on the first attempt. ChatGPT sometimes requires multiple attempts or failures. ChatGPT encountered fewer issues, used comparable or slightly less memory, consumed less execution times and wrote fewer lines of code. Conclusion: DeepSeek exhibited superior correctness in Python code generation, often requiring fewer attempts, suggesting an advantage in algorithmic problem-solving. Both models showed almost similar efficiency in execution time and memory use. Finally, this research provides insights for developers choosing AI coding assistants and informs future AI-driven software development research.

Related papers

KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding [49.56049319037421]
KodCode is a synthetic dataset that addresses the persistent challenge of acquiring high-quality, verifiable training data. It comprises question-solution-test triplets that are systematically validated via a self-verification procedure. This pipeline yields a large-scale, robust and diverse coding dataset.
arXiv Detail & Related papers (2025-03-04T19:17:36Z)
AIGCodeSet: A New Annotated Dataset for AI Generated Code Detection [0.0]
We present AIGCodeSet, which consists of 2.828 AI-generated and 4.755 human-written Python codes. Our experiments show that a Bayesian classifier outperforms the other models.
arXiv Detail & Related papers (2024-12-21T11:53:49Z)
Can OpenSource beat ChatGPT? -- A Comparative Study of Large Language Models for Text-to-Code Generation [0.24578723416255752]
We evaluate five different large language models (LLMs) concerning their capabilities for text-to-code generation. ChatGPT can handle these typical programming challenges by far the most effectively, surpassing even code-specialized models like Code Llama.
arXiv Detail & Related papers (2024-09-06T10:03:49Z)
Evaluating AI-generated code for C++, Fortran, Go, Java, Julia, Matlab, Python, R, and Rust [0.1906498126334485]
This study evaluates the capabilities of ChatGPT versions 3.5 and 4 in generating code across a diverse range of programming languages. We asked ChatGPT to generate three distinct codes: a simple numerical integration, a conjugate gradient solver, and a parallel 1D stencil-based heat equation solver. The focus of our analysis was on the compilation, runtime performance, and accuracy of the codes.
arXiv Detail & Related papers (2024-05-21T17:04:37Z)
Unmasking the giant: A comprehensive evaluation of ChatGPT's proficiency in coding algorithms and data structures [0.6990493129893112]
We evaluate ChatGPT's ability to generate correct solutions to the problems fed to it, its code quality, and nature of run-time errors thrown by its code. We look into patterns in the test cases passed in order to gain some insights into how wrong ChatGPT code is in these kinds of situations.
arXiv Detail & Related papers (2023-07-10T08:20:34Z)
CONCORD: Clone-aware Contrastive Learning for Source Code [64.51161487524436]
Self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE tasks. We argue that it is also essential to factor in how developers code day-to-day for general-purpose representation learning. In particular, we propose CONCORD, a self-supervised, contrastive learning strategy to place benign clones closer in the representation space while moving deviants further apart.
arXiv Detail & Related papers (2023-06-05T20:39:08Z)
ALGO: Synthesizing Algorithmic Programs with LLM-Generated Oracle Verifiers [60.6418431624873]
Large language models (LLMs) excel at implementing code from functionality descriptions but struggle with algorithmic problems. We propose ALGO, a framework that synthesizes Algorithmic programs with LLM-Generated Oracles to guide the generation and verify their correctness. Experiments show that when equipped with ALGO, we achieve an 8x better one-submission pass rate over the Codex model and a 2.6x better one-submission pass rate over CodeT.
arXiv Detail & Related papers (2023-05-24T00:10:15Z)
Revisiting Code Search in a Two-Stage Paradigm [67.02322603435628]
TOSS is a two-stage fusion code search framework. It first uses IR-based and bi-encoder models to efficiently recall a small number of top-k code candidates. It then uses fine-grained cross-encoders for finer ranking.
arXiv Detail & Related papers (2022-08-24T02:34:27Z)
Fault-Aware Neural Code Rankers [64.41888054066861]
We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it. Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
arXiv Detail & Related papers (2022-06-04T22:01:05Z)
Measuring Coding Challenge Competence With APPS [54.22600767666257]
We introduce APPS, a benchmark for code generation. Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges. Recent models such as GPT-Neo can pass approximately 15% of the test cases of introductory problems.
arXiv Detail & Related papers (2021-05-20T17:58:42Z)
COSEA: Convolutional Code Search with Layer-wise Attention [90.35777733464354]
We propose a new deep learning architecture, COSEA, which leverages convolutional neural networks with layer-wise attention to capture the code's intrinsic structural logic. COSEA can achieve significant improvements over state-of-the-art methods on code search tasks.
arXiv Detail & Related papers (2020-10-19T13:53:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.