CodeComplex: A Time-Complexity Dataset for Bilingual Source Codes
- URL: http://arxiv.org/abs/2401.08719v1
- Date: Tue, 16 Jan 2024 06:54:44 GMT
- Title: CodeComplex: A Time-Complexity Dataset for Bilingual Source Codes
- Authors: Seung-Yeop Baik, Mingi Jeon, Joonghyuk Hahn, Jungin Kim, Yo-Sub Han,
Sang-Ki Ko
- Abstract summary: We introduce CodeComplex, a novel source code dataset where each code is manually annotated with a corresponding worst-case time complexity.
To the best of our knowledge, CodeComplex stands as the most extensive code dataset tailored for predicting complexity.
We present the outcomes of our experiments employing various baseline models, leveraging state-of-the-art neural models in code comprehension.
- Score: 6.169110187130671
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Analyzing the worst-case time complexity of a code is a crucial task in
computer science and software engineering for ensuring the efficiency,
reliability, and robustness of software systems. However, it is well-known that
the problem of determining the worst-case time complexity of a given code
written in a general-purpose programming language is theoretically undecidable,
as a consequence of the halting problem proved undecidable by Alan Turing. We
therefore move toward a more realistic scenario in which a program's inputs and
outputs are available. This allows us to check the correctness of given codes,
even though analyzing their time complexity exhaustively remains challenging.
In response to this challenge, we introduce
CodeComplex, a novel source code dataset where each code is manually annotated
with a corresponding worst-case time complexity. CodeComplex comprises 4,900
Java codes and an equivalent number of Python codes, all sourced from
programming competitions and annotated with complexity labels by a panel of
algorithmic experts. To the best of our knowledge, CodeComplex stands as the
most extensive code dataset tailored for predicting complexity. Subsequently,
we present the outcomes of our experiments employing various baseline models,
leveraging state-of-the-art neural models in code comprehension like CodeBERT,
GraphCodeBERT, UniXcoder, PLBART, CodeT5, CodeT5+, and ChatGPT. We analyze how
the dataset affects the models' ability to learn to predict time complexity.
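To illustrate the prediction task, here is a minimal sketch of what complexity-labeled examples look like, together with a naive heuristic baseline that uses maximum loop-nesting depth as a proxy. The snippets and labels below are hypothetical illustrations, not entries from CodeComplex itself, and the neural models in the paper learn far richer signals than nesting depth.

```python
import ast

def max_loop_depth(source: str) -> int:
    """Return the maximum nesting depth of for/while loops in a snippet."""
    tree = ast.parse(source)

    def depth(node, current=0):
        best = current
        for child in ast.iter_child_nodes(node):
            inc = 1 if isinstance(child, (ast.For, ast.While)) else 0
            best = max(best, depth(child, current + inc))
        return best

    return depth(tree)

# Hypothetical (code, worst-case complexity) pairs in the spirit of the dataset.
examples = [
    ("for i in range(n):\n    total += i", "O(n)"),
    ("for i in range(n):\n    for j in range(n):\n        total += i * j", "O(n^2)"),
]

# Naive baseline: nesting depth d maps to O(n^d).
for code, label in examples:
    d = max_loop_depth(code)
    predicted = "O(1)" if d == 0 else f"O(n^{d})" if d > 1 else "O(n)"
    print(code.splitlines()[0], "->", predicted, "(label:", label + ")")
```

Such a heuristic fails on recursion, early exits, and data-dependent loops, which is precisely why learned models over code representations are worth studying.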
Related papers
- Large Language Models Meet Symbolic Provers for Logical Reasoning Evaluation [24.081573908824353]
First-order logic (FOL) reasoning is pivotal for intelligent systems.
Existing benchmarks often rely on extensive human annotation or handcrafted templates.
We propose a novel framework called ProverGen that synergizes the generative strengths of Large Language Models with the rigor and precision of symbolic provers.
arXiv Detail & Related papers (2025-02-10T15:31:54Z)
- OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models [70.72097493954067]
Large language models (LLMs) for code have become indispensable in various domains, including code generation, reasoning tasks and agent systems.
While open-access code LLMs are increasingly approaching the performance levels of proprietary models, high-quality code LLMs remain limited.
We introduce OpenCoder, a top-tier code LLM that not only achieves performance comparable to leading models but also serves as an "open cookbook" for the research community.
arXiv Detail & Related papers (2024-11-07T17:47:25Z)
- Benchmarking Complex Instruction-Following with Multiple Constraints Composition [72.82640456309821]
How to evaluate the ability of complex instruction-following of large language models (LLMs) has become a critical research problem.
Existing benchmarks mainly focus on modeling different types of constraints in human instructions while neglecting the composition of different constraints.
We propose ComplexBench, a benchmark for comprehensively evaluating the ability of LLMs to follow complex instructions composed of multiple constraints.
arXiv Detail & Related papers (2024-07-04T14:50:45Z)
- SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code Summarization [51.67317895094664]
This paper studies file-level code summarization, which can assist programmers in understanding and maintaining large source code projects.
We propose SparseCoder, an identifier-aware sparse transformer for effectively handling long code sequences.
arXiv Detail & Related papers (2024-01-26T09:23:27Z)
- MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning [63.80739044622555]
We introduce MuSR, a dataset for evaluating language models on soft reasoning tasks specified in a natural language narrative.
This dataset has two crucial features. First, it is created through a novel neurosymbolic synthetic-to-natural generation algorithm.
Second, our dataset instances are free text narratives corresponding to real-world domains of reasoning.
arXiv Detail & Related papers (2023-10-24T17:59:20Z)
- Can Large Language Models Understand Real-World Complex Instructions? [54.86632921036983]
Large language models (LLMs) can understand human instructions, but struggle with complex instructions.
Existing benchmarks are insufficient to assess LLMs' ability to understand complex instructions.
We propose CELLO, a benchmark for evaluating LLMs' ability to follow complex instructions systematically.
arXiv Detail & Related papers (2023-09-17T04:18:39Z)
- When Do Program-of-Thoughts Work for Reasoning? [51.2699797837818]
We propose complexity-impacted reasoning score (CIRS) to measure correlation between code and reasoning abilities.
Specifically, we use the abstract syntax tree to encode the structural information and calculate logical complexity.
Code will be integrated into the EasyInstruct framework at https://github.com/zjunlp/EasyInstruct.
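The CIRS idea of scoring code by traversing its abstract syntax tree can be sketched as follows. This is a much-simplified proxy in the spirit of cyclomatic complexity; the actual CIRS formula is defined in the paper and differs.

```python
import ast

def logical_complexity(source: str) -> int:
    """Crude logical-complexity proxy: 1 plus the number of branching and
    looping constructs found in the AST. A simplification for illustration,
    not the CIRS score from the paper."""
    tree = ast.parse(source)
    branch_nodes = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp)
    return 1 + sum(isinstance(n, branch_nodes) for n in ast.walk(tree))

snippet = """
def search(xs, target):
    for i, x in enumerate(xs):
        if x == target:
            return i
    return -1
"""
print(logical_complexity(snippet))  # one loop + one branch -> 3
```

The point of such structural scores is that they can be computed for any parseable snippet, letting one correlate code structure with downstream reasoning performance.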
arXiv Detail & Related papers (2023-08-29T17:22:39Z)
- TASTY: A Transformer based Approach to Space and Time complexity [0.4724825031148411]
Code based Language Models (LMs) have shown very promising results in the field of software engineering.
We create a labelled dataset of code snippets spanning multiple languages.
We propose to use LMs to find space complexities from code, and to the best of our knowledge, this is the first attempt to do so.
arXiv Detail & Related papers (2023-05-06T03:37:44Z)
- Detecting Requirements Smells With Deep Learning: Experiences, Challenges and Future Work [9.44316959798363]
This work aims to improve the previous work by creating a manually labeled dataset and using ensemble learning, Deep Learning (DL), and techniques such as word embeddings and transfer learning to overcome the generalization problem.
The current findings show that the dataset is unbalanced and indicate which classes require additional examples.
arXiv Detail & Related papers (2021-08-06T12:45:15Z)