Evaluating the Capability of LLMs in Identifying Compilation Errors in Configurable Systems
- URL: http://arxiv.org/abs/2407.19087v2
- Date: Tue, 30 Jul 2024 13:36:55 GMT
- Title: Evaluating the Capability of LLMs in Identifying Compilation Errors in Configurable Systems
- Authors: Lucas Albuquerque, Rohit Gheyi, Márcio Ribeiro
- Abstract summary: This study evaluates the efficacy of Large Language Models (LLMs), specifically ChatGPT4, Le Chat Mistral, and Gemini Advanced 1.5, in identifying compilation errors in configurable systems.
ChatGPT4 successfully identified most compilation errors in individual products.
Le Chat Mistral and Gemini Advanced 1.5 detected some of them.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compilation is an important process in developing configurable systems, such as Linux. However, identifying compilation errors in configurable systems is not straightforward because traditional compilers are not variability-aware. Previous approaches that detect some of these compilation errors often rely on advanced techniques that require significant effort from programmers. This study evaluates the efficacy of Large Language Models (LLMs), specifically ChatGPT4, Le Chat Mistral and Gemini Advanced 1.5, in identifying compilation errors in configurable systems. Initially, we evaluate 50 small products in C++, Java, and C languages, followed by 30 small configurable systems in C, covering 17 different types of compilation errors. ChatGPT4 successfully identified most compilation errors in individual products and in configurable systems, while Le Chat Mistral and Gemini Advanced 1.5 detected some of them. LLMs have shown potential in assisting developers in identifying compilation errors in configurable systems.
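To make the challenge concrete, here is a minimal hypothetical C sketch (not drawn from the paper's dataset) of a configuration-dependent compilation error: a traditional compiler checks only the one configuration selected by the macros defined at build time, so the error below stays invisible until FEATURE_LOG is enabled.

```c
/* config_error.c -- hypothetical sketch of a configuration-dependent
 * compilation error in a configurable system. */
#include <stdio.h>

int main(void) {
    int count = 0;
#ifdef FEATURE_LOG
    /* 'total' is never declared, but this line is only compiled when
     * FEATURE_LOG is defined:
     *   gcc config_error.c               -> compiles cleanly
     *   gcc -DFEATURE_LOG config_error.c -> error: 'total' undeclared */
    printf("count=%d total=%d\n", count, total);
#else
    printf("count=%d\n", count);
#endif
    return 0;
}
```

With n independent feature macros a system has up to 2^n configurations, so compiling every product separately does not scale; this is what motivates variability-aware analyses and, in this study, asking an LLM to inspect the annotated source directly.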
Related papers
- Understanding Misconfigurations in ROS: An Empirical Study and Current Approaches
The Robot Operating System (ROS) is a popular framework and ecosystem that allows developers to build robot software systems from reusable, off-the-shelf components.
While reusable components theoretically allow rapid prototyping, ensuring proper configuration and connection is challenging.
We perform a study of ROS Answers, a Q&A platform, to identify and categorize misconfigurations that occur during ROS development.
arXiv Detail & Related papers (2024-07-27T16:20:43Z)
- KGym: A Platform and Dataset to Benchmark Large Language Models on Linux Kernel Crash Resolution
Large Language Models (LLMs) are consistently improving at increasingly realistic software engineering (SE) tasks.
In real-world software stacks, significant SE effort is spent developing foundational system software like the Linux kernel.
To evaluate if ML models are useful while developing such large-scale systems-level software, we introduce kGym and kBench.
arXiv Detail & Related papers (2024-07-02T21:44:22Z)
- C-LLM: Learn to Check Chinese Spelling Errors Character by Character
We propose C-LLM, a Large Language Model-based Chinese Spell Checking method that learns to check errors character by character.
C-LLM achieves an average improvement of 10% over existing methods.
arXiv Detail & Related papers (2024-06-24T11:16:31Z)
- Exploring Multi-Lingual Bias of Large Code Models in Code Generation
Code generation aims to synthesize code and fulfill functional requirements based on natural language (NL) specifications.
Despite their effectiveness, we observe a noticeable multilingual bias in the generation performance of large code models (LCMs).
LCMs demonstrate proficiency in generating solutions when provided with instructions in English, yet may falter when faced with semantically equivalent instructions in other NLs such as Chinese.
arXiv Detail & Related papers (2024-04-30T08:51:49Z)
- In industrial embedded software, are some compilation errors easier to localize and fix than others?
We collected over 40,000 builds from the product source code of 4 projects and categorized the compilation errors into 14 error types.
We show that the five most common types comprise 89% of all compilation errors.
Our research also provides insights into the human effort required to fix the most common industrial compilation errors.
arXiv Detail & Related papers (2024-04-23T08:20:18Z)
- DebugBench: Evaluating Debugging Capability of Large Language Models
DebugBench is a benchmark for evaluating the debugging capability of Large Language Models (LLMs).
It covers four major bug categories and 18 minor types in C++, Java, and Python.
We evaluate two commercial and four open-source models in a zero-shot scenario.
arXiv Detail & Related papers (2024-01-09T15:46:38Z)
- The Devil Is in the Command Line: Associating the Compiler Flags With the Binary and Build Metadata
Defects caused by an undesired combination of compiler flags are common in nontrivial software projects.
A queryable database of how the compiler compiled and linked the software system would help to detect such defects earlier.
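As a hypothetical illustration (not an example from the paper) of how a single flag can change behavior, compiling the sketch below with -DNDEBUG expands assert() to nothing and silently removes the side effect hidden inside it.

```c
/* flag_defect.c -- hypothetical sketch of a compiler-flag-dependent defect. */
#include <assert.h>
#include <stdio.h>

int main(void) {
    int balance = 100;
    /* Bug: the debit is hidden inside assert(), so a release build
     * removes it along with the check:
     *   gcc flag_defect.c           -> prints balance=70
     *   gcc -DNDEBUG flag_defect.c  -> prints balance=100 */
    assert((balance -= 30) >= 0);
    printf("balance=%d\n", balance);
    return 0;
}
```

Recording the exact flags used for each build, as the paper proposes, would make this kind of divergence queryable after the fact.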
arXiv Detail & Related papers (2023-12-20T22:27:32Z)
- Guess & Sketch: Language Model Guided Transpilation
Learned transpilation offers an alternative to manual re-writing and engineering efforts.
Probabilistic neural language models (LMs) produce plausible outputs for every input, but do so at the cost of guaranteed correctness.
Guess & Sketch extracts alignment and confidence information from features of the LM and then passes this information to a symbolic solver to resolve semantic equivalence.
arXiv Detail & Related papers (2023-09-25T15:42:18Z)
- Dcc --help: Generating Context-Aware Compiler Error Explanations with Large Language Models
dcc --help was deployed to our CS1 and CS2 courses, with 2,565 students using the tool over 64,000 times in ten weeks.
We found that the LLM-generated explanations were conceptually accurate in 90% of compile-time and 75% of run-time cases, but often disregarded the instruction not to provide solutions in code.
arXiv Detail & Related papers (2023-08-23T02:36:19Z)
- Isolating Compiler Bugs by Generating Effective Witness Programs with Large Language Models
Existing compiler bug isolation approaches convert the problem into a test program mutation problem.
We propose a new approach named LLM4CBI to utilize LLMs to generate effective test programs for compiler bug isolation.
Compared with state-of-the-art approaches over 120 real bugs from GCC and LLVM, our evaluation demonstrates the advantages of LLM4CBI.
arXiv Detail & Related papers (2023-07-02T15:20:54Z)