The Devil Is in the Command Line: Associating the Compiler Flags With
the Binary and Build Metadata
- URL: http://arxiv.org/abs/2312.13463v1
- Date: Wed, 20 Dec 2023 22:27:32 GMT
- Authors: Gunnar Kudrjavets (University of Groningen), Aditya Kumar (Google),
Jeff Thomas (Meta Platforms, Inc.), Ayushi Rastogi (University of Groningen)
- Abstract summary: Defects caused by an undesired combination of compiler flags are common in nontrivial software projects.
A queryable database of how the compiler compiled and linked the software system will help to detect defects earlier.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Engineers build large software systems for multiple architectures, operating
systems, and configurations. A set of inconsistent or missing compiler flags
generates code that catastrophically impacts the system's behavior. In the
authors' industry experience, defects caused by an undesired combination of
compiler flags are common in nontrivial software projects. We are unaware of
any build and CI/CD systems that track how the compiler produces a specific
binary in a structured manner. We postulate that a queryable database of how
the compiler compiled and linked the software system will help to detect
defects earlier and reduce the debugging time.
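To make the idea of a queryable database concrete, here is a minimal sketch in Python using SQLite. The abstract does not describe a schema or tooling, so the `compile_flags` table and the helper functions `record_compile` and `sources_missing_flag` are hypothetical names chosen for illustration; they are not the authors' design. The sketch stores one row per flag for each compiler invocation and then asks which source files in a binary were built without a given flag.

```python
import shlex
import sqlite3


def record_compile(conn, binary, source, command_line):
    """Store each flag of one compiler invocation as a separate row.

    Hypothetical helper: only tokens starting with '-' are treated as flags;
    flag arguments such as the path after '-o' are ignored in this sketch.
    """
    for token in shlex.split(command_line)[1:]:  # skip the compiler name
        if token.startswith("-"):
            conn.execute(
                "INSERT INTO compile_flags (binary, source, flag) VALUES (?, ?, ?)",
                (binary, source, token),
            )


def sources_missing_flag(conn, binary, flag):
    """Return the sources of `binary` that were compiled without `flag`."""
    rows = conn.execute(
        "SELECT DISTINCT source FROM compile_flags WHERE binary = ? "
        "AND source NOT IN ("
        "  SELECT source FROM compile_flags WHERE binary = ? AND flag = ?"
        ") ORDER BY source",
        (binary, binary, flag),
    )
    return [row[0] for row in rows]


# Assumed schema: one row per (binary, source file, flag).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compile_flags (binary TEXT, source TEXT, flag TEXT)")

# Two example invocations; the second one lacks the stack protector flag.
record_compile(conn, "app", "a.c", "gcc -O2 -fstack-protector-strong -c a.c -o a.o")
record_compile(conn, "app", "b.c", "gcc -O2 -c b.c -o b.o")

print(sources_missing_flag(conn, "app", "-fstack-protector-strong"))  # ['b.c']
```

A real system would also need to capture linker invocations, flag arguments, and the build environment; this sketch only shows how an inconsistency between translation units becomes a single query once the command lines are recorded in structured form.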
Related papers
- Evaluating the Capability of LLMs in Identifying Compilation Errors in Configurable Systems [1.2928804566606342]
This study evaluates the efficacy of Large Language Models (LLMs), specifically ChatGPT4, Le Chat Mistral and Gemini Advanced 1.5, in identifying compilation errors in configurable systems.
ChatGPT4 successfully identified most compilation errors in individual products.
Le Chat Mistral and Gemini Advanced 1.5 detected some of them.
arXiv Detail & Related papers (2024-07-26T21:07:21Z)
- Towards Understanding the Bugs in Solidity Compiler [11.193701473232851]
This paper presents the first systematic study on 533 Solidity compiler bugs.
We examine their characteristics (including symptoms, root causes, and distribution) and their triggering test cases.
To study the limitations of Solidity compiler fuzzers, we evaluate three Solidity compiler fuzzers.
arXiv Detail & Related papers (2024-07-08T14:22:50Z)
- KGym: A Platform and Dataset to Benchmark Large Language Models on Linux Kernel Crash Resolution [59.20933707301566]
Large Language Models (LLMs) are consistently improving at increasingly realistic software engineering (SE) tasks.
In real-world software stacks, significant SE effort is spent developing foundational system software like the Linux kernel.
To evaluate if ML models are useful while developing such large-scale systems-level software, we introduce kGym and kBench.
arXiv Detail & Related papers (2024-07-02T21:44:22Z)
- In industrial embedded software, are some compilation errors easier to localize and fix than others? [1.627308316856397]
We collect over 40000 builds from 4 projects from the product source code and categorized compilation errors into 14 error types.
We show that the five most common ones comprise 89 % of all compilation errors.
Our research also provides insights into the human effort required to fix the most common industrial compilation errors.
arXiv Detail & Related papers (2024-04-23T08:20:18Z)
- Weak Memory Demands Model-based Compiler Testing [0.0]
A compiler bug arises if the behaviour of a compiled concurrent program, as allowed by its architecture memory model, is not a behaviour permitted by the source program under its source model.
We observe that processor implementations are increasingly exploiting the behaviour of relaxed architecture models.
arXiv Detail & Related papers (2024-01-12T15:50:32Z)
- LILO: Learning Interpretable Libraries by Compressing and Documenting Code [71.55208585024198]
We introduce LILO, a neurosymbolic framework that iteratively synthesizes, compresses, and documents code.
LILO combines LLM-guided program synthesis with recent algorithmic advances in automated refactoring from Stitch.
We find that AutoDoc boosts performance by helping LILO's synthesizer to interpret and deploy learned abstractions.
arXiv Detail & Related papers (2023-10-30T17:55:02Z)
- CP-BCS: Binary Code Summarization Guided by Control Flow Graph and Pseudo Code [79.87518649544405]
We present a control flow graph and pseudo code guided binary code summarization framework called CP-BCS.
CP-BCS utilizes a bidirectional instruction-level control flow graph and pseudo code that incorporates expert knowledge to learn the comprehensive binary function execution behavior and logic semantics.
arXiv Detail & Related papers (2023-10-24T14:20:39Z)
- Guess & Sketch: Language Model Guided Transpilation [59.02147255276078]
Learned transpilation offers an alternative to manual re-writing and engineering efforts.
Probabilistic neural language models (LMs) produce plausible outputs for every input, but do so at the cost of guaranteed correctness.
Guess & Sketch extracts alignment and confidence information from features of the LM then passes it to a symbolic solver to resolve semantic equivalence.
arXiv Detail & Related papers (2023-09-25T15:42:18Z)
- Dcc --help: Generating Context-Aware Compiler Error Explanations with Large Language Models [53.04357141450459]
dcc --help was deployed to our CS1 and CS2 courses, with 2,565 students using the tool over 64,000 times in ten weeks.
We found that the LLM-generated explanations were conceptually accurate in 90% of compile-time and 75% of run-time cases, but often disregarded the instruction not to provide solutions in code.
arXiv Detail & Related papers (2023-08-23T02:36:19Z)
- Compilable Neural Code Generation with Compiler Feedback [43.97362484564799]
This paper proposes a three-stage pipeline for compilable code generation, including language model fine-tuning, compilability reinforcement, and compilability discrimination.
Experiments on two code generation tasks demonstrate the effectiveness of our proposed approach, improving the success rate of compilation from 44.18 to 89.18 on average and from 70.3 to 96.2 in text-to-code generation, respectively.
arXiv Detail & Related papers (2022-03-10T03:15:17Z)
- Improving type information inferred by decompilers with supervised machine learning [0.0]
In software reverse engineering, decompilation is the process of recovering source code from binary files.
We build different classification models capable of inferring the high-level type returned by functions.
Our system is able to predict function return types with a 79.1% F1-measure, whereas the best decompiler obtains a 30% F1-measure.
arXiv Detail & Related papers (2021-01-19T11:45:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.