Go-Oracle: Automated Test Oracle for Go Concurrency Bugs
- URL: http://arxiv.org/abs/2412.08061v1
- Date: Wed, 11 Dec 2024 03:07:56 GMT
- Title: Go-Oracle: Automated Test Oracle for Go Concurrency Bugs
- Authors: Foivos Tsimpourlas, Chao Peng, Carlos Rosuero, Ping Yang, Ajitha Rajan
- Abstract summary: Concurrency bugs have become a prevalent issue within the Go programming language.
Our work seeks to address the test oracle problem for Go programs, to automatically classify test executions as pass or fail.
We capture a comprehensive array of execution events using the native Go execution tracer.
We preprocess and encode these traces before training a transformer-based neural network to effectively classify the traces as either passing or failing.
- Score: 6.773048267569272
- Abstract: The Go programming language has gained significant traction for developing software, especially in various infrastructure systems. Nonetheless, concurrency bugs have become a prevalent issue within Go, presenting a unique challenge due to the language's dual concurrency mechanisms: communicating sequential processes and shared memory. Detecting concurrency bugs and accurately classifying program executions as pass or fail presents an immense challenge, even for domain experts. We conducted a survey with expert developers at Bytedance that confirmed this challenge. Our work seeks to address the test oracle problem for Go programs, to automatically classify test executions as pass or fail. This problem has not been investigated in the literature for Go programs owing to its distinctive programming model. Our approach involves collecting both passing and failing execution traces from various subject Go programs. We capture a comprehensive array of execution events using the native Go execution tracer. Subsequently, we preprocess and encode these traces before training a transformer-based neural network to effectively classify the traces as either passing or failing. The evaluation of our approach encompasses 8 subject programs sourced from the GoBench repository. These subject programs are routinely used as benchmarks in an industry setting. Encouragingly, our test oracle, Go-Oracle, demonstrates high accuracies even when operating with a limited dataset, showcasing the efficacy and potential of our methodology. Developers at Bytedance strongly agreed that they would use the Go-Oracle tool over the current practice of manual inspections to classify tests for Go programs as pass or fail.
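To make the pipeline concrete, below is a minimal sketch of how a passing or failing execution could be recorded with the native Go execution tracer (runtime/trace), the event source the abstract describes; the runWorkload function is a hypothetical stand-in for a subject program's test, not code from the paper.

```go
// Sketch: record an execution trace around a concurrent workload using
// Go's native execution tracer (runtime/trace).
package main

import (
	"log"
	"os"
	"runtime/trace"
	"sync"
)

// runWorkload is a hypothetical stand-in for the concurrent code under test.
func runWorkload() {
	var wg sync.WaitGroup
	ch := make(chan int)
	wg.Add(1)
	go func() {
		defer wg.Done()
		ch <- 42 // channel send shows up as an event in the trace
	}()
	log.Printf("received %d", <-ch)
	wg.Wait()
}

func main() {
	f, err := os.Create("exec.trace")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Start the tracer; goroutine, channel, and scheduler events between
	// Start and Stop are written to exec.trace.
	if err := trace.Start(f); err != nil {
		log.Fatal(err)
	}
	runWorkload()
	trace.Stop()
}
```

The resulting exec.trace file can be inspected with go tool trace. For the preprocessing step, here is a sketch of turning such a trace into a token sequence for a classifier, assuming the experimental golang.org/x/exp/trace reader (Go 1.22+ trace format); mapping each event kind to one token is a hypothetical encoding chosen for illustration, not the paper's published scheme.

```go
// Sketch: decode trace events and encode them as a flat token sequence.
// Assumes golang.org/x/exp/trace; the one-token-per-event-kind scheme
// is hypothetical.
package main

import (
	"fmt"
	"io"
	"log"
	"os"

	"golang.org/x/exp/trace"
)

func main() {
	f, err := os.Open("exec.trace")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	r, err := trace.NewReader(f)
	if err != nil {
		log.Fatal(err)
	}

	var tokens []string
	for {
		ev, err := r.ReadEvent()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		// Encode each event by its kind; a real encoder would likely
		// also capture goroutine IDs, timestamps, and arguments.
		tokens = append(tokens, ev.Kind().String())
	}
	fmt.Printf("%d tokens, first few: %v\n", len(tokens), tokens[:min(10, len(tokens))])
}
```

A sequence like this is what a transformer-based classifier would consume to label the execution as passing or failing.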
Related papers
- Revisit Self-Debugging with Self-Generated Tests for Code Generation [18.643472696246686]
Self-debugging with self-generated tests is a promising solution, but its limitations and practical potential have not been fully explored.
We propose two paradigms for the process: post-execution and in-execution self-debugging.
We find that post-execution self-debugging struggles on basic problems but shows potential for improvement on competitive ones, due to the bias introduced by self-generated tests.
arXiv Detail & Related papers (2025-01-22T10:54:19Z) - Effective Technical Reviews [0.7212939068975619]
While executing a program is the ultimate test of its correctness, reviewing the program can occur earlier in its development and find problems if done effectively.
This work focuses on review techniques that enable the programmer to effectively review a program and find a range of problems, including interface issues.
arXiv Detail & Related papers (2024-07-02T15:19:52Z) - Test Oracle Automation in the era of LLMs [52.69509240442899]
Large Language Models (LLMs) have demonstrated remarkable proficiency in tackling diverse software testing tasks.
This paper aims to enable discussions on the potential of using LLMs for test oracle automation, along with the challenges that may emerge during the generation of various types of oracles.
arXiv Detail & Related papers (2024-05-21T13:19:10Z) - Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely *hidden transfer*, which decodes multiple successive tokens simultaneously in a single forward pass.
In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
arXiv Detail & Related papers (2024-04-18T09:17:06Z) - Modelling Concurrency Bugs Using Machine Learning [0.0]
This project aims to compare both common and recent machine learning approaches.
We define a synthetic dataset that we generate with the aim of simulating real-life (concurrent) programs.
We formulate hypotheses about fundamental limits of various machine learning model types.
arXiv Detail & Related papers (2023-05-08T17:30:24Z) - Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
arXiv Detail & Related papers (2023-04-11T10:43:43Z) - HyperPUT: Generating Synthetic Faulty Programs to Challenge Bug-Finding Tools [3.8520163964103835]
We propose a complementary approach that automatically generates programs with seeded bugs.
Our technique, called HyperPUT, builds C programs from a "seed" bug by incrementally applying program transformations.
arXiv Detail & Related papers (2022-09-14T13:09:41Z) - BigIssue: A Realistic Bug Localization Benchmark [89.8240118116093]
BigIssue is a benchmark for realistic bug localization.
We provide a general benchmark with a diversity of real and synthetic Java bugs.
We hope to advance the state of the art in bug localization, in turn improving APR performance and increasing its applicability to the modern development cycle.
arXiv Detail & Related papers (2022-07-21T20:17:53Z) - Fault-Aware Neural Code Rankers [64.41888054066861]
We propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it.
Our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models.
arXiv Detail & Related papers (2022-06-04T22:01:05Z) - Static Prediction of Runtime Errors by Learning to Execute Programs with External Resource Descriptions [31.46148643917194]
We introduce a real-world dataset and task for predicting runtime errors.
We develop an interpreter-inspired architecture with an inductive bias towards mimicking program executions.
We show that the model can also predict the location of the error, despite being trained only on labels indicating the presence/absence and kind of error.
arXiv Detail & Related papers (2022-03-07T23:17:17Z) - Measuring Coding Challenge Competence With APPS [54.22600767666257]
We introduce APPS, a benchmark for code generation.
Our benchmark includes 10,000 problems, ranging from simple one-line solutions to substantial algorithmic challenges.
Recent models such as GPT-Neo can pass approximately 15% of the test cases of introductory problems.
arXiv Detail & Related papers (2021-05-20T17:58:42Z)