WRTester: Differential Testing of WebAssembly Runtimes via
Semantic-aware Binary Generation
- URL: http://arxiv.org/abs/2312.10456v1
- Date: Sat, 16 Dec 2023 14:02:42 GMT
- Title: WRTester: Differential Testing of WebAssembly Runtimes via
Semantic-aware Binary Generation
- Authors: Shangtong Cao, Ningyu He, Xinyu She, Yixuan Zhang, Mu Zhang, Haoyu
Wang
- Abstract summary: We present WRTester, a novel differential testing framework that can generated complicated Wasm test cases by disassembling and assembling real-world Wasm binaries.
For further pinpointing the root causes of unexpected behaviors, we design a runtime-agnostic root cause location method to accurately locate bugs.
We have uncovered 33 unique bugs in popular Wasm runtimes, among which 25 have been confirmed.
- Score: 19.78427170624683
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Wasm runtime is a fundamental component in the Wasm ecosystem, as it directly
impacts whether Wasm applications can be executed as expected. Bugs in Wasm
runtime bugs are frequently reported, thus our research community has made a
few attempts to design automated testing frameworks for detecting bugs in Wasm
runtimes. However, existing testing frameworks are limited by the quality of
test cases, i.e., they face challenges of generating both semantic-rich and
syntactic-correct Wasm binaries, thus complicated bugs cannot be triggered. In
this work, we present WRTester, a novel differential testing framework that can
generated complicated Wasm test cases by disassembling and assembling of
real-world Wasm binaries, which can trigger hidden inconsistencies among Wasm
runtimes. For further pinpointing the root causes of unexpected behaviors, we
design a runtime-agnostic root cause location method to accurately locate bugs.
Extensive evaluation suggests that WRTester outperforms SOTA techniques in
terms of both efficiency and effectiveness. We have uncovered 33 unique bugs in
popular Wasm runtimes, among which 25 have been confirmed.
Related papers
- Observation-based unit test generation at Meta [52.4716552057909]
TestGen automatically generates unit tests, carved from serialized observations of complex objects, observed during app execution.
TestGen has landed 518 tests into production, which have been executed 9,617,349 times in continuous integration, finding 5,702 faults.
Our evaluation reveals that, when carving its observations from 4,361 reliable end-to-end tests, TestGen was able to generate tests for at least 86% of the classes covered by end-to-end tests.
arXiv Detail & Related papers (2024-02-09T00:34:39Z) - DebugBench: Evaluating Debugging Capability of Large Language Models [80.73121177868357]
DebugBench is a benchmark for Large Language Models (LLMs)
It covers four major bug categories and 18 minor types in C++, Java, and Python.
We evaluate two commercial and four open-source models in a zero-shot scenario.
arXiv Detail & Related papers (2024-01-09T15:46:38Z) - GitBug-Actions: Building Reproducible Bug-Fix Benchmarks with GitHub
Actions [8.508198765617196]
We present GitBug-Actions, a novel tool for building bug-fix benchmarks with modern and fully-reproducible bug-fixes.
GitBug-Actions relies on the most popular CI platform, GitHub Actions, to detect bug-fixes.
To demonstrate our toolchain, we deploy GitBug-Actions to build a proof-of-concept Go bug-fix benchmark.
arXiv Detail & Related papers (2023-10-24T09:04:14Z) - Automatic Generation of Test Cases based on Bug Reports: a Feasibility
Study with Large Language Models [4.318319522015101]
Existing approaches produce test cases that either can be qualified as simple (e.g. unit tests) or that require precise specifications.
Most testing procedures still rely on test cases written by humans to form test suites.
We investigate the feasibility of performing this generation by leveraging large language models (LLMs) and using bug reports as inputs.
arXiv Detail & Related papers (2023-10-10T05:30:12Z) - Revealing Performance Issues in Server-side WebAssembly Runtimes via
Differential Testing [28.187405253760687]
We design a novel differential testing approach WarpDiff to identify performance issues in server-side Wasm runtimes.
We identify abnormal cases where the execution time ratio significantly deviates from the oracle ratio and locate the Wasm runtimes that cause the performance issues.
arXiv Detail & Related papers (2023-09-21T15:25:18Z) - PreciseBugCollector: Extensible, Executable and Precise Bug-fix
Collection [8.79879909193717]
We introduce PreciseBugCollector, a precise, multi-language bug collection approach.
It is based on two novel components: a bug tracker to map the repositories with external bug repositories to trace bug type information, and a bug injector to generate project-specific bugs.
To date, PreciseBugCollector comprises 1057818 bugs extracted from 2968 open-source projects.
arXiv Detail & Related papers (2023-09-12T13:47:44Z) - RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic
Program Repair [75.40584530380589]
We propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen)
RAP-Gen explicitly leveraging relevant fix patterns retrieved from a list of previous bug-fix pairs.
We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java.
arXiv Detail & Related papers (2023-09-12T08:52:56Z) - Using Developer Discussions to Guide Fixing Bugs in Software [51.00904399653609]
We propose using bug report discussions, which are available before the task is performed and are also naturally occurring, avoiding the need for additional information from developers.
We demonstrate that various forms of natural language context derived from such discussions can aid bug-fixing, even leading to improved performance over using commit messages corresponding to the oracle bug-fixing commits.
arXiv Detail & Related papers (2022-11-11T16:37:33Z) - BigIssue: A Realistic Bug Localization Benchmark [89.8240118116093]
BigIssue is a benchmark for realistic bug localization.
We provide a general benchmark with a diversity of real and synthetic Java bugs.
We hope to advance the state of the art in bug localization, in turn improving APR performance and increasing its applicability to the modern development cycle.
arXiv Detail & Related papers (2022-07-21T20:17:53Z) - On Distribution Shift in Learning-based Bug Detectors [4.511923587827301]
We train a bug detector in two phases, first on a synthetic bug distribution to adapt the model to the bug detection domain, and then on a real bug distribution to drive the model towards the real distribution.
We evaluate our approach extensively on three widely studied bug types, for which we construct new datasets carefully designed to capture the real bug distribution.
arXiv Detail & Related papers (2022-04-21T12:17:22Z) - Detecting Rewards Deterioration in Episodic Reinforcement Learning [63.49923393311052]
In many RL applications, once training ends, it is vital to detect any deterioration in the agent performance as soon as possible.
We consider an episodic framework, where the rewards within each episode are not independent, nor identically-distributed, nor Markov.
We define the mean-shift in a way corresponding to deterioration of a temporal signal (such as the rewards), and derive a test for this problem with optimal statistical power.
arXiv Detail & Related papers (2020-10-22T12:45:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.