Mutation-Based Deep Learning Framework Testing Method in JavaScript Environment
- URL: http://arxiv.org/abs/2409.14968v1
- Date: Mon, 23 Sep 2024 12:37:56 GMT
- Title: Mutation-Based Deep Learning Framework Testing Method in JavaScript Environment
- Authors: Yinglong Zou, Juan Zhai, Chunrong Fang, Jiawei Liu, Tao Zheng, Zhenyu Chen,
- Abstract summary: We propose a mutation-based JavaScript DL framework testing method named DLJSFuzzer.
DLJSFuzzer successfully detects 21 unique crashes and 126 unique NaN & Inconsistency bugs.
Compared to all baselines, DLJSFuzzer improves model generation efficiency by over 47% and bug detection efficiency by over 91%.
- Score: 16.67312523556796
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, Deep Learning (DL) applications in the JavaScript environment have become increasingly popular. As the infrastructure for DL applications, JavaScript DL frameworks play a crucial role in their development and deployment, so it is essential to ensure the quality of these frameworks. However, the bottleneck of limited computational resources in the JavaScript environment brings new challenges to framework testing. Specifically, JavaScript DL frameworks are equipped with various optimization mechanisms (e.g., cache reuse, inference acceleration) to overcome this bottleneck. These optimization mechanisms are overlooked by existing methods, so many bugs in JavaScript DL frameworks are missed. To address these challenges, we propose a mutation-based JavaScript DL framework testing method named DLJSFuzzer. DLJSFuzzer designs 13 tensor mutation rules targeting the cache reuse mechanism to generate test input tensors, and eight model mutation rules targeting the inference acceleration mechanism to generate test input models. To evaluate its effectiveness, we conduct experiments on the most widely used JavaScript DL framework, TensorFlow.js. The experimental results show that DLJSFuzzer outperforms state-of-the-art methods in both effectiveness and efficiency. DLJSFuzzer successfully detects 21 unique crashes and 126 unique NaN & Inconsistency bugs. All detected crashes have been reported to the open-source community, with 12 of them already confirmed by developers. Additionally, DLJSFuzzer improves model generation efficiency by over 47% and bug detection efficiency by over 91% compared to all baselines.
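The listing above does not include DLJSFuzzer's implementation, so the following TypeScript sketch only illustrates the core idea under stated assumptions: a mutated input tensor is fed through the same TensorFlow.js computation on two backends, and the two oracles named in the abstract (NaN values and cross-backend inconsistency) are checked. The value-injection mutation, the fixed op chain, and the 1e-3 tolerance are illustrative choices, not the paper's 13 tensor mutation rules or its thresholds.

```typescript
// Minimal differential-testing sketch for TensorFlow.js (assumes: npm install @tensorflow/tfjs).
// The mutation and tolerance below are illustrative assumptions, not DLJSFuzzer's actual rules.
import * as tf from '@tensorflow/tfjs';

// Illustrative tensor value mutation: overwrite a few positions with boundary
// values that often expose numerical bugs.
function mutateValues(data: Float32Array): Float32Array {
  const extremes = [0, 1e30, -1e30, 1e-30];
  const mutated = Float32Array.from(data);
  for (let i = 0; i < 4; i++) {
    const pos = Math.floor(Math.random() * mutated.length);
    mutated[pos] = extremes[Math.floor(Math.random() * extremes.length)];
  }
  return mutated;
}

// Run a fixed computation on one backend; tensors are rebuilt per backend so
// no data is shared across backend switches.
async function runOnBackend(backend: 'cpu' | 'webgl', data: Float32Array): Promise<Float32Array> {
  await tf.setBackend(backend);
  await tf.ready();
  const x = tf.tensor2d(data, [8, 8]);
  const w = tf.ones([8, 8]); // deterministic stand-in for model weights
  const y = tf.softmax(tf.relu(tf.matMul(x, w)));
  const out = (await y.data()) as Float32Array;
  tf.dispose([x, w, y]);
  return out;
}

async function differentialCheck(): Promise<void> {
  const base = Float32Array.from({ length: 64 }, () => Math.random());
  const input = mutateValues(base);

  const cpuOut = await runOnBackend('cpu', input);
  const webglOut = await runOnBackend('webgl', input);

  // Oracles from the abstract: NaN bugs and cross-backend inconsistency.
  const hasNaN = [...cpuOut, ...webglOut].some(Number.isNaN);
  let maxDiff = 0;
  for (let i = 0; i < cpuOut.length; i++) {
    maxDiff = Math.max(maxDiff, Math.abs(cpuOut[i] - webglOut[i]));
  }
  if (hasNaN || maxDiff > 1e-3) {
    console.log(`Potential bug: NaN=${hasNaN}, max cross-backend diff=${maxDiff}`);
  }
}

differentialCheck().catch(console.error);
```

Repeating this check over many generated tensors and models, and reporting any case where an oracle fires, is the general shape of the fuzzing loop the abstract describes.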
Related papers
- A Tale of Two DL Cities: When Library Tests Meet Compiler [12.751626834965231]
We propose OPERA to extract domain knowledge from the test inputs of DL libraries.
OPERA constructs diverse tests from these library test inputs.
It incorporates a diversity-based test prioritization strategy to decide the order in which the migrated test inputs are executed.
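OPERA's concrete prioritization strategy is not spelled out in this summary; the sketch below shows one generic diversity-based ordering, under the assumption that each migrated test can be summarized by the set of features (e.g., operators and parameter kinds) it exercises. The greedy, coverage-maximizing heuristic and all names here are illustrative, not OPERA's implementation.

```typescript
// Generic diversity-based test prioritization (illustrative, not OPERA's code):
// greedily order tests so each next test covers the most not-yet-seen features.
interface MigratedTest {
  id: string;
  features: Set<string>; // e.g. operators and parameter kinds the test exercises
}

function prioritize(tests: MigratedTest[]): MigratedTest[] {
  const remaining = [...tests];
  const covered = new Set<string>();
  const ordered: MigratedTest[] = [];
  while (remaining.length > 0) {
    let bestIdx = 0;
    let bestGain = -1;
    remaining.forEach((t, i) => {
      const gain = [...t.features].filter(f => !covered.has(f)).length;
      if (gain > bestGain) { bestGain = gain; bestIdx = i; }
    });
    const [next] = remaining.splice(bestIdx, 1);
    next.features.forEach(f => covered.add(f));
    ordered.push(next);
  }
  return ordered;
}

// Example: tests that add new operators/parameters are executed earlier.
const order = prioritize([
  { id: 't1', features: new Set(['conv2d', 'padding=same']) },
  { id: 't2', features: new Set(['conv2d']) },
  { id: 't3', features: new Set(['matmul', 'dtype=float16']) },
]);
console.log(order.map(t => t.id)); // ['t1', 't3', 't2']
```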
arXiv Detail & Related papers (2024-07-23T16:35:45Z)
- FV8: A Forced Execution JavaScript Engine for Detecting Evasive Techniques [53.288368877654705]
FV8 is a modified V8 JavaScript engine designed to identify evasion techniques in JavaScript code.
It selectively enforces code execution on APIs that conditionally inject dynamic code.
It identifies 1,443 npm packages and 164 (82%) extensions containing at least one type of evasion.
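FV8 implements forced execution inside a modified V8 engine, which cannot be reproduced in a script; purely as an illustration of the dynamic code-injection APIs it targets, the TypeScript sketch below wraps eval and the Function constructor so that any code a sample injects at run time gets logged. The wrapper names and the evasion check are hypothetical.

```typescript
// Userland illustration only: hooks the code-injection APIs FV8 instruments
// (eval, the Function constructor) to log dynamically injected code.
const nativeEval = eval;
const NativeFunction = Function;

// Log every string that reaches eval(), then evaluate it as before.
(globalThis as any).eval = (code: string): unknown => {
  console.log(`[monitor] eval() received ${code.length} chars of dynamic code`);
  return nativeEval(code);
};

// Log code bodies passed to the Function constructor (another injection vector).
(globalThis as any).Function = new Proxy(NativeFunction, {
  construct(target, args: string[]) {
    console.log(`[monitor] new Function() body: ${args[args.length - 1] ?? ''}`);
    return Reflect.construct(target, args);
  },
  apply(target, thisArg, args: string[]) {
    console.log(`[monitor] Function() body: ${args[args.length - 1] ?? ''}`);
    return Reflect.apply(target, thisArg, args);
  },
});

// Example of the pattern FV8 forces through: a payload hidden behind a condition.
const looksLikeAnalysisEnvironment = false; // stand-in for an evasion check
if (!looksLikeAnalysisEnvironment) {
  eval("console.log('dynamically injected payload ran')");
}
```

FV8 goes further by forcing conditional branches like the one above to execute inside the engine itself, which is why it catches evasions that a userland wrapper would miss.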
arXiv Detail & Related papers (2024-05-21T19:54:19Z)
- MoCo: Fuzzing Deep Learning Libraries via Assembling Code [13.937180393991616]
Deep learning (DL) techniques have been applied in software systems across various application scenarios.
DL libraries serve as the underlying foundation for DL systems, and bugs in them can have unpredictable impacts.
We propose MoCo, a novel fuzzing testing method for DL libraries via assembling code.
arXiv Detail & Related papers (2024-05-13T13:40:55Z) - JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models [123.66104233291065]
Jailbreak attacks cause large language models (LLMs) to generate harmful, unethical, or otherwise objectionable content.
Evaluating these attacks presents a number of challenges that the current collection of benchmarks and evaluation techniques does not adequately address.
JailbreakBench is an open-sourced benchmark designed to address these challenges.
arXiv Detail & Related papers (2024-03-28T02:44:02Z) - Theoretically Achieving Continuous Representation of Oriented Bounding Boxes [64.15627958879053]
This paper endeavors to completely solve the issue of discontinuity in Oriented Bounding Box representation.
We propose a novel representation method called Continuous OBB (COBB) which can be readily integrated into existing detectors.
For fairness and transparency of experiments, we have developed a modularized benchmark based on the open-source deep learning framework Jittor's detection toolbox JDet for OOD evaluation.
arXiv Detail & Related papers (2024-02-29T09:27:40Z) - DebugBench: Evaluating Debugging Capability of Large Language Models [80.73121177868357]
DebugBench is a benchmark for evaluating the debugging capability of Large Language Models (LLMs).
It covers four major bug categories and 18 minor types in C++, Java, and Python.
We evaluate two commercial and four open-source models in a zero-shot scenario.
arXiv Detail & Related papers (2024-01-09T15:46:38Z) - Static Semantics Reconstruction for Enhancing JavaScript-WebAssembly Multilingual Malware Detection [51.15122099046214]
WebAssembly allows attackers to hide the malicious functionalities of JavaScript malware in cross-language interoperations.
The detection of JavaScript-WebAssembly multilingual malware (JWMM) is challenging due to the complex interoperations and semantic diversity between JavaScript and WebAssembly.
We present JWBinder, the first technique aimed at enhancing the static detection of JWMM.
arXiv Detail & Related papers (2023-10-26T10:59:45Z) - A Static Evaluation of Code Completion by Large Language Models [65.18008807383816]
Execution-based benchmarks have been proposed to evaluate functional correctness of model-generated code on simple programming problems.
Static analysis tools such as linters, which can detect errors without running the program, have not been well explored for evaluating code generation models.
We propose a static evaluation framework to quantify static errors in Python code completions, by leveraging Abstract Syntax Trees.
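The framework in the paper analyzes Python completions with Python's Abstract Syntax Tree; as an analogous sketch in this document's JavaScript setting, the snippet below counts parse-time diagnostics for a model-generated snippet using the TypeScript compiler API, without executing the code. The function name and the example snippet are illustrative.

```typescript
// Static, execution-free error counting for generated code
// (assumes: npm install typescript).
import * as ts from 'typescript';

// Return the number of parse-time diagnostics for one generated snippet.
function countStaticErrors(snippet: string): number {
  const result = ts.transpileModule(snippet, {
    reportDiagnostics: true,
    compilerOptions: { target: ts.ScriptTarget.ES2020 },
  });
  const diagnostics = result.diagnostics ?? [];
  for (const d of diagnostics) {
    console.log(ts.flattenDiagnosticMessageText(d.messageText, '\n'));
  }
  return diagnostics.length;
}

// Example: an unterminated string literal is flagged without running anything.
const generated = 'const greeting = "hello;\nconsole.log(greeting);';
console.log(`static errors: ${countStaticErrors(generated)}`);
```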
arXiv Detail & Related papers (2023-06-05T19:23:34Z)
- MEMO: Coverage-guided Model Generation For Deep Learning Library Testing [11.263121366956726]
A few techniques have been proposed to test deep learning (DL) libraries by generating DL models as test inputs.
But the test effectiveness of these techniques is constrained by the diversity of generated DL models.
We propose MEMO to efficiently generate diverse DL models by exploring layer types, layer pairs, and layer parameters.
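MEMO targets Python DL libraries and uses coverage feedback that is not reproduced here; as a rough analogue in this document's JavaScript setting, the sketch below assembles small TensorFlow.js models by sampling layer types and parameters at random. The layer pool and parameter ranges are illustrative assumptions.

```typescript
// Random model assembly as framework test input (assumes: npm install @tensorflow/tfjs).
// Coverage guidance is omitted; layer pool and parameter ranges are illustrative.
import * as tf from '@tensorflow/tfjs';

// Candidate layer types, each with randomly sampled parameters.
const layerFactories = [
  () => tf.layers.dense({ units: 1 + Math.floor(Math.random() * 64), activation: 'relu' }),
  () => tf.layers.dropout({ rate: Math.random() * 0.5 }),
  () => tf.layers.batchNormalization({}),
  () => tf.layers.leakyReLU({ alpha: Math.random() }),
];

// Assemble a random sequential model to use as a test input for the framework.
function generateModel(depth: number): tf.LayersModel {
  const model = tf.sequential();
  model.add(tf.layers.inputLayer({ inputShape: [32] }));
  for (let i = 0; i < depth; i++) {
    const pick = layerFactories[Math.floor(Math.random() * layerFactories.length)];
    model.add(pick());
  }
  return model;
}

const model = generateModel(3 + Math.floor(Math.random() * 3));
model.summary(); // inspect the sampled architecture
const out = model.predict(tf.randomNormal([2, 32])) as tf.Tensor;
console.log(out.shape); // exercising the framework with the generated model
```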
arXiv Detail & Related papers (2022-08-02T14:53:02Z)
- Toward Understanding Deep Learning Framework Bugs [6.38591361654859]
We conduct a large-scale study on 1,000 bugs from four popular and diverse DL frameworks.
We obtain 12 major findings for the comprehensive understanding of DL framework bugs and the current status of existing DL framework testing practice.
Based on guidelines derived from these findings, we design and implement a prototype DL framework testing tool called TenFuzz.
arXiv Detail & Related papers (2022-03-08T11:40:23Z)
- Montage: A Neural Network Language Model-Guided JavaScript Engine Fuzzer [18.908548472588976]
We present Montage, the first NNLM-guided fuzzer for finding JS engine vulnerabilities.
Montage found 37 real-world bugs, including three CVEs, in the latest JS engines.
arXiv Detail & Related papers (2020-01-13T08:45:56Z)