Reinforcement learning guided fuzz testing for a browser's HTML rendering engine
- URL: http://arxiv.org/abs/2307.14556v1
- Date: Thu, 27 Jul 2023 00:31:02 GMT
- Title: Reinforcement learning guided fuzz testing for a browser's HTML rendering engine
- Authors: Martin Sablotny, Bjørn Sand Jensen, Jeremy Singer
- Abstract summary: We propose a novel approach that combines a trained deep learning test case generator with a double deep Q-network (DDQN).
The DDQN guides test case creation based on a code coverage signal.
Our approach improves the code coverage performance of the underlying generator model by up to 18.5% for the Firefox HTML rendering engine.
- Score: 0.9176056742068814
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generation-based fuzz testing can uncover various bugs and security
vulnerabilities. However, compared to mutation-based fuzz testing, it takes
much longer to develop a well-balanced generator that produces good test cases
and decides where to break the underlying structure to exercise new code
paths. We propose a novel approach that, for the first time, combines a
trained deep learning test case generator with a double deep Q-network (DDQN).
The DDQN guides test case creation based on a code coverage signal. Our
approach improves the code coverage performance of the underlying generator
model by up to 18.5% for the Firefox HTML rendering engine, compared to a
baseline grammar-based fuzzer.
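As a rough illustration of the setup the abstract describes, the Python sketch below shows a double-DQN update driven by a coverage-based reward. The dimensions, action set, and synthetic transitions are assumptions for illustration, not the paper's implementation; in the paper's setting the state would encode the partial test case and the reward would come from coverage gained in the rendering engine.

```python
# Hedged sketch of a double-DQN update for coverage-guided test generation.
# STATE_DIM, N_ACTIONS, and the synthetic transitions are illustrative
# assumptions, not the paper's actual configuration.
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM = 128   # assumed embedding size of the partial test case
N_ACTIONS = 16    # assumed number of generator interventions

class QNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 256), nn.ReLU(),
            nn.Linear(256, N_ACTIONS),
        )

    def forward(self, x):
        return self.net(x)

online, target = QNet(), QNet()
target.load_state_dict(online.state_dict())
opt = torch.optim.Adam(online.parameters(), lr=1e-4)
replay = deque(maxlen=10_000)
GAMMA = 0.99

def td_step(batch_size=64):
    """One double-DQN update: the online net selects the next action and
    the target net evaluates it, which reduces Q-value overestimation."""
    s, a, r, s2 = map(torch.stack, zip(*random.sample(replay, batch_size)))
    q = online(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        best = online(s2).argmax(dim=1, keepdim=True)   # action selection
        q2 = target(s2).gather(1, best).squeeze(1)      # action evaluation
        y = r + GAMMA * q2   # return driven by the coverage reward
    loss = nn.functional.smooth_l1_loss(q, y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Fill the buffer with synthetic transitions; a real loop would render the
# generated HTML and measure the coverage delta as the reward.
for _ in range(256):
    s = torch.randn(STATE_DIM)
    a = torch.tensor(float(random.randrange(N_ACTIONS)))
    r = torch.rand(())          # stand-in for normalized coverage gain
    s2 = torch.randn(STATE_DIM)
    replay.append((s, a, r, s2))
td_step()
```

In practice the target network would be synced to the online network every few hundred steps, and an epsilon-greedy policy over the Q-values would pick generator actions during fuzzing.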
Related papers
- Robust Black-box Testing of Deep Neural Networks using Co-Domain Coverage [18.355332126489756]
Rigorous testing of machine learning models is necessary for trustworthy deployments.
We present a novel black-box approach for generating test suites for robust testing of deep neural networks (DNNs).
arXiv Detail & Related papers (2024-08-13T09:42:57Z)
- SWT-Bench: Testing and Validating Real-World Bug-Fixes with Code Agents [10.730852617039451]
We investigate the capability of LLM-based Code Agents to formalize user issues into test cases.
We propose a novel benchmark based on popular GitHub repositories, containing real-world issues, ground-truth bug-fixes, and golden tests.
We find that LLMs generally perform surprisingly well at generating relevant test cases, with Code Agents designed for code repair exceeding the performance of systems designed for test generation.
arXiv Detail & Related papers (2024-06-18T14:54:37Z)
- Data Augmentation by Fuzzing for Neural Test Generation [7.310817657037053]
We introduce a novel data augmentation technique, *FuzzAug*, that brings the benefits of fuzzing to large language models.
Our evaluations show that models trained with dataset augmented by FuzzAug increase assertion accuracy by 5%, improve compilation rate by more than 10%, and generate unit test functions with 5% more branch coverage.
arXiv Detail & Related papers (2024-06-12T22:09:27Z)
- Does Your Neural Code Completion Model Use My Code? A Membership Inference Approach [66.51005288743153]
We investigate the legal and ethical issues of current neural code completion models.
We tailor a membership inference approach (termed CodeMI) that was originally crafted for classification tasks.
We evaluate the effectiveness of this adapted approach across a diverse array of neural code completion models.
arXiv Detail & Related papers (2024-04-22T15:54:53Z)
- CovRL: Fuzzing JavaScript Engines with Coverage-Guided Reinforcement Learning for LLM-based Mutation [2.5864634852960444]
This paper presents a novel technique called CovRL (Coverage-guided Reinforcement Learning) that combines Large Language Models (LLMs) with reinforcement learning from coverage feedback.
CovRL-Fuzz identifies 48 real-world security-related bugs in the latest JavaScript engines, including 39 previously unknown vulnerabilities and 11 CVEs.
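A minimal sketch of the coverage-feedback reward idea, assuming a set-based edge-coverage bitmap; CovRL reportedly applies a TF-IDF-style weighting to coverage, which the plain edge count below deliberately simplifies:

```python
# Hedged sketch of a coverage-feedback reward for an LLM-based mutator.
# The set-based bitmap interface is an assumption for illustration;
# CovRL describes a weighted reward, simplified here to a plain count.
def coverage_reward(hit_edges: set[int], seen_edges: set[int]) -> float:
    """Score a mutated test case by the new coverage it contributes.

    hit_edges:  edge IDs exercised by this test case.
    seen_edges: all edge IDs observed so far (updated in place).
    """
    fresh = hit_edges - seen_edges
    seen_edges |= fresh
    return float(len(fresh))  # reward only genuinely new coverage

# Usage: rewards shrink as the corpus saturates coverage.
seen: set[int] = set()
print(coverage_reward({1, 2, 3}, seen))  # 3.0
print(coverage_reward({2, 3, 4}, seen))  # 1.0
```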
arXiv Detail & Related papers (2024-02-19T15:30:40Z)
- Zero-Shot Detection of Machine-Generated Codes [83.0342513054389]
This work proposes a training-free approach for detecting LLM-generated code.
We find that existing training-based or zero-shot text detectors are ineffective when applied to code.
Our method is robust against revision attacks and generalizes well to Java code.
arXiv Detail & Related papers (2023-10-08T10:08:21Z)
- Fuzzing for CPS Mutation Testing [3.512722797771289]
We propose a mutation testing approach that leverages fuzz testing, which has proved effective with C and C++ software.
Our empirical evaluation shows that mutation testing based on fuzz testing kills a significantly higher proportion of live mutants than symbolic execution.
arXiv Detail & Related papers (2023-08-15T16:35:31Z)
- Neural Embeddings for Web Testing [49.66745368789056]
Existing crawlers rely on app-specific, threshold-based algorithms to assess state equivalence.
We propose WEBEMBED, a novel abstraction function based on neural network embeddings and threshold-free classifiers.
Our evaluation on nine web apps shows that WEBEMBED outperforms state-of-the-art techniques by detecting near-duplicates more accurately.
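A minimal sketch of what a threshold-free, embedding-based state-equivalence check could look like; the pair encoding, dimensions, and classifier head are assumptions for illustration, not WEBEMBED's actual architecture:

```python
# Hedged sketch: decide whether two web app states are near-duplicates
# from their neural embeddings, using a learned head instead of a
# hand-tuned similarity threshold.
import torch
import torch.nn as nn

EMB_DIM = 64  # assumed page-embedding size

class PairClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(3 * EMB_DIM, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, a, b):
        # Concatenate both embeddings and their absolute difference, a
        # common pair-encoding trick; returns the "near-duplicate" logit.
        return self.head(torch.cat([a, b, (a - b).abs()], dim=-1))

clf = PairClassifier()
a, b = torch.randn(1, EMB_DIM), torch.randn(1, EMB_DIM)
print(torch.sigmoid(clf(a, b)) > 0.5)  # same-state decision for the pair
```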
arXiv Detail & Related papers (2023-06-12T19:59:36Z)
- Backdoor Learning on Sequence to Sequence Models [94.23904400441957]
In this paper, we study whether sequence-to-sequence (seq2seq) models are vulnerable to backdoor attacks.
Specifically, we find that by injecting only 0.2% of the dataset's samples, we can cause the seq2seq model to generate a designated keyword or even a whole sentence.
Extensive experiments on machine translation and text summarization show that the proposed methods achieve over a 90% attack success rate on multiple datasets and models.
arXiv Detail & Related papers (2023-05-03T20:31:13Z)
- CodeT: Code Generation with Generated Tests [49.622590050797236]
We explore the use of pre-trained language models to automatically generate test cases.
CodeT executes the code solutions using the generated test cases, and then chooses the best solution.
We evaluate CodeT on five different pre-trained models with both HumanEval and MBPP benchmarks.
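A simplified sketch of that selection step: candidates are grouped by the exact set of generated tests they pass, and groups that are both large and pass many tests are preferred. The runner and scoring below are an illustrative reading of CodeT's dual-execution agreement, not its exact implementation:

```python
# Hedged sketch of CodeT-style selection via agreement between generated
# solutions and generated tests. `run_test` executes in-process for
# brevity; a real harness would sandbox untrusted code.
from collections import defaultdict

def run_test(solution: str, test: str) -> bool:
    env: dict = {}
    try:
        exec(solution, env)  # define the candidate function
        exec(test, env)      # run one assert-style generated test
        return True
    except Exception:
        return False

def select_best(solutions: list[str], tests: list[str]) -> str:
    # Group candidates by the exact set of tests they pass; a group that
    # is large and passes many tests earns a high agreement score.
    groups: dict[frozenset, list[str]] = defaultdict(list)
    for sol in solutions:
        passed = frozenset(t for t in tests if run_test(sol, t))
        groups[passed].append(sol)
    best = max(groups.items(), key=lambda kv: len(kv[1]) * len(kv[0]))
    return best[1][0]

# Usage with toy candidates and generated tests:
sols = ["def add(a, b): return a + b", "def add(a, b): return a - b"]
tsts = ["assert add(1, 2) == 3", "assert add(0, 0) == 0"]
print(select_best(sols, tsts))  # picks the correct implementation
```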
arXiv Detail & Related papers (2022-07-21T10:18:37Z)
- Measuring Coding Challenge Competence With APPS [54.22600767666257]
We introduce APPS, a benchmark for code generation.
Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges.
Recent models such as GPT-Neo can pass approximately 15% of the test cases of introductory problems.
arXiv Detail & Related papers (2021-05-20T17:58:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.