Python Fuzzing for Trustworthy Machine Learning Frameworks
- URL: http://arxiv.org/abs/2403.12723v1
- Date: Tue, 19 Mar 2024 13:41:11 GMT
- Title: Python Fuzzing for Trustworthy Machine Learning Frameworks
- Authors: Ilya Yegorov, Eli Kobrin, Darya Parygina, Alexey Vishnyakov, Andrey Fedotov,
- Abstract summary: We propose a dynamic analysis pipeline for Python projects using Sydr-Fuzz.
Our pipeline includes fuzzing, corpus minimization, crash triaging, and coverage collection.
To identify the most vulnerable parts of machine learning frameworks, we analyze their potential attack surfaces and develop fuzz targets for PyTorch, and related projects such as h5py.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Ensuring the security and reliability of machine learning frameworks is crucial for building trustworthy AI-based systems. Fuzzing, a popular technique in secure software development lifecycle (SSDLC), can be used to develop secure and robust software. Popular machine learning frameworks such as PyTorch and TensorFlow are complex and written in multiple programming languages including C/C++ and Python. We propose a dynamic analysis pipeline for Python projects using the Sydr-Fuzz toolset. Our pipeline includes fuzzing, corpus minimization, crash triaging, and coverage collection. Crash triaging and severity estimation are important steps to ensure that the most critical vulnerabilities are addressed promptly. Furthermore, the proposed pipeline is integrated in GitLab CI. To identify the most vulnerable parts of the machine learning frameworks, we analyze their potential attack surfaces and develop fuzz targets for PyTorch, TensorFlow, and related projects such as h5py. Applying our dynamic analysis pipeline to these targets, we were able to discover 3 new bugs and propose fixes for them.
Related papers
- AdvSecureNet: A Python Toolkit for Adversarial Machine Learning [1.3812010983144798]
AdvSecureNet is a PyTorch based toolkit for adversarial machine learning.
It is the first toolkit that supports both CLI and API interfaces and external YAML configuration files.
The project is available as an open-source project on GitHub.
arXiv Detail & Related papers (2024-09-04T11:47:00Z) - A Comprehensive Guide to Combining R and Python code for Data Science, Machine Learning and Reinforcement Learning [42.350737545269105]
We show how to run Python's scikit-learn, pytorch and OpenAI gym libraries for building Machine Learning, Deep Learning, and Reinforcement Learning projects easily.
arXiv Detail & Related papers (2024-07-19T23:01:48Z) - KGym: A Platform and Dataset to Benchmark Large Language Models on Linux Kernel Crash Resolution [59.20933707301566]
Large Language Models (LLMs) are consistently improving at increasingly realistic software engineering (SE) tasks.
In real-world software stacks, significant SE effort is spent developing foundational system software like the Linux kernel.
To evaluate if ML models are useful while developing such large-scale systems-level software, we introduce kGym and kBench.
arXiv Detail & Related papers (2024-07-02T21:44:22Z) - pyvene: A Library for Understanding and Improving PyTorch Models via
Interventions [79.72930339711478]
$textbfpyvene$ is an open-source library that supports customizable interventions on a range of different PyTorch modules.
We show how $textbfpyvene$ provides a unified framework for performing interventions on neural models and sharing the intervened upon models with others.
arXiv Detail & Related papers (2024-03-12T16:46:54Z) - Rust for Embedded Systems: Current State, Challenges and Open Problems (Extended Report) [6.414678578343769]
This paper performs the first systematic study to holistically understand the current state and challenges of using RUST for embedded systems.
We collected a dataset of 2,836 RUST embedded software spanning various categories and 5 Static Application Security Testing ( SAST) tools.
We found that existing RUST software support is inadequate, SAST tools cannot handle certain features of RUST embedded software, resulting in failures, and the prevalence of advanced types in existing RUST software makes it challenging to engineer interoperable code.
arXiv Detail & Related papers (2023-11-08T23:59:32Z) - PyPOTS: A Python Toolbox for Data Mining on Partially-Observed Time
Series [0.0]
PyPOTS is an open-source Python library dedicated to data mining and analysis on partially-observed time series.
It provides easy access to diverse algorithms categorized into four tasks: imputation, classification, clustering, and forecasting.
arXiv Detail & Related papers (2023-05-30T07:57:05Z) - Transactional Python for Durable Machine Learning: Vision, Challenges,
and Feasibility [5.669983975369642]
Python applications may lose important data, such as trained models and extracted features, due to machine failures or human errors.
This paper presents our vision of transactional Python that provides DART without any code modifications to user programs or the Python kernel.
Our evaluation of a proof-of-concept implementation with public PyTorch and scikit-learn applications shows that DART can be offered with overheads ranging 1.5%--15.6%.
arXiv Detail & Related papers (2023-05-15T16:27:09Z) - SequeL: A Continual Learning Library in PyTorch and JAX [50.33956216274694]
SequeL is a library for Continual Learning that supports both PyTorch and JAX frameworks.
It provides a unified interface for a wide range of Continual Learning algorithms, including regularization-based approaches, replay-based approaches, and hybrid approaches.
We release SequeL as an open-source library, enabling researchers and developers to easily experiment and extend the library for their own purposes.
arXiv Detail & Related papers (2023-04-21T10:00:22Z) - VELVET: a noVel Ensemble Learning approach to automatically locate
VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
arXiv Detail & Related papers (2021-12-20T22:45:27Z) - SafePILCO: a software tool for safe and data-efficient policy synthesis [67.17251247987187]
SafePILCO is a software tool for safe and data-efficient policy search with reinforcement learning.
It extends the known PILCO algorithm, originally written in Python, to support safe learning.
arXiv Detail & Related papers (2020-08-07T17:17:30Z) - OPFython: A Python-Inspired Optimum-Path Forest Classifier [68.8204255655161]
This paper proposes a Python-based Optimum-Path Forest framework, denoted as OPFython.
As OPFython is a Python-based library, it provides a more friendly environment and a faster prototyping workspace than the C language.
arXiv Detail & Related papers (2020-01-28T15:46:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.