Modelling Concurrency Bugs Using Machine Learning
- URL: http://arxiv.org/abs/2305.05531v1
- Date: Mon, 8 May 2023 17:30:24 GMT
- Title: Modelling Concurrency Bugs Using Machine Learning
- Authors: Teodor Rares Begu
- Abstract summary: This project aims to compare both common and recent machine learning approaches.
We define and generate a synthetic dataset with the aim of simulating real-life (concurrent) programs.
We formulate hypotheses about fundamental limits of various machine learning model types.
- Score: 0.0
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Artificial Intelligence has gained a lot of traction in recent years,
with machine learning notably starting to see more applications across a varied
range of fields. One specific machine learning application that is of interest
to us is that of software safety and security, especially in the context of
parallel programs. The issue of being able to detect concurrency bugs
automatically has intrigued programmers for a long time, as the added layer of
complexity makes concurrent programs more prone to failure. The development of
such automatic detection tools provides considerable benefits to programmers in
terms of saving time while debugging, as well as reducing the number of
unexpected bugs. We believe machine learning may help achieve this goal by
providing additional advantages over current approaches, in terms of both
overall tool accuracy and programming language flexibility. However, due
to the presence of numerous challenges specific to the machine learning
approach (correctly labelling a sufficiently large dataset, finding the best
model types/architectures and so forth), we have to approach each issue of
developing such a tool separately. Therefore, the focus of this project is on
comparing both common and recent machine learning approaches. We abstract away
the complexity of procuring a labelled dataset of concurrent programs in the
form of a synthetic dataset that we define and generate with the aim of
simulating real-life (concurrent) programs. We formulate hypotheses about the
fundamental limits of various machine learning model types, which we then
validate by running extensive tests on our synthetic dataset. We hope that our
findings provide more insight into the advantages and disadvantages of various
model types when modelling programs using machine learning, as well as in any
related field (e.g. NLP).
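To make the setting concrete, the sketch below is our own illustration (not code from the paper) of one way a labelled synthetic dataset of concurrent-program abstractions could be generated: each sample is a short trace of (thread, operation, variable) events, and the label marks whether any shared-variable access is left unsynchronised, i.e. a potential data race. All names (make_sample, safe_access, and so on) are hypothetical.

```python
import random

# Abstract "program events": each sample is a short sequence of
# (thread_id, operation, variable) triples, loosely standing in for a trace
# of a small concurrent program.
VARS = ["x", "y"]

def safe_access(tid, var, rng):
    """A shared-variable access bracketed by lock/unlock on that variable."""
    op = rng.choice(["read", "write"])
    return [(tid, "lock", var), (tid, op, var), (tid, "unlock", var)]

def unsafe_access(tid, var, rng):
    """A shared-variable access with no synchronisation (potential data race)."""
    op = rng.choice(["read", "write"])
    return [(tid, op, var)]

def make_sample(buggy, n_accesses=6, rng=None):
    """Return (event_sequence, label); label 1 marks a potentially racy program."""
    rng = rng or random.Random()
    events = []
    for i in range(n_accesses):
        tid = rng.choice([0, 1])
        var = rng.choice(VARS)
        # In a buggy sample, at least one access is left unsynchronised.
        if buggy and (i == 0 or rng.random() < 0.3):
            events += unsafe_access(tid, var, rng)
        else:
            events += safe_access(tid, var, rng)
    return events, int(buggy)

def make_dataset(size=1000, seed=0):
    """Generate a balanced-ish labelled dataset of synthetic traces."""
    rng = random.Random(seed)
    return [make_sample(buggy=rng.random() < 0.5, rng=rng) for _ in range(size)]

if __name__ == "__main__":
    for events, label in make_dataset(size=5):
        print(label, events)
```

A classifier (sequence model, tree ensemble, etc.) trained on such traces would then play the role of the bug detector whose model types the project compares.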
Related papers
- HPC-Coder: Modeling Parallel Programs using Large Language Models [2.3101915391170573]
We show how large language models can be applied to tasks specific to high performance and scientific codes.
We introduce a new dataset of HPC and scientific codes and use it to fine-tune several pre-trained models.
In our experiments, we show that this model can auto-complete HPC functions where generic models cannot.
arXiv Detail & Related papers (2023-06-29T19:44:55Z)
- TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations.
We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z)
- Machine Learning for QoS Prediction in Vehicular Communication: Challenges and Solution Approaches [46.52224306624461]
We consider maximum throughput prediction to enhance, for example, streaming or high-definition mapping applications.
We highlight how confidence can be built on machine learning technologies by better understanding the underlying characteristics of the collected data.
We use explainable AI to show that machine learning can learn underlying principles of wireless networks without being explicitly programmed.
arXiv Detail & Related papers (2023-02-23T12:29:20Z)
- HyperPUT: Generating Synthetic Faulty Programs to Challenge Bug-Finding Tools [3.8520163964103835]
We propose a complementary approach that automatically generates programs with seeded bugs.
Our technique, called HyperPUT, builds C programs from a "seed" bug by incrementally applying program transformations.
arXiv Detail & Related papers (2022-09-14T13:09:41Z)
- Advancing Reacting Flow Simulations with Data-Driven Models [50.9598607067535]
Key to effective use of machine learning tools in multi-physics problems is to couple them to physical and computer models.
The present chapter reviews some of the open opportunities for the application of data-driven reduced-order modeling of combustion systems.
arXiv Detail & Related papers (2022-09-05T16:48:34Z)
- Design Automation for Fast, Lightweight, and Effective Deep Learning Models: A Survey [53.258091735278875]
This survey covers studies of design automation techniques for deep learning models targeting edge computing.
It offers an overview and comparison of key metrics that are used commonly to quantify the proficiency of models in terms of effectiveness, lightness, and computational costs.
The survey proceeds to cover three categories of the state-of-the-art of deep model design automation techniques.
arXiv Detail & Related papers (2022-08-22T12:12:43Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
Existing approaches, however, do not supply the procedures and pipelines needed for the actual deployment of machine learning capabilities in real production-grade systems.
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- Frugal Machine Learning [7.460473725109103]
This paper investigates frugal learning, aimed at building the most accurate possible models using the least amount of resources.
The most promising algorithms are then assessed in a real-world scenario by implementing them in a smartwatch and letting them learn activity recognition models on the watch itself.
arXiv Detail & Related papers (2021-11-05T21:27:55Z)
- Importance measures derived from random forests: characterisation and extension [0.2741266294612776]
This thesis aims at improving the interpretability of models built by a specific family of machine learning algorithms.
Several mechanisms have been proposed to interpret these models, and throughout this thesis we aim to improve their understanding.
arXiv Detail & Related papers (2021-06-17T13:23:57Z)
- Diverse Complexity Measures for Dataset Curation in Self-driving [80.55417232642124]
We propose a new data selection method that exploits a diverse set of criteria that quantify the interestingness of traffic scenes.
Our experiments show that the proposed curation pipeline is able to select datasets that lead to better generalization and higher performance.
arXiv Detail & Related papers (2021-01-16T23:45:02Z)
- Knowledge as Invariance -- History and Perspectives of Knowledge-augmented Machine Learning [69.99522650448213]
Research in machine learning is at a turning point.
Research interests are shifting away from increasing the performance of highly parameterized models on exceedingly specific tasks.
This white paper provides an introduction and discussion of this emerging field in machine learning research.
arXiv Detail & Related papers (2020-12-21T15:07:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.