Modelling Concurrency Bugs Using Machine Learning
- URL: http://arxiv.org/abs/2305.05531v1
- Date: Mon, 8 May 2023 17:30:24 GMT
- Title: Modelling Concurrency Bugs Using Machine Learning
- Authors: Teodor Rares Begu
- Abstract summary: This project aims to compare both common and recent machine learning approaches.
We define and generate a synthetic dataset with the aim of simulating real-life (concurrent) programs.
We formulate hypotheses about fundamental limits of various machine learning model types.
- Score: 0.0
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Artificial Intelligence has gained a lot of traction in recent years,
with machine learning notably starting to see more applications across a varied
range of fields. One specific machine learning application that is of interest
to us is that of software safety and security, especially in the context of
parallel programs. The issue of being able to detect concurrency bugs
automatically has intrigued programmers for a long time, as the added layer of
complexity makes concurrent programs more prone to failure. The development of
such automatic detection tools provides considerable benefits to programmers in
terms of saving time while debugging, as well as reducing the number of
unexpected bugs. We believe machine learning may help achieve this goal by
providing additional advantages over current approaches, in terms of both
overall tool accuracy and programming language flexibility. However, due
to the presence of numerous challenges specific to the machine learning
approach (correctly labelling a sufficiently large dataset, finding the best
model types/architectures and so forth), we have to approach each issue of
developing such a tool separately. Therefore, the focus of this project is on
comparing both common and recent machine learning approaches. We abstract away
the complexity of procuring a labelled dataset of concurrent programs in the
form of a synthetic dataset that we define and generate with the aim of
simulating real-life (concurrent) programs. We formulate hypotheses about the
fundamental limits of various machine learning model types, which we then
validate by running extensive tests on our synthetic dataset. We hope that our
findings provide more insight into the advantages and disadvantages of various
model types when modelling programs using machine learning, as well as in any
related field (e.g. NLP).
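To make the setting concrete, the sketch below is our own illustration (not code from the paper) of one way a labelled synthetic dataset of concurrent-program abstractions could be generated: each sample is a short trace of (thread, operation, variable) events, and the label marks whether any shared-variable access is left unsynchronised, i.e. a potential data race. All names (make_sample, safe_access, and so on) are hypothetical.

```python
import random

# Abstract "program events": each sample is a short sequence of
# (thread_id, operation, variable) triples, loosely standing in for a trace
# of a small concurrent program.
VARS = ["x", "y"]

def safe_access(tid, var, rng):
    """A shared-variable access bracketed by lock/unlock on that variable."""
    op = rng.choice(["read", "write"])
    return [(tid, "lock", var), (tid, op, var), (tid, "unlock", var)]

def unsafe_access(tid, var, rng):
    """A shared-variable access with no synchronisation (potential data race)."""
    op = rng.choice(["read", "write"])
    return [(tid, op, var)]

def make_sample(buggy, n_accesses=6, rng=None):
    """Return (event_sequence, label); label 1 marks a potentially racy program."""
    rng = rng or random.Random()
    events = []
    for i in range(n_accesses):
        tid = rng.choice([0, 1])
        var = rng.choice(VARS)
        # In a buggy sample, at least one access is left unsynchronised.
        if buggy and (i == 0 or rng.random() < 0.3):
            events += unsafe_access(tid, var, rng)
        else:
            events += safe_access(tid, var, rng)
    return events, int(buggy)

def make_dataset(size=1000, seed=0):
    """Generate a balanced-ish labelled dataset of synthetic traces."""
    rng = random.Random(seed)
    return [make_sample(buggy=rng.random() < 0.5, rng=rng) for _ in range(size)]

if __name__ == "__main__":
    for events, label in make_dataset(size=5):
        print(label, events)
```

A classifier (sequence model, tree ensemble, etc.) trained on such traces would then play the role of the bug detector whose model types the project compares.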
Related papers
- HPC-Coder: Modeling Parallel Programs using Large Language Models [2.3101915391170573]
We show how large language models can be applied to tasks specific to high performance and scientific codes.
We introduce a new dataset of HPC and scientific codes and use it to fine-tune several pre-trained models.
In our experiments, we show that this model can auto-complete HPC functions where generic models cannot.
arXiv Detail & Related papers (2023-06-29T19:44:55Z)
- TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations.
We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z)
- Machine Learning for QoS Prediction in Vehicular Communication: Challenges and Solution Approaches [46.52224306624461]
We consider maximum throughput prediction to enhance, for example, streaming or high-definition mapping applications.
We highlight how confidence can be built on machine learning technologies by better understanding the underlying characteristics of the collected data.
We use explainable AI to show that machine learning can learn underlying principles of wireless networks without being explicitly programmed.
arXiv Detail & Related papers (2023-02-23T12:29:20Z)
- HyperPUT: Generating Synthetic Faulty Programs to Challenge Bug-Finding Tools [3.8520163964103835]
We propose a complementary approach that automatically generates programs with seeded bugs.
Our technique, called HyperPUT, builds C programs from a "seed" bug by incrementally applying program transformations.
arXiv Detail & Related papers (2022-09-14T13:09:41Z)
- Advancing Reacting Flow Simulations with Data-Driven Models [50.9598607067535]
Key to effective use of machine learning tools in multi-physics problems is to couple them to physical and computer models.
The present chapter reviews some of the open opportunities for the application of data-driven reduced-order modeling of combustion systems.
arXiv Detail & Related papers (2022-09-05T16:48:34Z)
- Design Automation for Fast, Lightweight, and Effective Deep Learning Models: A Survey [53.258091735278875]
This survey covers studies of design automation techniques for deep learning models targeting edge computing.
It offers an overview and comparison of key metrics that are used commonly to quantify the proficiency of models in terms of effectiveness, lightness, and computational costs.
The survey proceeds to cover three categories of the state-of-the-art of deep model design automation techniques.
arXiv Detail & Related papers (2022-08-22T12:12:43Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
Existing approaches, however, do not supply the procedures and pipelines needed for the actual deployment of machine learning capabilities in real production-grade systems.
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- Frugal Machine Learning [7.460473725109103]
This paper investigates frugal learning, aimed at building the most accurate possible models using the least amount of resources.
The most promising algorithms are then assessed in a real-world scenario by implementing them in a smartwatch and letting them learn activity recognition models on the watch itself.
arXiv Detail & Related papers (2021-11-05T21:27:55Z)
- Importance measures derived from random forests: characterisation and extension [0.2741266294612776]
This thesis aims at improving the interpretability of models built by a specific family of machine learning algorithms.
Several mechanisms have been proposed to interpret these models, and throughout this thesis we aim to improve their understanding.
arXiv Detail & Related papers (2021-06-17T13:23:57Z)
- Diverse Complexity Measures for Dataset Curation in Self-driving [80.55417232642124]
We propose a new data selection method that exploits a diverse set of criteria that quantify the interestingness of traffic scenes.
Our experiments show that the proposed curation pipeline is able to select datasets that lead to better generalization and higher performance.
arXiv Detail & Related papers (2021-01-16T23:45:02Z)
- Knowledge as Invariance -- History and Perspectives of Knowledge-augmented Machine Learning [69.99522650448213]
Research in machine learning is at a turning point.
Research interests are shifting away from increasing the performance of highly parameterized models on exceedingly specific tasks.
This white paper provides an introduction and discussion of this emerging field in machine learning research.
arXiv Detail & Related papers (2020-12-21T15:07:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.