Related papers: A Survey on Machine Learning Techniques for Source Code Analysis

URL: http://arxiv.org/abs/2110.09610v1
Date: Mon, 18 Oct 2021 20:13:38 GMT
Title: A Survey on Machine Learning Techniques for Source Code Analysis
Authors: Tushar Sharma, Maria Kechagia, Stefanos Georgiou, Rohit Tiwari, Federica Sarro
Abstract summary: We aim to summarize the current knowledge in the area of applied machine learning for source code analysis. To do so, we carried out an extensive literature search and identified 364 primary studies published between 2002 and 2021.
Score: 14.129976741300029
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Context: Advances in machine learning techniques have encouraged researchers to apply them to a myriad of software engineering tasks that use source code analysis, such as testing and vulnerability detection. The large number of studies makes it challenging for the community to understand the current landscape. Objective: We aim to summarize the current knowledge in the area of applied machine learning for source code analysis. Method: We investigate studies belonging to twelve categories of software engineering tasks and corresponding machine learning techniques, tools, and datasets that have been applied to solve them. To do so, we carried out an extensive literature search and identified 364 primary studies published between 2002 and 2021. We summarize our observations and findings with the help of the identified studies. Results: Our findings suggest that the usage of machine learning techniques for source code analysis tasks is consistently increasing. We synthesize commonly used steps and the overall workflow for each task, and summarize the employed machine learning techniques. Additionally, we collate a comprehensive list of available datasets and tools usable in this context. Finally, we summarize the perceived challenges in this area, which include the availability of standard datasets, reproducibility and replicability, and hardware resources.

Related papers

Extraction of Research Objectives, Machine Learning Model Names, and Dataset Names from Academic Papers and Analysis of Their Interrelationships Using LLM and Network Analysis [0.0]
This study proposes a methodology for extracting tasks, machine learning methods, and dataset names from scientific papers. When using Llama3, the proposed method's expression extraction performance achieves an F-score exceeding 0.8 across various categories. Benchmarking results on financial domain papers demonstrate the effectiveness of this method.
arXiv Detail & Related papers (2024-08-22T03:10:52Z)
DISCOVER: A Data-driven Interactive System for Comprehensive Observation, Visualization, and ExploRation of Human Behaviour [6.716560115378451]
We introduce a modular, flexible, yet user-friendly software framework specifically developed to streamline computational-driven data exploration for human behavior analysis. Our primary objective is to democratize access to advanced computational methodologies, thereby enabling researchers across disciplines to engage in detailed behavioral analysis without the need for extensive technical proficiency.
arXiv Detail & Related papers (2024-07-18T11:28:52Z)
Ontology Embedding: A Survey of Methods, Applications and Resources [54.3453925775069]
Ontologies are widely used for representing domain knowledge and metadata. One straightforward solution is to integrate statistical analysis and machine learning. Numerous papers have been published on embedding, but a lack of systematic reviews hinders researchers from gaining a comprehensive understanding of this field.
arXiv Detail & Related papers (2024-06-16T14:49:19Z)
Artificial intelligence to automate the systematic review of scientific literature [0.0]
We present a survey of AI techniques proposed in the last 15 years to help researchers conduct systematic analyses of scientific literature. We describe the tasks currently supported, the types of algorithms applied, and available tools proposed in 34 primary studies.
arXiv Detail & Related papers (2024-01-13T19:12:49Z)
Exploring Automated Code Evaluation Systems and Resources for Code Analysis: A Comprehensive Survey [1.024113475677323]
This study explores the application areas of automated code evaluation systems (AESs) and their resources. AESs are categorized into programming contests, programming learning and education, recruitment, online compilers, and additional modules. We briefly discuss the Aizu Online Judge platform as a real example of an AES from the perspectives of system design (hardware and software), operation (competition and education), and research.
arXiv Detail & Related papers (2023-07-08T16:31:38Z)
Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature. We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
arXiv Detail & Related papers (2022-07-18T11:38:32Z)
Active Multi-Task Representation Learning [50.13453053304159]
We give the first formal study on resource task sampling by leveraging the techniques from active learning. We propose an algorithm that iteratively estimates the relevance of each source task to the target task and samples from each source task based on the estimated relevance.
arXiv Detail & Related papers (2022-02-02T08:23:24Z)
Human-in-the-Loop Disinformation Detection: Stance, Sentiment, or Something Else? [93.91375268580806]
Both politics and pandemics have recently provided ample motivation for the development of machine learning-enabled disinformation (a.k.a. fake news) detection algorithms. Existing literature has focused primarily on the fully-automated case, but the resulting techniques cannot reliably detect disinformation on the varied topics, sources, and time scales required for military applications. By leveraging an already-available analyst as a human-in-the-loop, canonical machine learning techniques of sentiment analysis, aspect-based sentiment analysis, and stance detection become plausible methods to use for a partially-automated disinformation detection system.
arXiv Detail & Related papers (2021-11-09T13:30:34Z)
Curriculum Learning: A Survey [65.31516318260759]
Curriculum learning strategies have been successfully employed in all areas of machine learning. We construct a taxonomy of curriculum learning approaches by hand, considering various classification criteria. We build a hierarchical tree of curriculum learning methods using an agglomerative clustering algorithm.
arXiv Detail & Related papers (2021-01-25T20:08:32Z)
A Systematic Literature Review on the Use of Deep Learning in Software Engineering Research [22.21817722054742]
An increasingly popular set of techniques adopted by software engineering (SE) researchers to automate development tasks are those rooted in the concept of Deep Learning (DL). This paper presents a systematic literature review of research at the intersection of SE & DL. We center our analysis around the components of learning, a set of principles that govern the application of machine learning techniques to a given problem domain.
arXiv Detail & Related papers (2020-09-14T15:28:28Z)
Machine Learning for Software Engineering: A Systematic Mapping [73.30245214374027]
The software development industry is rapidly adopting machine learning for transitioning modern day software systems towards highly intelligent and self-learning systems. No comprehensive study exists that explores the current state-of-the-art on the adoption of machine learning across software engineering life cycle stages. This study introduces a machine learning for software engineering (MLSE) taxonomy classifying the state-of-the-art machine learning techniques according to their applicability to various software engineering life cycle stages.
arXiv Detail & Related papers (2020-05-27T11:56:56Z)

This list is automatically generated from the titles and abstracts of the papers on this site.