A Survey on Machine Learning Techniques for Source Code Analysis
- URL: http://arxiv.org/abs/2110.09610v1
- Date: Mon, 18 Oct 2021 20:13:38 GMT
- Title: A Survey on Machine Learning Techniques for Source Code Analysis
- Authors: Tushar Sharma, Maria Kechagia, Stefanos Georgiou, Rohit Tiwari, Federica Sarro
- Abstract summary: We aim to summarize the current knowledge in the area of applied machine learning for source code analysis.
To do so, we carried out an extensive literature search and identified 364 primary studies published between 2002 and 2021.
- Score: 14.129976741300029
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Context: Advances in machine learning techniques have encouraged
researchers to apply these techniques to a myriad of software engineering tasks
that use source code analysis, such as testing and vulnerability detection. The
large number of studies makes it challenging for the community to understand the
current landscape. Objective: We aim to summarize the current knowledge in the
area of applied machine learning for source code analysis. Method: We
investigate studies belonging to twelve categories of software engineering
tasks and corresponding machine learning techniques, tools, and datasets that
have been applied to solve them. To do so, we carried out an extensive
literature search and identified 364 primary studies published between 2002 and
2021. We summarize our observations and findings with the help of the
identified studies. Results: Our findings suggest that the usage of machine
learning techniques for source code analysis tasks is consistently increasing.
We synthesize commonly used steps and the overall workflow for each task, and
summarize the employed machine learning techniques. Additionally, we collate a
comprehensive list of available datasets and tools usable in this context.
Finally, we summarize the perceived challenges in this area, including the
availability of standard datasets, reproducibility and replicability, and
hardware resources.
Related papers
- Extraction of Research Objectives, Machine Learning Model Names, and Dataset Names from Academic Papers and Analysis of Their Interrelationships Using LLM and Network Analysis [0.0]
This study proposes a methodology extracting tasks, machine learning methods, and dataset names from scientific papers.
When using Llama3, the proposed method achieves an F-score exceeding 0.8 for expression extraction across various categories.
Benchmarking results on financial domain papers have demonstrated the effectiveness of this method.
(2024-08-22T03:10:52Z)
- DISCOVER: A Data-driven Interactive System for Comprehensive Observation, Visualization, and ExploRation of Human Behaviour [6.716560115378451]
We introduce a modular, flexible, yet user-friendly software framework specifically developed to streamline computationally driven data exploration for human behavior analysis.
Our primary objective is to democratize access to advanced computational methodologies, thereby enabling researchers across disciplines to engage in detailed behavioral analysis without the need for extensive technical proficiency.
(2024-07-18T11:28:52Z)
- Ontology Embedding: A Survey of Methods, Applications and Resources [54.3453925775069]
Ontologies are widely used for representing domain knowledge and metadata.
One straightforward solution is to integrate statistical analysis and machine learning.
Numerous papers have been published on embedding, but a lack of systematic reviews hinders researchers from gaining a comprehensive understanding of this field.
(2024-06-16T14:49:19Z)
- Artificial intelligence to automate the systematic review of scientific literature [0.0]
We present a survey of AI techniques proposed in the last 15 years to help researchers conduct systematic analyses of scientific literature.
We describe the tasks currently supported, the types of algorithms applied, and available tools proposed in 34 primary studies.
(2024-01-13T19:12:49Z)
- Exploring Automated Code Evaluation Systems and Resources for Code Analysis: A Comprehensive Survey [1.024113475677323]
This study explores the application areas of automated code evaluation systems (AESs) and their resources.
AESs are categorized into programming contests, programming learning and education, recruitment, online compilers, and additional modules.
We briefly discuss the Aizu Online Judge platform as a real example of an AES from the perspectives of system design (hardware and software), operation (competition and education), and research.
(2023-07-08T16:31:38Z)
- Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature.
We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
(2022-07-18T11:38:32Z)
- Active Multi-Task Representation Learning [50.13453053304159]
We give the first formal study on resource task sampling by leveraging the techniques from active learning.
We propose an algorithm that iteratively estimates the relevance of each source task to the target task and samples from each source task based on the estimated relevance.
(2022-02-02T08:23:24Z)
- Human-in-the-Loop Disinformation Detection: Stance, Sentiment, or Something Else? [93.91375268580806]
Both politics and pandemics have recently provided ample motivation for the development of machine learning-enabled disinformation (a.k.a. fake news) detection algorithms.
Existing literature has focused primarily on the fully-automated case, but the resulting techniques cannot reliably detect disinformation on the varied topics, sources, and time scales required for military applications.
By leveraging an already-available analyst as a human-in-the-loop, canonical machine learning techniques of sentiment analysis, aspect-based sentiment analysis, and stance detection become plausible methods to use for a partially-automated disinformation detection system.
(2021-11-09T13:30:34Z)
- Curriculum Learning: A Survey [65.31516318260759]
Curriculum learning strategies have been successfully employed in all areas of machine learning.
We construct a taxonomy of curriculum learning approaches by hand, considering various classification criteria.
We build a hierarchical tree of curriculum learning methods using an agglomerative clustering algorithm.
(2021-01-25T20:08:32Z)
- A Systematic Literature Review on the Use of Deep Learning in Software Engineering Research [22.21817722054742]
An increasingly popular set of techniques adopted by software engineering (SE) researchers to automate development tasks are those rooted in the concept of Deep Learning (DL).
This paper presents a systematic literature review of research at the intersection of SE & DL.
We center our analysis around the components of learning, a set of principles that govern the application of machine learning techniques to a given problem domain.
(2020-09-14T15:28:28Z)
- Machine Learning for Software Engineering: A Systematic Mapping [73.30245214374027]
The software development industry is rapidly adopting machine learning for transitioning modern day software systems towards highly intelligent and self-learning systems.
No comprehensive study exists that explores the current state of the art in the adoption of machine learning across software engineering life cycle stages.
This study introduces a machine learning for software engineering (MLSE) taxonomy classifying the state-of-the-art machine learning techniques according to their applicability to various software engineering life cycle stages.
(2020-05-27T11:56:56Z)