Investigating the Impact of SOLID Design Principles on Machine Learning
Code Understanding
- URL: http://arxiv.org/abs/2402.05337v1
- Date: Thu, 8 Feb 2024 00:44:45 GMT
- Title: Investigating the Impact of SOLID Design Principles on Machine Learning
Code Understanding
- Authors: Raphael Cabral, Marcos Kalinowski, Maria Teresa Baldassarre, Hugo
Villamizar, Tatiana Escovedo, Hélio Lopes
- Abstract summary: We investigated the impact of the SOLID design principles on Machine Learning code understanding.
We restructured real industrial ML code that did not use SOLID principles.
Results provided statistically significant evidence that the adoption of the SOLID design principles can improve code understanding.
- Score: 2.5788518098820337
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: [Context] Applying design principles has long been acknowledged as beneficial
for understanding and maintainability in traditional software projects. These
benefits may similarly hold for Machine Learning (ML) projects, which involve
iterative experimentation with data, models, and algorithms. However, ML
components are often developed by data scientists with diverse educational
backgrounds, potentially resulting in code that doesn't adhere to software
design best practices. [Goal] In order to better understand this phenomenon, we
investigated the impact of the SOLID design principles on ML code
understanding. [Method] We conducted a controlled experiment with three
independent trials involving 100 data scientists. We restructured real
industrial ML code that did not use SOLID principles. Within each trial, one
group was presented with the original ML code, while the other was presented
with ML code incorporating SOLID principles. Participants of both groups were
asked to analyze the code and fill out a questionnaire that included both
open-ended and closed-ended questions on their understanding. [Results] The
study results provide statistically significant evidence that the adoption of
the SOLID design principles can improve code understanding within the realm of
ML projects. [Conclusion] We put forward that software engineering design
principles should be spread within the data science community and considered
for enhancing the maintainability of ML code.
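The abstract does not include the study's experimental code, but the kind of restructuring it evaluates can be illustrated. The sketch below is a hypothetical example (not taken from the paper's material) of applying two SOLID principles, Single Responsibility and Dependency Inversion, to a small ML pipeline in Python:

```python
from abc import ABC, abstractmethod

# Dependency Inversion: the pipeline depends on this abstraction,
# not on any concrete model implementation.
class Model(ABC):
    @abstractmethod
    def fit(self, X, y): ...
    @abstractmethod
    def predict(self, X): ...

class MeanBaseline(Model):
    """Trivial model: always predicts the training-set mean."""
    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
    def predict(self, X):
        return [self.mean_ for _ in X]

# Single Responsibility: the scaler only scales; it knows nothing
# about models or orchestration.
class Scaler:
    def fit_transform(self, X):
        lo, hi = min(X), max(X)
        self.lo_, self.span_ = lo, (hi - lo) or 1.0
        return [(x - lo) / self.span_ for x in X]

class Pipeline:
    """Orchestrates preprocessing and training; any Model can be
    injected without changing this class."""
    def __init__(self, scaler: Scaler, model: Model):
        self.scaler, self.model = scaler, model
    def train(self, X, y):
        self.model.fit(self.scaler.fit_transform(X), y)
        return self

pipe = Pipeline(Scaler(), MeanBaseline()).train([1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
print(pipe.model.predict([0.5]))  # → [20.0]
```

In the non-SOLID "before" version, scaling, training, and prediction would typically live in one monolithic script; the restructured version isolates each responsibility and swaps models behind an interface.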
Related papers
- OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models [70.72097493954067]
Large language models (LLMs) for code have become indispensable in various domains, including code generation, reasoning tasks, and agent systems.
We introduce OpenCoder, a top-tier code LLM that not only achieves performance comparable to leading models but also serves as an "open cookbook" for the research community.
arXiv Detail & Related papers (2024-11-07T17:47:25Z)
- What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [80.18342600996601]
Large language models (LLMs) produce code that is shorter yet more complicated as compared to canonical solutions.
We develop a taxonomy of bugs for incorrect codes that includes three categories and 12 sub-categories, and analyze the root cause for common bug types.
We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback.
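That paper's exact procedure is not reproduced here; the loop below is only a minimal sketch of the training-free self-critique idea, with `compile_check` and `generate_fix` as hypothetical stand-ins for compiler feedback and an LLM repair call:

```python
def self_critique_loop(code, compile_check, generate_fix, max_rounds=3):
    """Training-free repair loop: keep asking the model to fix its own
    output until the checker accepts it or the budget runs out."""
    for _ in range(max_rounds):
        ok, feedback = compile_check(code)
        if ok:
            return code
        # Critique step: feed the error back as context for a fix.
        code = generate_fix(code, feedback)
    return code  # best effort after max_rounds

# Toy demonstration with stand-in functions (no real LLM involved).
def compile_check(code):
    try:
        compile(code, "<snippet>", "exec")
        return True, ""
    except SyntaxError as e:
        return False, str(e)

def generate_fix(code, feedback):
    # Hypothetical "fix": close an unbalanced parenthesis the
    # compiler complained about.
    if "was never closed" in feedback or "unexpected EOF" in feedback:
        return code + ")"
    return code

fixed = self_critique_loop("print('hi'", compile_check, generate_fix)
```

The key property is that the loop needs no additional training: all repair signal comes from the feedback string passed back into the generator.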
arXiv Detail & Related papers (2024-07-08T17:27:17Z)
- A Study on Developer Behaviors for Validating and Repairing LLM-Generated Code Using Eye Tracking and IDE Actions [13.58143103712]
GitHub Copilot is a large language model (LLM)-powered code generation tool.
This paper investigates how developers validate and repair code generated by Copilot.
Being aware of the code's provenance led to improved performance, increased search efforts, more frequent Copilot usage, and higher cognitive workload.
arXiv Detail & Related papers (2024-05-25T06:20:01Z)
- MachineLearnAthon: An Action-Oriented Machine Learning Didactic Concept [34.6229719907685]
This paper introduces the MachineLearnAthon format, an innovative didactic concept designed to be inclusive for students of different disciplines.
At the heart of the concept lie ML challenges, which make use of industrial data sets to solve real-world problems.
These cover the entire ML pipeline, promoting data literacy and practical skills, from data preparation, through deployment, to evaluation.
arXiv Detail & Related papers (2024-01-29T16:50:32Z)
- When Do Program-of-Thoughts Work for Reasoning? [51.2699797837818]
We propose complexity-impacted reasoning score (CIRS) to measure correlation between code and reasoning abilities.
Specifically, we use the abstract syntax tree to encode the structural information and calculate logical complexity.
Code will be integrated into the EasyInstruct framework at https://github.com/zjunlp/EasyInstruct.
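The CIRS formula itself is defined in that paper; the snippet below only sketches the general idea of deriving a structural complexity signal from Python's abstract syntax tree (the specific signals counted here, node totals, branching constructs, and nesting depth, are illustrative choices, not the paper's metric):

```python
import ast

def structural_complexity(source: str) -> dict:
    """Walk the AST and tally simple structural signals:
    total nodes, branching constructs, and maximum nesting depth."""
    tree = ast.parse(source)
    branch_types = (ast.If, ast.For, ast.While, ast.Try)

    def depth(node, d=0):
        children = list(ast.iter_child_nodes(node))
        return d if not children else max(depth(c, d + 1) for c in children)

    return {
        "nodes": sum(1 for _ in ast.walk(tree)),
        "branches": sum(isinstance(n, branch_types) for n in ast.walk(tree)),
        "max_depth": depth(tree),
    }

metrics = structural_complexity(
    "for i in range(3):\n    if i % 2:\n        print(i)"
)
```

A score built this way rises with deeper nesting and more control flow, which is the intuition behind correlating code structure with reasoning ability.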
arXiv Detail & Related papers (2023-08-29T17:22:39Z)
- Large Language Models are Few-Shot Summarizers: Multi-Intent Comment Generation via In-Context Learning [34.006227676170504]
This study investigates the feasibility of utilizing large language models (LLMs) to generate comments that can fulfill developers' diverse intents.
Experiments on two large-scale datasets demonstrate the rationale of our insights.
arXiv Detail & Related papers (2023-04-22T12:26:24Z)
- Empowering the trustworthiness of ML-based critical systems through engineering activities [0.0]
This paper reviews the entire engineering process of trustworthy Machine Learning (ML) algorithms.
We start from the fundamental principles of ML and describe the core elements conditioning its trust, particularly through its design.
arXiv Detail & Related papers (2022-09-30T12:42:18Z)
- Panoramic Learning with A Standardized Machine Learning Formalism [116.34627789412102]
This paper presents a standardized equation of the learning objective, that offers a unifying understanding of diverse ML algorithms.
It also provides guidance for the mechanical design of new ML solutions, and serves as a promising vehicle towards panoramic learning with all experiences.
arXiv Detail & Related papers (2021-08-17T17:44:38Z)
- Leveraging Expert Consistency to Improve Algorithmic Decision Support [62.61153549123407]
We explore the use of historical expert decisions as a rich source of information that can be combined with observed outcomes to narrow the construct gap.
We propose an influence function-based methodology to estimate expert consistency indirectly when each case in the data is assessed by a single expert.
Our empirical evaluation, using simulations in a clinical setting and real-world data from the child welfare domain, indicates that the proposed approach successfully narrows the construct gap.
arXiv Detail & Related papers (2021-01-24T05:40:29Z)
- Machine Learning Force Fields [54.48599172620472]
Machine Learning (ML) has enabled numerous advances in computational chemistry.
One of the most promising applications is the construction of ML-based force fields (FFs)
This review gives an overview of applications of ML-FFs and the chemical insights that can be obtained from them.
arXiv Detail & Related papers (2020-10-14T13:14:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.