A Large-Scale Study of Model Integration in ML-Enabled Software Systems
- URL: http://arxiv.org/abs/2408.06226v2
- Date: Mon, 24 Feb 2025 15:02:27 GMT
- Authors: Yorick Sens, Henriette Knopp, Sven Peldszus, Thorsten Berger
- Abstract summary: Machine learning (ML) and its integration into software systems have drastically changed development practices. We present a large-scale study of 2,928 open-source ML-enabled software systems.
- Score: 4.776073133338119
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rise of machine learning (ML) and its integration into software systems has drastically changed development practices. While software engineering traditionally focused on manually created code artifacts with dedicated processes and architectures, ML-enabled systems require additional data-science methods and tools to create ML artifacts -- especially ML models and training data. However, integrating models into systems, and managing the many different artifacts involved, is far from trivial. ML-enabled systems can easily have multiple ML models that interact with each other and with traditional code in intricate ways. Unfortunately, while challenges and practices of building ML-enabled systems have been studied, little is known about the characteristics of real-world ML-enabled systems beyond isolated examples. Improving engineering processes and architectures for ML-enabled systems requires improving the empirical understanding of these systems. We present a large-scale study of 2,928 open-source ML-enabled software systems. We classified and analyzed them to determine system characteristics, model and code reuse practices, and architectural aspects of integrating ML models. Our findings show that these systems still mainly consist of traditional source code, and that ML model reuse through code duplication or pre-trained models is common. We also identified different ML integration patterns and related implementation practices. We hope that our results help improve practices for integrating ML models, bringing data science and software engineering closer together.
Related papers
- A quantitative framework for evaluating architectural patterns in ML systems [49.1574468325115]
This study proposes a framework for quantitative assessment of architectural patterns in ML systems.
We focus on scalability and performance metrics for cost-effective CPU-based inference.
arXiv Detail & Related papers (2025-01-20T15:30:09Z)
- LLAVADI: What Matters For Multimodal Large Language Models Distillation [77.73964744238519]
In this work, we do not propose a new efficient model structure or train small-scale MLLMs from scratch.
Our studies involve training strategies, model choices, and distillation algorithms in the knowledge distillation process.
Evaluated on different benchmarks and with a proper strategy, even a small-scale 2.7B model can perform on par with larger models with 7B or 13B parameters.
arXiv Detail & Related papers (2024-07-28T06:10:47Z)
- ML-On-Rails: Safeguarding Machine Learning Models in Software Systems A Case Study [4.087995998278127]
We introduce ML-On-Rails, a protocol designed to safeguard machine learning models.
ML-On-Rails establishes a well-defined endpoint interface for different ML tasks, and clear communication between ML providers and ML consumers.
We evaluate the protocol through a real-world case study of the MoveReminder application.
arXiv Detail & Related papers (2024-01-12T11:27:15Z)
- An Exploratory Study of V-Model in Building ML-Enabled Software: A Systems Engineering Perspective [0.7252027234425334]
Machine learning (ML) components are being added to more and more critical and impactful software systems.
This research investigates the use of V-Model in addressing the interdisciplinary collaboration challenges when building ML-enabled systems.
arXiv Detail & Related papers (2023-08-10T06:53:32Z)
- Machine Learning-Enabled Software and System Architecture Frameworks [48.87872564630711]
The stakeholders with data science and Machine Learning related concerns, such as data scientists and data engineers, are yet to be included in existing architecture frameworks.
We surveyed 61 subject matter experts from over 25 organizations in 10 countries.
arXiv Detail & Related papers (2023-08-09T21:54:34Z)
- Understanding the Complexity and Its Impact on Testing in ML-Enabled Systems [8.630445165405606]
We study Rasa 3.0, an industrial dialogue system that has been widely adopted by various companies around the world.
Our goal is to characterize the complexity of such a large-scale ML-enabled system and to understand the impact of the complexity on testing.
Our study reveals practical implications for software engineering for ML-enabled systems.
arXiv Detail & Related papers (2023-01-10T08:13:24Z)
- MDE for Machine Learning-Enabled Software Systems: A Case Study and Comparison of MontiAnna & ML-Quadrat [5.839906946900443]
We propose to adopt the MDE paradigm for the development of Machine Learning-enabled software systems with a focus on the Internet of Things (IoT) domain.
We illustrate, through a case study, how two state-of-the-art open-source modeling tools, MontiAnna and ML-Quadrat, can be used for this purpose.
arXiv Detail & Related papers (2022-09-15T13:21:16Z)
- Enabling Automated Machine Learning for Model-Driven AI Engineering [60.09869520679979]
We propose a novel approach to enable Model-Driven Software Engineering and Model-Driven AI Engineering.
In particular, we support Automated ML, thus assisting software engineers without deep AI knowledge in developing AI-intensive systems.
arXiv Detail & Related papers (2022-03-06T10:12:56Z)
- Panoramic Learning with A Standardized Machine Learning Formalism [116.34627789412102]
This paper presents a standardized equation of the learning objective that offers a unifying understanding of diverse ML algorithms.
It also provides guidance for mechanic design of new ML solutions, and serves as a promising vehicle towards panoramic learning with all experiences.
arXiv Detail & Related papers (2021-08-17T17:44:38Z)
- Declarative Machine Learning Systems [7.5717114708721045]
Machine learning (ML) has moved from an academic endeavor to a pervasive technology adopted in almost every aspect of computing.
Recent successes in applying ML in natural sciences revealed that ML can be used to tackle some of the hardest real-world problems humanity faces today.
We believe the next wave of ML systems will allow a larger number of people, potentially without coding skills, to perform the same tasks.
arXiv Detail & Related papers (2021-07-16T23:57:57Z)
- A Survey of Machine Learning for Computer Architecture and Systems [18.620218353713476]
Computer architecture and systems have long been optimized to enable efficient execution of machine learning (ML) algorithms or models.
Now, it is time to reconsider the relationship between ML and systems, and let ML transform the way that computer architecture and systems are designed.
arXiv Detail & Related papers (2021-02-16T04:09:57Z)
- Technology Readiness Levels for Machine Learning Systems [107.56979560568232]
Development and deployment of machine learning systems can be executed easily with modern tools, but the process is typically rushed and means-to-an-end.
We have developed a proven systems engineering approach for machine learning development and deployment.
Our "Machine Learning Technology Readiness Levels" framework defines a principled process to ensure robust, reliable, and responsible systems.
arXiv Detail & Related papers (2021-01-11T15:54:48Z)
- Transfer Learning without Knowing: Reprogramming Black-box Machine Learning Models with Scarce Data and Limited Resources [78.72922528736011]
We propose a novel approach, black-box adversarial reprogramming (BAR), that repurposes a well-trained black-box machine learning model.
Using zeroth order optimization and multi-label mapping techniques, BAR can reprogram a black-box ML model solely based on its input-output responses.
BAR outperforms state-of-the-art methods and yields comparable performance to the vanilla adversarial reprogramming method.
arXiv Detail & Related papers (2020-07-17T01:52:34Z)
- Technology Readiness Levels for AI & ML [79.22051549519989]
Development of machine learning systems can be executed easily with modern tools, but the process is typically rushed and means-to-an-end.
Engineering systems follow well-defined processes and testing standards to streamline development for high-quality, reliable results.
We propose a proven systems engineering approach for machine learning development and deployment.
arXiv Detail & Related papers (2020-06-21T17:14:34Z)