Assessing the Quality of Computational Notebooks for a Frictionless
Transition from Exploration to Production
- URL: http://arxiv.org/abs/2205.11941v1
- Date: Tue, 24 May 2022 10:13:38 GMT
- Title: Assessing the Quality of Computational Notebooks for a Frictionless
Transition from Exploration to Production
- Authors: Luigi Quaranta
- Abstract summary: Data scientists must transition from the explorative phase of Machine Learning projects to their production phase.
To narrow the gap between these two phases, tools and practices adopted by data scientists might be improved by incorporating consolidated software engineering solutions.
In my research project, I study the best practices for collaboration with computational notebooks and propose proof-of-concept tools to foster guidelines compliance.
- Score: 1.332560004325655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The massive trend of integrating data-driven AI capabilities into traditional
software systems is raising intriguing new challenges. One such challenge is
achieving a smooth transition from the explorative phase of Machine Learning
projects - in which data scientists build prototypical models in the lab - to
their production phase - in which software engineers translate prototypes into
production-ready AI components. To narrow the gap between these two
phases, tools and practices adopted by data scientists might be improved by
incorporating consolidated software engineering solutions. In particular,
computational notebooks have a prominent role in determining the quality of
data science prototypes. In my research project, I address this challenge by
studying the best practices for collaboration with computational notebooks and
proposing proof-of-concept tools to foster guidelines compliance.
Related papers
- On the Interaction between Software Engineers and Data Scientists when
building Machine Learning-Enabled Systems [1.2184324428571227]
Machine Learning (ML) components have been increasingly integrated into the core systems of organizations.
One of the key challenges is the effective interaction between actors with different backgrounds who need to work closely together.
This paper presents an exploratory case study to understand the current interaction and collaboration dynamics between these roles in ML projects.
arXiv Detail & Related papers (2024-02-08T00:27:56Z)
- TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations.
We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z)
- Advancing Reacting Flow Simulations with Data-Driven Models [50.9598607067535]
Key to effective use of machine learning tools in multi-physics problems is to couple them to physical and computer models.
The present chapter reviews some of the open opportunities for the application of data-driven reduced-order modeling of combustion systems.
arXiv Detail & Related papers (2022-09-05T16:48:34Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
Current approaches, however, do not supply the procedures and pipelines needed for the actual deployment of machine learning capabilities in real production-grade systems.
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor frameworks and script language engines.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- Towards Productizing AI/ML Models: An Industry Perspective from Data Scientists [10.27276267081559]
The transition from AI/ML models to production-ready AI-based systems is a challenge for both data scientists and software engineers.
In this paper, we report the results of a workshop conducted in a consulting company to understand how this transition is perceived by practitioners.
arXiv Detail & Related papers (2021-03-18T22:25:44Z)
- Technology Readiness Levels for Machine Learning Systems [107.56979560568232]
Development and deployment of machine learning systems can be executed easily with modern tools, but the process is typically rushed and treated as a means to an end.
We have developed a proven systems engineering approach for machine learning development and deployment.
Our "Machine Learning Technology Readiness Levels" framework defines a principled process to ensure robust, reliable, and responsible systems.
arXiv Detail & Related papers (2021-01-11T15:54:48Z)
- Enabling collaborative data science development with the Ballet framework [9.424574945499844]
We present a novel conceptual framework and ML programming model to address challenges to scaling data science collaborations.
We instantiate these ideas in Ballet, a lightweight software framework for collaborative open-source data science.
arXiv Detail & Related papers (2020-12-14T18:51:23Z)
- Automatic Feasibility Study via Data Quality Analysis for ML: A Case-Study on Label Noise [21.491392581672198]
We present Snoopy, with the goal of supporting data scientists and machine learning engineers in performing a systematic and theoretically founded feasibility study.
We approach this problem by estimating the irreducible error of the underlying task, also known as the Bayes error rate (BER).
We demonstrate in end-to-end experiments how users are able to save substantial labeling time and monetary efforts.
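The Bayes error rate referenced in this entry is the irreducible error of the task. As a brief reminder (this is the standard definition, not taken from the abstract itself), for a classification problem with features X and label Y it is the error of the optimal predictor:

```latex
% Bayes error rate: error of the optimal classifier, i.e. a lower
% bound on the error achievable by any model on the task
\mathrm{BER} = \mathbb{E}_{X}\!\left[\, 1 - \max_{y} \Pr(Y = y \mid X) \,\right]
```

Estimating this quantity lets a feasibility study flag tasks whose label noise makes a target accuracy unreachable before labeling budget is spent.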
arXiv Detail & Related papers (2020-10-16T14:21:19Z)
- Integrated Benchmarking and Design for Reproducible and Accessible Evaluation of Robotic Agents [61.36681529571202]
We describe a new concept for reproducible robotics research that integrates development and benchmarking.
One of the central components of this setup is the Duckietown Autolab, a standardized setup that is itself relatively low-cost and reproducible.
We validate the system by analyzing the repeatability of experiments conducted using the infrastructure and show that there is low variance across different robot hardware and across different remote labs.
arXiv Detail & Related papers (2020-09-09T15:31:29Z)
- Technology Readiness Levels for AI & ML [79.22051549519989]
Development of machine learning systems can be executed easily with modern tools, but the process is typically rushed and treated as a means to an end.
Engineering systems follow well-defined processes and testing standards to streamline development for high-quality, reliable results.
We propose a proven systems engineering approach for machine learning development and deployment.
arXiv Detail & Related papers (2020-06-21T17:14:34Z)
- Convergence of Artificial Intelligence and High Performance Computing on NSF-supported Cyberinfrastructure [3.4291439418246177]
Artificial Intelligence (AI) applications have powered transformational solutions for big data challenges in industry and technology.
As AI continues to evolve into a computing paradigm endowed with statistical and mathematical rigor, it has become apparent that single-GPU solutions for training, validation, and testing are no longer sufficient.
This realization has been driving the confluence of AI and high performance computing to reduce time-to-insight.
arXiv Detail & Related papers (2020-03-18T18:00:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.