Assessing the Quality of Computational Notebooks for a Frictionless
Transition from Exploration to Production
- URL: http://arxiv.org/abs/2205.11941v1
- Date: Tue, 24 May 2022 10:13:38 GMT
- Title: Assessing the Quality of Computational Notebooks for a Frictionless
Transition from Exploration to Production
- Authors: Luigi Quaranta
- Abstract summary: Data scientists must transition from the explorative phase of Machine Learning projects to their production phase.
To narrow the gap between these two phases, tools and practices adopted by data scientists might be improved by incorporating consolidated software engineering solutions.
In my research project, I study best practices for collaboration with computational notebooks and propose proof-of-concept tools to foster compliance with guidelines.
- Score: 1.332560004325655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The massive trend of integrating data-driven AI capabilities into traditional
software systems is raising new and intriguing challenges. One such challenge is
achieving a smooth transition from the explorative phase of Machine Learning
projects - in which data scientists build prototypical models in the lab - to
their production phase - in which software engineers translate prototypes into
production-ready AI components. To narrow down the gap between these two
phases, tools and practices adopted by data scientists might be improved by
incorporating consolidated software engineering solutions. In particular,
computational notebooks have a prominent role in determining the quality of
data science prototypes. In my research project, I address this challenge by
studying best practices for collaboration with computational notebooks and
proposing proof-of-concept tools to foster compliance with guidelines.
Related papers
- Data Publishing in Mechanics and Dynamics: Challenges, Guidelines, and Examples from Engineering Design [4.065325208853021]
This article analyzes the value and challenges of data publishing in mechanics and dynamics.
It shows that mechanics and dynamics also raise challenges and considerations not typical of the fields where data-driven methods originally boomed.
arXiv Detail & Related papers (2024-10-07T18:26:05Z)
- Data Analysis in the Era of Generative AI [56.44807642944589]
This paper explores the potential of AI-powered tools to reshape data analysis, focusing on design considerations and challenges.
We explore how the emergence of large language and multimodal models offers new opportunities to enhance various stages of data analysis workflow.
We then examine human-centered design principles that facilitate intuitive interactions, build user trust, and streamline the AI-assisted analysis workflow across multiple apps.
arXiv Detail & Related papers (2024-09-27T06:31:03Z)
- Survey and Taxonomy: The Role of Data-Centric AI in Transformer-Based Time Series Forecasting [36.31269406067809]
We argue that data-centric AI is essential for training AI models efficiently, particularly transformer-based TSF models.
We review previous research from a data-centric AI perspective and aim to lay the foundation for the future development of transformer-based architectures and data-centric AI.
arXiv Detail & Related papers (2024-07-29T08:27:21Z)
- On the Interaction between Software Engineers and Data Scientists when building Machine Learning-Enabled Systems [1.2184324428571227]
Machine Learning (ML) components have been increasingly integrated into the core systems of organizations.
One of the key challenges is the effective interaction between actors with different backgrounds who need to work closely together.
This paper presents an exploratory case study to understand the current interaction and collaboration dynamics between these roles in ML projects.
arXiv Detail & Related papers (2024-02-08T00:27:56Z)
- Advancing Reacting Flow Simulations with Data-Driven Models [50.9598607067535]
The key to using machine learning tools effectively in multi-physics problems is to couple them to physical and computer models.
The present chapter reviews some of the open opportunities for the application of data-driven reduced-order modeling of combustion systems.
arXiv Detail & Related papers (2022-09-05T16:48:34Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper, we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using a basic cross-platform tensor framework and script language engines.
This approach, however, does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production-grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- Towards Productizing AI/ML Models: An Industry Perspective from Data Scientists [10.27276267081559]
The transition from AI/ML models to production-ready AI-based systems is a challenge for both data scientists and software engineers.
In this paper, we report the results of a workshop conducted in a consulting company to understand how this transition is perceived by practitioners.
arXiv Detail & Related papers (2021-03-18T22:25:44Z)
- Technology Readiness Levels for Machine Learning Systems [107.56979560568232]
Development and deployment of machine learning systems can be executed easily with modern tools, but the process is typically rushed and means-to-an-end.
We have developed a proven systems engineering approach for machine learning development and deployment.
Our "Machine Learning Technology Readiness Levels" framework defines a principled process to ensure robust, reliable, and responsible systems.
arXiv Detail & Related papers (2021-01-11T15:54:48Z)
- Enabling collaborative data science development with the Ballet framework [9.424574945499844]
We present a novel conceptual framework and ML programming model to address challenges to scaling data science collaborations.
We instantiate these ideas in Ballet, a lightweight software framework for collaborative open-source data science.
arXiv Detail & Related papers (2020-12-14T18:51:23Z)
- Integrated Benchmarking and Design for Reproducible and Accessible Evaluation of Robotic Agents [61.36681529571202]
We describe a new concept for reproducible robotics research that integrates development and benchmarking.
One of the central components of this setup is the Duckietown Autolab, a standardized setup that is itself relatively low-cost and reproducible.
We validate the system by analyzing the repeatability of experiments conducted using the infrastructure and show that there is low variance across different robot hardware and across different remote labs.
arXiv Detail & Related papers (2020-09-09T15:31:29Z)
- Technology Readiness Levels for AI & ML [79.22051549519989]
Development of machine learning systems can be executed easily with modern tools, but the process is typically rushed and means-to-an-end.
Engineering systems follow well-defined processes and testing standards to streamline development for high-quality, reliable results.
We propose a proven systems engineering approach for machine learning development and deployment.
arXiv Detail & Related papers (2020-06-21T17:14:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.