Assessing the Quality of Computational Notebooks for a Frictionless
Transition from Exploration to Production
- URL: http://arxiv.org/abs/2205.11941v1
- Date: Tue, 24 May 2022 10:13:38 GMT
- Title: Assessing the Quality of Computational Notebooks for a Frictionless
Transition from Exploration to Production
- Authors: Luigi Quaranta
- Abstract summary: Data scientists must transition from the explorative phase of Machine Learning projects to their production phase.
To narrow the gap between these two phases, tools and practices adopted by data scientists might be improved by incorporating consolidated software engineering solutions.
In my research project, I study the best practices for collaboration with computational notebooks and propose proof-of-concept tools to foster guidelines compliance.
- Score: 1.332560004325655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The massive trend of integrating data-driven AI capabilities into traditional
software systems is raising intriguing new challenges. One such challenge is
achieving a smooth transition from the explorative phase of Machine Learning
projects - in which data scientists build prototypical models in the lab - to
their production phase - in which software engineers translate prototypes into
production-ready AI components. To narrow the gap between these two
phases, tools and practices adopted by data scientists might be improved by
incorporating consolidated software engineering solutions. In particular,
computational notebooks have a prominent role in determining the quality of
data science prototypes. In my research project, I address this challenge by
studying the best practices for collaboration with computational notebooks and
proposing proof-of-concept tools to foster guidelines compliance.
Related papers
- On the Interaction between Software Engineers and Data Scientists when
building Machine Learning-Enabled Systems [1.2184324428571227]
Machine Learning (ML) components have been increasingly integrated into the core systems of organizations.
One of the key challenges is the effective interaction between actors with different backgrounds who need to work closely together.
This paper presents an exploratory case study to understand the current interaction and collaboration dynamics between these roles in ML projects.
arXiv Detail & Related papers (2024-02-08T00:27:56Z)
- TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations.
We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z)
- Advancing Reacting Flow Simulations with Data-Driven Models [50.9598607067535]
Key to effective use of machine learning tools in multi-physics problems is to couple them to physical and computer models.
The present chapter reviews some of the open opportunities for the application of data-driven reduced-order modeling of combustion systems.
arXiv Detail & Related papers (2022-09-05T16:48:34Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
Current approaches, however, do not supply the procedures and pipelines needed for the actual deployment of machine learning capabilities in real production-grade systems.
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor frameworks and script language engines.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- Towards Productizing AI/ML Models: An Industry Perspective from Data Scientists [10.27276267081559]
The transition from AI/ML models to production-ready AI-based systems is a challenge for both data scientists and software engineers.
In this paper, we report the results of a workshop conducted in a consulting company to understand how this transition is perceived by practitioners.
arXiv Detail & Related papers (2021-03-18T22:25:44Z)
- Technology Readiness Levels for Machine Learning Systems [107.56979560568232]
Development and deployment of machine learning systems can be executed easily with modern tools, but the process is typically rushed and treated as a means to an end.
We have developed a proven systems engineering approach for machine learning development and deployment.
Our "Machine Learning Technology Readiness Levels" framework defines a principled process to ensure robust, reliable, and responsible systems.
arXiv Detail & Related papers (2021-01-11T15:54:48Z)
- Enabling collaborative data science development with the Ballet framework [9.424574945499844]
We present a novel conceptual framework and ML programming model to address challenges to scaling data science collaborations.
We instantiate these ideas in Ballet, a lightweight software framework for collaborative open-source data science.
arXiv Detail & Related papers (2020-12-14T18:51:23Z)
- Automatic Feasibility Study via Data Quality Analysis for ML: A Case-Study on Label Noise [21.491392581672198]
We present Snoopy, with the goal of supporting data scientists and machine learning engineers in performing a systematic and theoretically founded feasibility study.
We approach this problem by estimating the irreducible error of the underlying task, also known as the Bayes error rate (BER).
We demonstrate in end-to-end experiments how users are able to save substantial labeling time and monetary efforts.
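The Bayes error rate referenced in this entry is the irreducible error of the task. As a brief reminder (this is the standard definition, not taken from the abstract itself), for a classification problem with features X and label Y it is the error of the optimal predictor:

```latex
% Bayes error rate: error of the optimal classifier, i.e. a lower
% bound on the error achievable by any model on the task
\mathrm{BER} = \mathbb{E}_{X}\!\left[\, 1 - \max_{y} \Pr(Y = y \mid X) \,\right]
```

Estimating this quantity lets a feasibility study flag tasks whose label noise makes a target accuracy unreachable before labeling budget is spent.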
arXiv Detail & Related papers (2020-10-16T14:21:19Z)
- Integrated Benchmarking and Design for Reproducible and Accessible Evaluation of Robotic Agents [61.36681529571202]
We describe a new concept for reproducible robotics research that integrates development and benchmarking.
One of the central components of this setup is the Duckietown Autolab, a standardized setup that is itself relatively low-cost and reproducible.
We validate the system by analyzing the repeatability of experiments conducted using the infrastructure and show that there is low variance across different robot hardware and across different remote labs.
arXiv Detail & Related papers (2020-09-09T15:31:29Z)
- Technology Readiness Levels for AI & ML [79.22051549519989]
Development of machine learning systems can be executed easily with modern tools, but the process is typically rushed and treated as a means to an end.
Engineering systems follow well-defined processes and testing standards to streamline development for high-quality, reliable results.
We propose a proven systems engineering approach for machine learning development and deployment.
arXiv Detail & Related papers (2020-06-21T17:14:34Z)
- Convergence of Artificial Intelligence and High Performance Computing on NSF-supported Cyberinfrastructure [3.4291439418246177]
Artificial Intelligence (AI) applications have powered transformational solutions for big data challenges in industry and technology.
As AI continues to evolve into a computing paradigm endowed with statistical and mathematical rigor, it has become apparent that single-GPU solutions for training, validation, and testing are no longer sufficient.
This realization has been driving the confluence of AI and high performance computing to reduce time-to-insight.
arXiv Detail & Related papers (2020-03-18T18:00:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.