Exploring the potential of flow-based programming for machine learning
deployment in comparison with service-oriented architectures
- URL: http://arxiv.org/abs/2108.04105v1
- Date: Mon, 9 Aug 2021 15:06:02 GMT
- Title: Exploring the potential of flow-based programming for machine learning
deployment in comparison with service-oriented architectures
- Authors: Andrei Paleyes, Christian Cabrera, Neil D. Lawrence
- Abstract summary: We argue that part of the reason for the high failure rate of ML deployment in production is infrastructure that was not designed for activities around data collection and analysis.
We propose to consider flow-based programming with data streams as an alternative to commonly used service-oriented architectures for building software applications.
- Score: 8.677012233188968
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite huge successes reported by the field of machine learning, such as
speech assistants or self-driving cars, businesses still observe a very high
failure rate when it comes to deployment of ML in production. We argue that
part of the reason is infrastructure that was not designed for activities
around data collection and analysis. We propose to consider flow-based
programming with data streams as an alternative to commonly used
service-oriented architectures for building software applications. To compare
flow-based programming with the widespread service-oriented approach, we
develop a data processing application, and formulate two subsequent ML-related
tasks that constitute a complete cycle of ML deployment while allowing us to
assess characteristics of each programming paradigm in the ML context.
Employing both code metrics and empirical observations, we show that when it
comes to ML deployment each paradigm has certain advantages and drawbacks. Our
main conclusion is that while FBP shows great potential for providing
infrastructural benefits for deployment of machine learning, it requires a lot
of boilerplate code to define and manipulate the dataflow graph. We believe
that with better developer tools in place this problem can be alleviated,
establishing FBP as a strong alternative to the currently prevalent SOA-driven
software design approach. Additionally, we provide an insight into the trend of
prioritising model development over data quality management.
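To make the paradigm in the abstract concrete, here is a minimal sketch of a flow-based pipeline in Python. It is not the authors' implementation: the node names (`source`, `clean`, `scale`, `tap`, `connect`) and the sample records are hypothetical, and generators stand in for data streams. It only illustrates how, when processing steps are connected by explicit streams, a data-collection step can be placed on an edge of the graph without modifying the other nodes.

```python
from typing import Callable, Iterable, Iterator, List

Record = dict
# In this sketch a "node" is any function that turns one stream into another.
Node = Callable[[Iterable[Record]], Iterator[Record]]


def source(records: Iterable[Record]) -> Iterator[Record]:
    """Emit records one at a time, acting as the input stream."""
    for record in records:
        yield record


def clean(stream: Iterable[Record]) -> Iterator[Record]:
    """Drop records whose 'value' field is missing."""
    for record in stream:
        if record.get("value") is not None:
            yield record


def scale(stream: Iterable[Record], factor: float = 2.0) -> Iterator[Record]:
    """Multiply each record's 'value' by a constant factor."""
    for record in stream:
        yield {**record, "value": record["value"] * factor}


def tap(collected: List[Record]) -> Node:
    """Build a pass-through node that copies every record into `collected`."""
    def node(stream: Iterable[Record]) -> Iterator[Record]:
        for record in stream:
            collected.append(record)
            yield record
    return node


def connect(stream: Iterable[Record], *nodes: Node) -> Iterable[Record]:
    """Wire nodes into a linear dataflow graph, each consuming the previous output."""
    for node in nodes:
        stream = node(stream)
    return stream


# Hypothetical input data, used only for illustration.
records = [
    {"id": 1, "value": 1.5},
    {"id": 2, "value": None},
    {"id": 3, "value": 2.0},
]
log: List[Record] = []  # data collected from the edge between clean and scale
result = list(connect(source(records), clean, tap(log), scale))
```

In this sketch, collecting the cleaned data amounts to adding one `tap` node to the call to `connect`. The graph wiring itself is also code that has to be written and maintained, which is the kind of boilerplate the abstract points to as the main drawback.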
Related papers
- Benchmarking Agentic Workflow Generation [80.74757493266057]
We introduce WorFBench, a unified workflow generation benchmark with multi-faceted scenarios and intricate graph workflow structures.
We also present WorFEval, a systemic evaluation protocol utilizing subsequence and subgraph matching algorithms.
We observe that the generated workflows can enhance downstream tasks, enabling them to achieve superior performance with less time during inference.
arXiv Detail & Related papers (2024-10-10T12:41:19Z)
- Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? [73.81908518992161]
We introduce Spider2-V, the first multimodal agent benchmark focusing on professional data science and engineering.
Spider2-V features real-world tasks in authentic computer environments and incorporates 20 enterprise-level professional applications.
These tasks evaluate the ability of a multimodal agent to perform data-related tasks by writing code and managing the GUI in enterprise data software systems.
arXiv Detail & Related papers (2024-07-15T17:54:37Z)
- Towards an MLOps Architecture for XAI in Industrial Applications [2.0457031151514977]
Machine learning (ML) has become a popular tool in the industrial sector as it helps to improve operations, increase efficiency, and reduce costs.
One of the remaining Machine Learning Operations (MLOps) challenges is the need for explanations.
We developed a novel MLOps software architecture to address the challenge of integrating explanations and feedback capabilities into the ML development and deployment processes.
arXiv Detail & Related papers (2023-09-22T09:56:25Z)
- In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z)
- MLOps: A Step Forward to Enterprise Machine Learning [0.0]
This research presents a detailed review of MLOps, its benefits, difficulties, evolutions, and important underlying technologies.
The MLOps workflow is explained in detail along with the various tools necessary for both model and data exploration and deployment.
This article also sheds light on the end-to-end production of ML projects using various maturity levels of automated pipelines.
arXiv Detail & Related papers (2023-05-27T20:44:14Z)
- Benchmarking Automated Machine Learning Methods for Price Forecasting Applications [58.720142291102135]
We show the possibility of substituting manually created ML pipelines with automated machine learning (AutoML) solutions.
Based on the CRISP-DM process, we split the manual ML pipeline into a machine learning and non-machine learning part.
We show, in a case study for the industrial use case of price forecasting, that domain knowledge combined with AutoML can weaken the dependence on ML experts.
arXiv Detail & Related papers (2023-04-28T10:27:38Z)
- Reasonable Scale Machine Learning with Open-Source Metaflow [2.637746074346334]
We argue that re-purposing existing tools won't solve the current productivity issues.
We introduce Metaflow, an open-source framework for ML projects explicitly designed to boost the productivity of data practitioners.
arXiv Detail & Related papers (2023-03-21T11:28:09Z)
- Dataflow graphs as complete causal graphs [17.15640410609126]
We consider an alternative approach to software design, flow-based programming (FBP).
We show how this connection can be leveraged to improve day-to-day tasks in software projects.
arXiv Detail & Related papers (2023-03-16T17:59:13Z)
- An Empirical Evaluation of Flow Based Programming in the Machine Learning Deployment Context [11.028123436097616]
Data Oriented Architecture (DOA) is an emerging approach that can support data scientists and software developers when addressing challenges.
This paper proposes to consider Flow-Based Programming (FBP) as a paradigm for creating DOA applications.
We empirically evaluate FBP in the context of ML deployment on four applications that represent typical data science projects.
arXiv Detail & Related papers (2022-04-27T09:08:48Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines.
This approach, however, does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production-grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- Automated Machine Learning Techniques for Data Streams [91.3755431537592]
This paper surveys the state-of-the-art open-source AutoML tools, applies them to data collected from streams, and measures how their performance changes over time.
The results show that off-the-shelf AutoML tools can provide satisfactory results, but in the presence of concept drift, detection or adaptation techniques have to be applied to maintain predictive accuracy over time.
arXiv Detail & Related papers (2021-06-14T11:42:46Z)