Operationalizing Machine Learning: An Interview Study
- URL: http://arxiv.org/abs/2209.09125v1
- Date: Fri, 16 Sep 2022 16:59:36 GMT
- Title: Operationalizing Machine Learning: An Interview Study
- Authors: Shreya Shankar, Rolando Garcia, Joseph M. Hellerstein, Aditya G.
Parameswaran
- Abstract summary: We conduct semi-structured interviews with 18 machine learning engineers (MLEs) working across many applications.
Our interviews expose three variables that govern success for a production ML deployment: Velocity, Validation, and Versioning.
We summarize common practices for successful ML experimentation, deployment, and sustaining production performance.
- Score: 13.300075655862573
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Organizations rely on machine learning engineers (MLEs) to operationalize ML,
i.e., deploy and maintain ML pipelines in production. The process of
operationalizing ML, or MLOps, consists of a continual loop of (i) data
collection and labeling, (ii) experimentation to improve ML performance, (iii)
evaluation throughout a multi-staged deployment process, and (iv) monitoring of
performance drops in production. When considered together, these
responsibilities seem staggering -- how does anyone do MLOps, what are the
unaddressed challenges, and what are the implications for tool builders?
We conducted semi-structured ethnographic interviews with 18 MLEs working
across many applications, including chatbots, autonomous vehicles, and finance.
Our interviews expose three variables that govern success for a production ML
deployment: Velocity, Validation, and Versioning. We summarize common practices
for successful ML experimentation, deployment, and sustaining production
performance. Finally, we discuss interviewees' pain points and anti-patterns,
with implications for tool design.
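To make the abstract's four-stage loop concrete, here is a minimal, self-contained Python sketch. Every helper in it (collect_and_label, run_experiment, and so on) is a hypothetical stand-in for illustration; none of these names come from the paper.

```python
# Hypothetical sketch of the MLOps loop from the abstract: (i) data
# collection/labeling, (ii) experimentation, (iii) multi-staged evaluation,
# (iv) production monitoring. All helpers are illustrative stand-ins.
import random

def collect_and_label():
    # (i) gather and label fresh data (here: random toy examples)
    return [(random.random(), random.random() > 0.5) for _ in range(100)]

def run_experiment(data):
    # (ii) "train" a candidate model; here just an average-based threshold
    return {"threshold": sum(x for x, _ in data) / len(data)}

def passes_stage(model, stage):
    # (iii) evaluation gate for one deployment stage (tests, shadow, canary)
    return model["threshold"] > 0.0  # placeholder acceptance criterion

def production_healthy(model):
    # (iv) monitoring: detect performance drops in production
    return random.random() > 0.1  # pretend drops occur 10% of the time

model = None
for _ in range(3):  # a few turns of the continual loop
    candidate = run_experiment(collect_and_label())
    if all(passes_stage(candidate, s) for s in ("test", "shadow", "canary")):
        model = candidate  # promoted through every stage to production
    if model is not None and not production_healthy(model):
        continue  # a detected drop sends us back to data collection
```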
Related papers
- Chain of Tools: Large Language Model is an Automatic Multi-tool Learner [54.992464510992605]
Automatic Tool Chain (ATC) is a framework that enables large language models (LLMs) to act as multi-tool users.
To scale up the scope of the tools, we next propose a black-box probing method.
For a comprehensive evaluation, we build a challenging benchmark named ToolFlow.
arXiv Detail & Related papers (2024-05-26T11:40:58Z)
- COLT: Towards Completeness-Oriented Tool Retrieval for Large Language Models [60.733557487886635]
We propose a novel model-agnostic COllaborative Learning-based Tool Retrieval approach, COLT.
COLT captures semantic similarities between user queries and tool descriptions.
It also takes into account the collaborative information of tools.
arXiv Detail & Related papers (2024-05-25T06:41:23Z)
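As a toy illustration of the semantic-matching half of tool retrieval (ranking tool descriptions by similarity to a user query), here is a self-contained sketch. It is a generic bag-of-words stand-in, not COLT's actual model, which additionally learns collaborative signals between tools; all tool names and descriptions are made up.

```python
# Generic sketch of semantic tool retrieval: rank tool descriptions by
# cosine similarity to the user query. A bag-of-words counter stands in
# for a learned text encoder.
from collections import Counter
from math import sqrt

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norms = (sqrt(sum(v * v for v in a.values()))
             * sqrt(sum(v * v for v in b.values())))
    return dot / norms if norms else 0.0

tools = {
    "weather_api": "get current weather and forecast for a city",
    "calendar": "create, list, and delete calendar events",
    "translator": "translate text between natural languages",
}

query = embed("what is the weather in Paris tomorrow")
ranked = sorted(tools, key=lambda name: cosine(query, embed(tools[name])),
                reverse=True)
print(ranked)  # "weather_api" should rank first
```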
- Large Language Models Synergize with Automated Machine Learning [12.364087286739647]
This paper explores a novel form of program synthesis, targeting machine learning (ML) programs, by combining large language models (LLMs) and automated machine learning (autoML).
In experiments, given the textual task description, our method, Text-to-ML, generates the complete and optimized ML program in a fully autonomous process.
arXiv Detail & Related papers (2024-05-06T08:09:46Z)
- From Summary to Action: Enhancing Large Language Models for Complex Tasks with Open World APIs [62.496139001509114]
We introduce a novel tool invocation pipeline designed to control massive real-world APIs.
This pipeline mirrors the human task-solving process, addressing complicated real-life user queries.
Empirical evaluations of our Sum2Act pipeline on the ToolBench benchmark show significant performance improvements.
arXiv Detail & Related papers (2024-02-28T08:42:23Z)
- Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks.
We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z)
- TaskBench: Benchmarking Large Language Models for Task Automation [85.3879908356586]
We introduce TaskBench to evaluate the capability of large language models in task automation.
To generate high-quality evaluation datasets, we introduce the concept of Tool Graph.
We also propose TaskEval to evaluate the capability of LLMs from different aspects, including task decomposition, tool invocation, and parameter prediction.
arXiv Detail & Related papers (2023-11-30T18:02:44Z)
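The summary's "Tool Graph" suggests a natural data structure: tools as nodes with directed dependency edges, where one tool's output feeds another's input. The sketch below is one possible reading with invented tool names; it is not taken from TaskBench itself.

```python
# One possible reading of a "Tool Graph": each tool declares what it needs
# and what it produces, and a tool chain is valid when every dependency is
# satisfied by an earlier step. All names are invented for illustration.
tool_graph = {
    "search_web": {"needs": [], "produces": "urls"},
    "fetch_page": {"needs": ["urls"], "produces": "html"},
    "summarize": {"needs": ["html"], "produces": "text"},
    "send_email": {"needs": ["text"], "produces": "status"},
}

def valid_chain(chain):
    """Return True if each tool's inputs are produced by earlier tools."""
    available = set()
    for tool in chain:
        spec = tool_graph[tool]
        if any(dep not in available for dep in spec["needs"]):
            return False
        available.add(spec["produces"])
    return True

print(valid_chain(["search_web", "fetch_page", "summarize", "send_email"]))  # True
print(valid_chain(["summarize", "search_web"]))  # False: no html available yet
```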
- MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation [96.71370747681078]
We introduce MLAgentBench, a suite of 13 tasks ranging from improving model performance on CIFAR-10 to recent research problems like BabyLM.
For each task, an agent can perform actions like reading/writing files, executing code, and inspecting outputs.
We benchmark agents based on Claude v1.0, Claude v2.1, Claude v3 Opus, GPT-4, GPT-4-turbo, Gemini-Pro, and Mixtral, and find that a Claude v3 Opus agent is the best in terms of success rate.
arXiv Detail & Related papers (2023-10-05T04:06:12Z)
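The action set described above (reading/writing files, executing code, inspecting outputs) amounts to a small dispatch interface. Below is a hedged sketch of what such an interface could look like; the action names and shape are assumptions for illustration, not MLAgentBench's actual API.

```python
# Hypothetical sketch of an agent action interface covering the three
# action kinds the summary mentions. Not MLAgentBench's actual API.
import subprocess

def run_action(action, **kwargs):
    if action == "read_file":
        with open(kwargs["path"]) as f:
            return f.read()
    if action == "write_file":
        with open(kwargs["path"], "w") as f:
            f.write(kwargs["content"])
        return "ok"
    if action == "execute":
        # run a script and capture its output for the agent to inspect
        result = subprocess.run(["python", kwargs["path"]],
                                capture_output=True, text=True)
        return result.stdout + result.stderr
    raise ValueError(f"unknown action: {action}")

run_action("write_file", path="train.py", content="print('accuracy: 0.92')")
print(run_action("execute", path="train.py"))  # the agent inspects this output
```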
- Reasonable Scale Machine Learning with Open-Source Metaflow [2.637746074346334]
We argue that re-purposing existing tools won't solve the current productivity issues.
We introduce Metaflow, an open-source framework for ML projects explicitly designed to boost the productivity of data practitioners.
arXiv Detail & Related papers (2023-03-21T11:28:09Z)
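Metaflow is an open-source framework with a public Python API built around FlowSpec classes and @step methods. The minimal flow below uses that real API; the specific steps are illustrative. Run it with `python flow.py run`.

```python
# A minimal Metaflow flow: steps are methods chained with self.next, and
# any attribute assigned to self is versioned and persisted as an artifact.
from metaflow import FlowSpec, step

class TrainFlow(FlowSpec):

    @step
    def start(self):
        self.data = list(range(10))  # stand-in for loading a dataset
        self.next(self.train)

    @step
    def train(self):
        self.model = sum(self.data) / len(self.data)  # stand-in for training
        self.next(self.end)

    @step
    def end(self):
        print(f"trained parameter: {self.model}")

if __name__ == "__main__":
    TrainFlow()
```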
- Machine Learning for Software Engineering: A Tertiary Study [13.832268599253412]
Machine learning (ML) techniques increase the effectiveness of software engineering (SE) lifecycle activities.
We systematically collected, quality-assessed, summarized, and categorized 83 reviews in ML for SE published between 2009 and 2022, covering 6,117 primary studies.
The SE areas most tackled with ML are software quality and testing, while human-centered areas appear more challenging for ML.
arXiv Detail & Related papers (2022-11-17T09:19:53Z)
- Machine Learning Operations (MLOps): Overview, Definition, and Architecture [0.0]
The paradigm of Machine Learning Operations (MLOps) addresses the challenge of moving ML models from development into production.
MLOps is still a vague term and its consequences for researchers and professionals are ambiguous.
We provide an aggregated overview of the necessary components and roles, as well as the associated architecture and principles.
arXiv Detail & Related papers (2022-05-04T19:38:48Z)
- Characterizing and Detecting Mismatch in Machine-Learning-Enabled Systems [1.4695979686066065]
Development and deployment of machine learning systems remain a challenge.
In this paper, we report our findings and their implications for improving end-to-end ML-enabled system development.
arXiv Detail & Related papers (2021-03-25T19:40:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.