Operationalizing Machine Learning: An Interview Study
- URL: http://arxiv.org/abs/2209.09125v1
- Date: Fri, 16 Sep 2022 16:59:36 GMT
- Title: Operationalizing Machine Learning: An Interview Study
- Authors: Shreya Shankar, Rolando Garcia, Joseph M. Hellerstein, Aditya G. Parameswaran
- Abstract summary: We conduct semi-structured interviews with 18 machine learning engineers (MLEs) working across many applications.
Our interviews expose three variables that govern success for a production ML deployment: Velocity, Validation, and Versioning.
We summarize common practices for successful ML experimentation, deployment, and sustaining production performance.
- Score: 13.300075655862573
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Organizations rely on machine learning engineers (MLEs) to operationalize ML,
i.e., deploy and maintain ML pipelines in production. The process of
operationalizing ML, or MLOps, consists of a continual loop of (i) data
collection and labeling, (ii) experimentation to improve ML performance, (iii)
evaluation throughout a multi-staged deployment process, and (iv) monitoring of
performance drops in production. When considered together, these
responsibilities seem staggering -- how does anyone do MLOps, what are the
unaddressed challenges, and what are the implications for tool builders?
We conducted semi-structured ethnographic interviews with 18 MLEs working
across many applications, including chatbots, autonomous vehicles, and finance.
Our interviews expose three variables that govern success for a production ML
deployment: Velocity, Validation, and Versioning. We summarize common practices
for successful ML experimentation, deployment, and sustaining production
performance. Finally, we discuss interviewees' pain points and anti-patterns,
with implications for tool design.
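The four-stage loop in the abstract, together with the three Vs, can be pictured as a short control-flow sketch. Everything below is a hypothetical illustration with invented placeholder function names, not code from the paper:

```python
import random

def collect_and_label():
    # (i) data collection and labeling (placeholder: random features)
    return [random.random() for _ in range(100)]

def run_experiments(data):
    # (ii) experimentation to improve ML performance (placeholder model)
    return {"version": random.randint(1, 10_000), "score": sum(data) / len(data)}

def evaluate(model, stage):
    # (iii) evaluation at one stage of a multi-staged deployment
    return model["score"] > 0.4  # stand-in for a stage-specific validation gate

def monitor(model):
    # (iv) monitoring for performance drops in production
    return random.random() < 0.2  # True when a drop is detected

def mlops_loop(iterations=3):
    for _ in range(iterations):  # the paper frames this as a continual loop
        model = run_experiments(collect_and_label())
        if all(evaluate(model, s) for s in ("shadow", "canary", "full")):
            print(f"deployed model v{model['version']}")  # versioned artifact
            while not monitor(model):  # serve until monitoring flags a drop
                pass

mlops_loop()
```

In the paper's terms, the staged gates play the role of Validation, the tracked version number of Versioning, and how quickly the loop can iterate is Velocity.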
Related papers
- Machine Learning Operations: A Mapping Study [0.0]
This article discusses the issues that exist in several components of the MLOps pipeline.
A systematic mapping study is performed to identify the challenges that arise in the MLOps system categorized by different focus areas.
The main value of this work is that it maps distinctive challenges in MLOps along with the recommended solutions outlined in our study.
arXiv Detail & Related papers (2024-09-28T17:17:40Z)
- Maintainability Challenges in ML: A Systematic Literature Review [5.669063174637433]
This study aims to identify and synthesise the maintainability challenges in different stages of the Machine Learning workflow.
We screened more than 13000 papers, then selected and qualitatively analysed 56 of them.
arXiv Detail & Related papers (2024-08-17T13:24:15Z)
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
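A rough illustration of the self-synthetic idea: the student model generates its own input-output pairs and is then finetuned on them. This is a hypothetical sketch (DummyStudent, quality_filter, and finetune are invented stand-ins), not the SELF-GUIDE implementation:

```python
import random

class DummyStudent:
    """Stand-in for an LLM; generate() is a hypothetical interface."""
    def generate(self, prompt):
        return f"response-{random.randint(0, 9)} to: {prompt[:20]}"

def quality_filter(x, y):
    return bool(x) and bool(y)  # placeholder for the paper's filtering rules

def finetune(student, pairs):
    print(f"finetuning on {len(pairs)} self-synthesized pairs")
    return student

def self_guide(student, instruction, n_pairs=5):
    pairs = []
    for _ in range(n_pairs):
        # The student synthesizes both the task input and its output.
        x = student.generate(f"Write one input for: {instruction}")
        y = student.generate(f"{instruction}\nInput: {x}\nOutput:")
        if quality_filter(x, y):
            pairs.append((x, y))
    return finetune(student, pairs)  # tune the student on its own data

self_guide(DummyStudent(), "classify sentiment")
```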
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
- Chain of Tools: Large Language Model is an Automatic Multi-tool Learner [54.992464510992605]
Automatic Tool Chain (ATC) is a framework that enables large language models (LLMs) to act as multi-tool users.
To scale up the scope of the tools, we next propose a black-box probing method.
For a comprehensive evaluation, we build a challenging benchmark named ToolFlow.
arXiv Detail & Related papers (2024-05-26T11:40:58Z)
- From Summary to Action: Enhancing Large Language Models for Complex Tasks with Open World APIs [62.496139001509114]
We introduce a novel tool invocation pipeline designed to control massive real-world APIs.
This pipeline mirrors the human task-solving process, addressing complicated real-life user queries.
Empirical evaluations of our Sum2Act pipeline on the ToolBench benchmark show significant performance improvements.
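As a toy illustration of an observe-summarize-act control loop, which is one plausible reading of the "summary to action" idea and NOT the paper's Sum2Act implementation (the API table and policy below are invented):

```python
# Two fake "real-world APIs" standing in for a massive API collection.
APIS = {"search": lambda q: f"results for {q}", "book": lambda q: f"booked {q}"}

def summarize(history):
    # Compress the progress so far into a short state summary.
    return f"{len(history)} step(s) done; last: {history[-1] if history else 'none'}"

def choose_action(summary, query):
    # Stand-in policy: search first, then act on the results.
    return "search" if "results" not in summary else "book"

def solve(query, max_steps=3):
    history = []
    for _ in range(max_steps):
        state = summarize(history)         # summarize before each decision
        api = choose_action(state, query)  # pick the next API call
        history.append(APIS[api](query))   # invoke it and record the outcome
        if api == "book":                  # task completed
            break
    return history

print(solve("flight to Tokyo"))
```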
arXiv Detail & Related papers (2024-02-28T08:42:23Z)
- Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks.
We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z)
- TaskBench: Benchmarking Large Language Models for Task Automation [82.2932794189585]
We introduce TaskBench, a framework to evaluate the capability of large language models (LLMs) in task automation.
Specifically, task decomposition, tool selection, and parameter prediction are assessed.
Our approach combines automated construction with rigorous human verification, ensuring high consistency with human evaluation.
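The three assessed dimensions map naturally onto a per-example evaluation record. A minimal sketch with hypothetical field names, not TaskBench's actual schema or metrics:

```python
from dataclasses import dataclass

@dataclass
class TaskAutomationResult:
    """Hypothetical record for the three dimensions TaskBench assesses."""
    decomposition_ok: bool  # did the LLM break the task into correct subtasks?
    tools_ok: bool          # did it pick the right tool for each subtask?
    params_ok: bool         # did it fill in the tool parameters correctly?

def score(results):
    """Fraction of examples that pass all three checks (illustrative metric)."""
    return sum(r.decomposition_ok and r.tools_ok and r.params_ok
               for r in results) / len(results)

demo = [TaskAutomationResult(True, True, True),
        TaskAutomationResult(True, False, True)]
print(score(demo))  # 0.5
```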
arXiv Detail & Related papers (2023-11-30T18:02:44Z)
- MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation [96.71370747681078]
We introduce MLAgentBench, a suite of 13 tasks ranging from improving model performance on CIFAR-10 to recent research problems like BabyLM.
For each task, an agent can perform actions like reading/writing files, executing code, and inspecting outputs.
We benchmark agents based on Claude v1.0, Claude v2.1, Claude v3 Opus, GPT-4, GPT-4-turbo, Gemini-Pro, and Mixtral and find that a Claude v3 Opus agent is the best in terms of success rate.
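The action space described (file I/O, code execution, output inspection) can be sketched as a small interface. The names here are illustrative assumptions, not MLAgentBench's API:

```python
import subprocess
import sys
from pathlib import Path

class AgentActions:
    """Hypothetical action interface in the spirit of MLAgentBench's tasks."""
    def __init__(self, workdir="."):
        self.workdir = Path(workdir)

    def read_file(self, name):
        return (self.workdir / name).read_text()

    def write_file(self, name, content):
        (self.workdir / name).write_text(content)

    def execute(self, script):
        # Run a script and capture stdout/stderr for the agent to inspect.
        return subprocess.run([sys.executable, script], capture_output=True,
                              text=True, cwd=self.workdir)

actions = AgentActions()
actions.write_file("train.py", "print('val accuracy: 0.91')")
result = actions.execute("train.py")
print(result.stdout)  # the agent inspects this output to pick its next action
```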
arXiv Detail & Related papers (2023-10-05T04:06:12Z)
- Reasonable Scale Machine Learning with Open-Source Metaflow [2.637746074346334]
We argue that re-purposing existing tools won't solve the current productivity issues.
We introduce Metaflow, an open-source framework for ML projects explicitly designed to boost the productivity of data practitioners.
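Metaflow structures a project as a class of @step-decorated methods. A minimal sketch of a flow (the "training" logic is a placeholder, not a real model):

```python
from metaflow import FlowSpec, step

class TrainFlow(FlowSpec):
    """A minimal Metaflow flow; the training step is a stand-in."""

    @step
    def start(self):
        self.data = list(range(10))  # placeholder for real data loading
        self.next(self.train)

    @step
    def train(self):
        self.model = sum(self.data)  # placeholder for real model training
        self.next(self.end)

    @step
    def end(self):
        print("artifact:", self.model)  # instance attributes become artifacts

if __name__ == "__main__":
    TrainFlow()
```

Running `python train_flow.py run` executes the steps in order, and Metaflow records the code and data artifacts of each run for later inspection.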
arXiv Detail & Related papers (2023-03-21T11:28:09Z)
- Machine Learning Operations (MLOps): Overview, Definition, and Architecture [0.0]
The paradigm of Machine Learning Operations (MLOps) addresses the difficulty of moving ML proofs of concept into production.
MLOps is still a vague term and its consequences for researchers and professionals are ambiguous.
We provide an aggregated overview of the necessary components and roles, as well as the associated architecture and principles.
arXiv Detail & Related papers (2022-05-04T19:38:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.