Related papers: Couler: Unified Machine Learning Workflow Optimization in Cloud

Couler: Unified Machine Learning Workflow Optimization in Cloud

URL: http://arxiv.org/abs/2403.07608v1
Date: Tue, 12 Mar 2024 12:47:32 GMT
Title: Couler: Unified Machine Learning Workflow Optimization in Cloud
Authors: Xiaoda Wang, Yuan Tang, Tengda Guo, Bo Sang, Jingji Wu, Jian Sha, Ke Zhang, Jiang Qian, Mingjie Tang
Abstract summary: Couler is a system designed for unified ML workflow optimization in the cloud. We integrate Large Language Models (LLMs) into workflow generation, and provide a unified programming interface for various workflow engines. Couer has successfully improved the CPU/Memory utilization by more than 15% and the workflow completion rate by around 17%.
Score: 6.769259207650922
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Machine Learning (ML) has become ubiquitous, fueling data-driven applications across various organizations. Contrary to the traditional perception of ML in research, ML workflows can be complex, resource-intensive, and time-consuming. Expanding an ML workflow to encompass a wider range of data infrastructure and data types may lead to larger workloads and increased deployment costs. Currently, numerous workflow engines are available (with over ten being widely recognized). This variety poses a challenge for end-users in terms of mastering different engine APIs. While efforts have primarily focused on optimizing ML Operations (MLOps) for a specific workflow engine, current methods largely overlook workflow optimization across different engines. In this work, we design and implement Couler, a system designed for unified ML workflow optimization in the cloud. Our main insight lies in the ability to generate an ML workflow using natural language (NL) descriptions. We integrate Large Language Models (LLMs) into workflow generation, and provide a unified programming interface for various workflow engines. This approach alleviates the need to understand various workflow engines' APIs. Moreover, Couler enhances workflow computation efficiency by introducing automated caching at multiple stages, enabling large workflow auto-parallelization and automatic hyperparameters tuning. These enhancements minimize redundant computational costs and improve fault tolerance during deep learning workflow training. Couler is extensively deployed in real-world production scenarios at Ant Group, handling approximately 22k workflows daily, and has successfully improved the CPU/Memory utilization by more than 15% and the workflow completion rate by around 17%.

Related papers

Flow: Modularized Agentic Workflow Automation [53.073598156915615]
Multi-agent frameworks powered by large language models (LLMs) have demonstrated great success in automated planning and task execution. However, the effective adjustment of agentic during execution has not been well studied. In this paper, we define an activity-on-vertex (AOV) graph, which allows continuous workflow refinement by agents. Our proposed multi-agent framework achieves efficient concurrent execution of subtasks, effective goal achievement, and enhanced error tolerance.
arXiv Detail & Related papers (2025-01-14T04:35:37Z)
Opus: A Large Work Model for Complex Workflow Generation [0.0]
Opus is a framework for generating and optimizing tasks tailored to complex Business Process Outsourcing (BPO) use cases. Our approach generates executables from Intention, defined as the alignment of Client Input, Client Output and Process Directed Context.
arXiv Detail & Related papers (2024-11-30T20:00:41Z)
Large Language Models for Constructing and Optimizing Machine Learning Workflows: A Survey [3.340984908213717]
Building effective machine learning (ML) to address complex tasks is a primary focus of the Automatic ML (AutoML) community. Recently, the integration of Large Language Models (LLMs) into ML has shown great potential for automating and enhancing various stages of the ML pipeline.
arXiv Detail & Related papers (2024-11-11T21:54:26Z)
WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models [105.46456444315693]
We presentLLM, a data-centric framework to enhance the capability of large language models in workflow orchestration. It first constructs a large-scale fine-tuningBench with 106,763 samples, covering 1,503 APIs from 83 applications across 28 categories. LlamaLlama demonstrates a strong capacity to orchestrate complex APIs, while also achieving notable generalization performance.
arXiv Detail & Related papers (2024-11-08T09:58:02Z)
AFlow: Automating Agentic Workflow Generation [36.61172223528231]
Large language models (LLMs) have demonstrated remarkable potential in solving complex tasks across diverse domains. We introduce AFlow, an automated framework that efficiently explores this space using Monte Carlo Tree Search. Empirical evaluations across six benchmark datasets demonstrate AFlow's efficacy, yielding a 5.7% average improvement over state-of-the-art baselines.
arXiv Detail & Related papers (2024-10-14T17:40:40Z)
Benchmarking Agentic Workflow Generation [80.74757493266057]
We introduce WorFBench, a unified workflow generation benchmark with multi-faceted scenarios and intricate graph workflow structures. We also present WorFEval, a systemic evaluation protocol utilizing subsequence and subgraph matching algorithms. We observe that the generated can enhance downstream tasks, enabling them to achieve superior performance with less time during inference.
arXiv Detail & Related papers (2024-10-10T12:41:19Z)
AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML [56.565200973244146]
Automated machine learning (AutoML) accelerates AI development by automating tasks in the development pipeline. Recent works have started exploiting large language models (LLM) to lessen such burden. This paper proposes AutoML-Agent, a novel multi-agent framework tailored for full-pipeline AutoML.
arXiv Detail & Related papers (2024-10-03T20:01:09Z)
Agent Workflow Memory [71.81385627556398]
We introduce Agent Memory, a method for inducing commonly reused routines. AWM substantially improves the baseline results by 24.6% and 51.1% relative success rate. Online AWM robustly generalizes in cross-task, website, and domain evaluations.
arXiv Detail & Related papers (2024-09-11T17:21:00Z)
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? [73.81908518992161]
We introduce Spider2-V, the first multimodal agent benchmark focusing on professional data science and engineering. Spider2-V features real-world tasks in authentic computer environments and incorporating 20 enterprise-level professional applications. These tasks evaluate the ability of a multimodal agent to perform data-related tasks by writing code and managing the GUI in enterprise data software systems.
arXiv Detail & Related papers (2024-07-15T17:54:37Z)
AutoFlow: Automated Workflow Generation for Large Language Model Agents [39.72700864347576]
Large Language Models (LLMs) have shown significant progress in understanding complex natural language. To make sure LLM Agents follow an effective and reliable procedure to solve the given task, manually designed are usually used. We propose AutoFlow, a framework designed to automatically generate for agents to solve complex tasks.
arXiv Detail & Related papers (2024-07-01T21:05:02Z)
In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations. As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks. This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.