PipeSim: Trace-driven Simulation of Large-Scale AI Operations Platforms
- URL: http://arxiv.org/abs/2006.12587v1
- Date: Mon, 22 Jun 2020 19:55:37 GMT
- Title: PipeSim: Trace-driven Simulation of Large-Scale AI Operations Platforms
- Authors: Thomas Rausch and Waldemar Hummer and Vinod Muthusamy
- Abstract summary: We present a trace-driven simulation-based experimentation and analytics environment for large-scale AI systems.
Analytics data from a production-grade AI platform developed at IBM are used to build a comprehensive simulation model.
We implement the model in a standalone, discrete event simulator, and provide a toolkit for running experiments.
- Score: 4.060731229044571
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Operationalizing AI has become a major endeavor in both research and
industry. Automated, operationalized pipelines that manage the AI application
lifecycle will form a significant part of tomorrow's infrastructure workloads.
To optimize operations of production-grade AI workflow platforms we can
leverage existing scheduling approaches, yet it is challenging to fine-tune
operational strategies that achieve application-specific cost-benefit tradeoffs
while catering to the specific domain characteristics of machine learning (ML)
models, such as accuracy, robustness, or fairness. We present a trace-driven
simulation-based experimentation and analytics environment that allows
researchers and engineers to devise and evaluate such operational strategies
for large-scale AI workflow systems. Analytics data from a production-grade AI
platform developed at IBM are used to build a comprehensive simulation model.
Our simulation model describes the interaction between pipelines and system
infrastructure, and how pipeline tasks affect different ML model metrics. We
implement the model in a standalone, stochastic, discrete event simulator, and
provide a toolkit for running experiments. Synthetic traces are made available
for ad-hoc exploration as well as statistical analysis of experiments to test
and examine pipeline scheduling, cluster resource allocation, and similar
operational mechanisms.
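As a rough illustration of what a stochastic, discrete event simulation of pipelines competing for cluster resources can look like, here is a minimal Python sketch. It is not PipeSim's model or API: the `simulate` function, the three-task pipeline, the exponentially distributed arrival and task times, and the FIFO dispatch policy are all assumptions chosen only to keep the example small.

```python
import heapq
import random
from collections import deque


def simulate(num_pipelines=200, capacity=4, mean_interarrival=5.0,
             mean_task_duration=(2.0, 10.0, 1.0), seed=0):
    """Toy discrete event simulation (hypothetical, not PipeSim itself).

    Each pipeline runs its tasks in sequence (e.g. preprocess, train,
    evaluate). Every task occupies one of `capacity` worker slots and
    waits in a FIFO queue when no slot is free.
    """
    rng = random.Random(seed)
    events = []        # min-heap of (time, seq, kind, pipeline_id, task_idx)
    seq = 0            # unique tie-breaker for events at the same time
    waiting = deque()  # FIFO queue of (pipeline_id, task_idx, enqueue_time)
    free = capacity
    total_wait = 0.0
    finish_times = {}

    def push(time, kind, pid, task_idx):
        nonlocal seq
        heapq.heappush(events, (time, seq, kind, pid, task_idx))
        seq += 1

    # Assumption: pipelines arrive with exponential inter-arrival times.
    t = 0.0
    for pid in range(num_pipelines):
        t += rng.expovariate(1.0 / mean_interarrival)
        push(t, "ready", pid, 0)

    now = 0.0
    while events:
        now, _, kind, pid, idx = heapq.heappop(events)
        if kind == "ready":
            waiting.append((pid, idx, now))
        else:  # "done": the task finished and releases its slot
            free += 1
            if idx + 1 < len(mean_task_duration):
                waiting.append((pid, idx + 1, now))
            else:
                finish_times[pid] = now

        # Dispatch queued tasks while worker slots are available.
        while free > 0 and waiting:
            wpid, widx, enqueued = waiting.popleft()
            free -= 1
            total_wait += now - enqueued
            duration = rng.expovariate(1.0 / mean_task_duration[widx])
            push(now + duration, "done", wpid, widx)

    num_tasks = num_pipelines * len(mean_task_duration)
    return {
        "completed": len(finish_times),
        "makespan": now,
        "mean_wait": total_wait / num_tasks,
    }


if __name__ == "__main__":
    # Compare two hypothetical cluster sizes under the same random seed.
    for slots in (2, 4, 8):
        print(slots, simulate(capacity=slots, seed=42))
```

Running the same seed with different `capacity` values gives a simple way to compare resource allocation choices. A trace-driven simulator would replace the assumed distributions with ones fitted to platform analytics data.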
Related papers
- AgentSimulator: An Agent-based Approach for Data-driven Business Process Simulation [6.590869939300887]
Business process simulation (BPS) is a versatile technique for estimating process performance across various scenarios.
This paper introduces AgentSimulator, a resource-first BPS approach that discovers a multi-agent system from an event log.
Our experiments show that AgentSimulator achieves state-of-the-art simulation accuracy with significantly lower computation times than existing approaches.
arXiv Detail & Related papers (2024-08-16T07:19:11Z)
- Towards Next-Generation Urban Decision Support Systems through AI-Powered Construction of Scientific Ontology using Large Language Models -- A Case in Optimizing Intermodal Freight Transportation [1.6230958216521798]
This study investigates the potential of leveraging pre-trained Large Language Models (LLMs).
By adopting ChatGPT API as the reasoning core, we outline an integrated workflow that encompasses natural language processing, methontology-based prompt tuning, and transformers.
The outcomes of our methodology are knowledge graphs in widely adopted ontology languages (e.g., OWL, RDF, SPARQL).
arXiv Detail & Related papers (2024-05-29T16:40:31Z)
- Variational Exploration Module VEM: A Cloud-Native Optimization and Validation Tool for Geospatial Modeling and AI Workflows [0.0]
Cloud-based deployments help to scale up these modeling and AI workflows.
We have developed the Variational Exploration Module, which facilitates the optimization and validation of modeling workflows deployed in the cloud.
The flexibility and robustness of the model-agnostic module are demonstrated using real-world applications.
arXiv Detail & Related papers (2023-11-26T23:07:00Z)
- Synthetic Data-Based Simulators for Recommender Systems: A Survey [55.60116686945561]
This survey aims at providing a comprehensive overview of the recent trends in the field of modeling and simulation.
We start with the motivation behind the development of frameworks implementing the simulations -- simulators.
We provide a new consistent classification of existing simulators based on their functionality, approbation, and industrial effectiveness.
arXiv Detail & Related papers (2022-06-22T19:33:21Z)
- Distributed intelligence on the Edge-to-Cloud Continuum: A systematic literature review [62.997667081978825]
This review aims at providing a comprehensive vision of the main state-of-the-art libraries and frameworks for machine learning and data analytics available today.
The main simulation, emulation, deployment systems, and testbeds for experimental research on the Edge-to-Cloud Continuum available today are also surveyed.
arXiv Detail & Related papers (2022-04-29T08:06:05Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines.
This approach however does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- Multitask Adaptation by Retrospective Exploration with Learned World Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides training samples for the MBRL agent taken from task-agnostic storage.
The model is trained to maximize the expected agent's performance by selecting promising trajectories solving prior tasks from the storage.
arXiv Detail & Related papers (2021-10-25T20:02:57Z)
- Automated Evolutionary Approach for the Design of Composite Machine Learning Pipelines [48.7576911714538]
The proposed approach aims to automate the design of composite machine learning pipelines.
It designs the pipelines with a customizable graph-based structure, analyzes the obtained results, and reproduces them.
The software implementation of this approach is presented as an open-source framework.
arXiv Detail & Related papers (2021-06-26T23:19:06Z)
- A User's Guide to Calibrating Robotics Simulators [54.85241102329546]
This paper proposes a set of benchmarks and a framework for the study of various algorithms aimed at transferring models and policies learnt in simulation to the real world.
We conduct experiments on a wide range of well-known simulated environments to characterize and offer insights into the performance of different algorithms.
Our analysis can be useful for practitioners working in this area and can help make informed choices about the behavior and main properties of sim-to-real algorithms.
arXiv Detail & Related papers (2020-11-17T22:24:26Z)