Related papers: Enabling more efficient and cost-effective AI/ML systems with Collective Mind, virtualized MLOps, MLPerf, Collective Knowledge Playground and reproducible optimization tournaments

Enabling more efficient and cost-effective AI/ML systems with Collective Mind, virtualized MLOps, MLPerf, Collective Knowledge Playground and reproducible optimization tournaments

URL: http://arxiv.org/abs/2406.16791v1
Date: Mon, 24 Jun 2024 16:55:03 GMT
Title: Enabling more efficient and cost-effective AI/ML systems with Collective Mind, virtualized MLOps, MLPerf, Collective Knowledge Playground and reproducible optimization tournaments
Authors: Grigori Fursin,
Abstract summary: I present my community effort to automatically co-design cheaper, faster and more energy-efficient workloads for AI, ML and other popular workloads. I developed CM to modularize, automate and virtualize the tedious process of building, running, profiling and optimizing complex applications across rapidly evolving open-source and proprietary AI/ML models, datasets, software and hardware. I donated CM and CM4MLOps to help connect academia and industry to learn how to build and run AI and other emerging workloads in the most efficient and cost-effective way.
Score: 0.09065034043031665
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this white paper, I present my community effort to automatically co-design cheaper, faster and more energy-efficient software and hardware for AI, ML and other popular workloads with the help of the Collective Mind framework (CM), virtualized MLOps, MLPerf benchmarks and reproducible optimization tournaments. I developed CM to modularize, automate and virtualize the tedious process of building, running, profiling and optimizing complex applications across rapidly evolving open-source and proprietary AI/ML models, datasets, software and hardware. I achieved that with the help of portable, reusable and technology-agnostic automation recipes (ResearchOps) for MLOps and DevOps (CM4MLOps) discovered in close collaboration with academia and industry when reproducing more than 150 research papers and organizing the 1st mass-scale community benchmarking of ML and AI systems using CM and MLPerf. I donated CM and CM4MLOps to MLCommons to help connect academia and industry to learn how to build and run AI and other emerging workloads in the most efficient and cost-effective way using a common and technology-agnostic automation, virtualization and reproducibility framework while unifying knowledge exchange, protecting everyone's intellectual property, enabling portable skills, and accelerating transfer of the state-of-the-art research to production. My long-term vision is to make AI accessible to everyone by making it a commodity automatically produced from the most suitable open-source and proprietary components from different vendors based on user demand, requirements and constraints such as cost, latency, throughput, accuracy, energy, size and other important characteristics.

Related papers

MLZero: A Multi-Agent System for End-to-end Machine Learning Automation [48.716299953336346]
We introduce MLZero, a novel multi-agent framework powered by Large Language Models (LLMs)<n>A cognitive perception module is first employed, transforming raw multimodal inputs into perceptual context.<n> MLZero demonstrates superior performance on MLE-Bench Lite, outperforming all competitors in both success rate and solution quality.
arXiv Detail & Related papers (2025-05-20T05:20:53Z)
Ilargi: a GPU Compatible Factorized ML Model Training Framework [11.291108172692438]
Ilargi is a novel factorized learning framework that facilitates automatic factorization across CPU and GPU environments without the need for costly relational joins. Ilargi incorporates an ML-based cost estimator to intelligently selects between factorization and materialization based on data properties, algorithm complexity, hardware environments, and their interactions.
arXiv Detail & Related papers (2025-02-04T03:59:17Z)
Transforming the Hybrid Cloud for Emerging AI Workloads [81.15269563290326]
This white paper envisions transforming hybrid cloud systems to meet the growing complexity of AI workloads. The proposed framework addresses critical challenges in energy efficiency, performance, and cost-effectiveness. This joint initiative aims to establish hybrid clouds as secure, efficient, and sustainable platforms.
arXiv Detail & Related papers (2024-11-20T11:57:43Z)
ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems [80.69865295743149]
This work attempts to study using LLM-based agents to design collaborative AI systems autonomously. Based on ComfyBench, we develop ComfyAgent, a framework that empowers agents to autonomously design collaborative AI systems by generating. While ComfyAgent achieves a comparable resolve rate to o1-preview and significantly surpasses other agents on ComfyBench, ComfyAgent has resolved only 15% of creative tasks.
arXiv Detail & Related papers (2024-09-02T17:44:10Z)
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? [73.81908518992161]
We introduce Spider2-V, the first multimodal agent benchmark focusing on professional data science and engineering. Spider2-V features real-world tasks in authentic computer environments and incorporating 20 enterprise-level professional applications. These tasks evaluate the ability of a multimodal agent to perform data-related tasks by writing code and managing the GUI in enterprise data software systems.
arXiv Detail & Related papers (2024-07-15T17:54:37Z)
CARLOS: An Open, Modular, and Scalable Simulation Framework for the Development and Testing of Software for C-ITS [0.0]
We propose CARLOS - an open, modular, and scalable simulation framework for the development and testing of software in C-ITS. We provide core building blocks for this framework and explain how it can be used and extended by the community. In our paper, we motivate the architecture by describing important design principles and showcasing three major use cases.
arXiv Detail & Related papers (2024-04-02T10:48:36Z)
Machine Learning Insides OptVerse AI Solver: Design Principles and Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver. We showcase our methods for generating complex SAT and MILP instances utilizing generative models that mirror multifaceted structures of real-world problem. We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z)
Experiential Co-Learning of Software-Developing Agents [83.34027623428096]
Large language models (LLMs) have brought significant changes to various domains, especially in software development. We introduce Experiential Co-Learning, a novel LLM-agent learning framework. Experiments demonstrate that the framework enables agents to tackle unseen software-developing tasks more effectively.
arXiv Detail & Related papers (2023-12-28T13:50:42Z)
Evaluating Emerging AI/ML Accelerators: IPU, RDU, and NVIDIA/AMD GPUs [14.397623940689487]
Graphcore Intelligence Processing Unit (IPU), Sambanova Reconfigurable Dataflow Unit (RDU), and enhanced GPU platforms are reviewed. This research provides a preliminary evaluation and comparison of these commercial AI/ML accelerators.
arXiv Detail & Related papers (2023-11-08T01:06:25Z)
In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations. As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks. This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z)
OmniForce: On Human-Centered, Large Model Empowered and Cloud-Edge Collaborative AutoML System [85.8338446357469]
We introduce OmniForce, a human-centered AutoML system that yields both human-assisted ML and ML-assisted human techniques. We show how OmniForce can put an AutoML system into practice and build adaptive AI in open-environment scenarios.
arXiv Detail & Related papers (2023-03-01T13:35:22Z)
SeLoC-ML: Semantic Low-Code Engineering for Machine Learning Applications in Industrial IoT [9.477629856092218]
This paper presents a framework called Semantic Low-Code Engineering for ML Applications (SeLoC-ML) SeLoC-ML enables non-experts to model, discover, reuse, and matchmake ML models and devices at scale. Developers can benefit from semantic application templates, called recipes, to fast prototype end-user applications.
arXiv Detail & Related papers (2022-07-18T13:06:21Z)
Enabling Automated Machine Learning for Model-Driven AI Engineering [60.09869520679979]
We propose a novel approach to enable Model-Driven Software Engineering and Model-Driven AI Engineering. In particular, we support Automated ML, thus assisting software engineers without deep AI knowledge in developing AI-intensive systems.
arXiv Detail & Related papers (2022-03-06T10:12:56Z)
CFU Playground: Full-Stack Open-Source Framework for Tiny Machine Learning (tinyML) Acceleration on FPGAs [2.2177069086277195]
CFU Playground is a full-stack open-source framework that enables rapid and iterative design of machine learning (ML) accelerators for embedded ML systems. Our tool provides a completely open-source end-to-end flow for hardware-software co-design on FPGAs and future systems research. Our rapid, deploy-profile-optimization feedback loop lets ML hardware and software developers achieve significant returns out of a relatively small investment.
arXiv Detail & Related papers (2022-01-05T23:15:58Z)
MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning [61.28547338576706]
Population-based multi-agent reinforcement learning (PB-MARL) refers to the series of methods nested with reinforcement learning (RL) algorithms. We present MALib, a scalable and efficient computing framework for PB-MARL.
arXiv Detail & Related papers (2021-06-05T03:27:08Z)
Technology Readiness Levels for Machine Learning Systems [107.56979560568232]
Development and deployment of machine learning systems can be executed easily with modern tools, but the process is typically rushed and means-to-an-end. We have developed a proven systems engineering approach for machine learning development and deployment. Our "Machine Learning Technology Readiness Levels" framework defines a principled process to ensure robust, reliable, and responsible systems.
arXiv Detail & Related papers (2021-01-11T15:54:48Z)
Collective Knowledge: organizing research projects as a database of reusable components and portable workflows with common APIs [0.2538209532048866]
This article provides the motivation and overview of the Collective Knowledge framework (CK or cKnowledge) The CK concept is to decompose research projects into reusable components that encapsulate research artifacts. The long-term goal is to accelerate innovation by connecting researchers and practitioners to share and reuse all their knowledge.
arXiv Detail & Related papers (2020-11-02T17:42:59Z)
Technology Readiness Levels for AI & ML [79.22051549519989]
Development of machine learning systems can be executed easily with modern tools, but the process is typically rushed and means-to-an-end. Engineering systems follow well-defined processes and testing standards to streamline development for high-quality, reliable results. We propose a proven systems engineering approach for machine learning development and deployment.
arXiv Detail & Related papers (2020-06-21T17:14:34Z)
The Collective Knowledge project: making ML models more portable and reproducible with open APIs, reusable best practices and MLOps [0.2538209532048866]
This article provides an overview of the Collective Knowledge technology (CK or cKnowledge CK) CK attempts to make it easier to reproduce ML&systems research, deploy ML models in production, and adapt them to changing data sets, models, research techniques, software, and hardware.
arXiv Detail & Related papers (2020-06-12T13:18:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.