The Collective Knowledge project: making ML models more portable and
reproducible with open APIs, reusable best practices and MLOps
- URL: http://arxiv.org/abs/2006.07161v2
- Date: Thu, 18 Jun 2020 07:28:09 GMT
- Title: The Collective Knowledge project: making ML models more portable and
reproducible with open APIs, reusable best practices and MLOps
- Authors: Grigori Fursin
- Abstract summary: This article provides an overview of the Collective Knowledge technology (CK or cKnowledge).
CK attempts to make it easier to reproduce ML & systems research, deploy ML models in production, and adapt them to changing data sets, models, research techniques, software, and hardware.
- Score: 0.2538209532048866
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This article provides an overview of the Collective Knowledge technology (CK
or cKnowledge). CK attempts to make it easier to reproduce ML & systems research,
deploy ML models in production, and adapt them to continuously changing data
sets, models, research techniques, software, and hardware. The CK concept is to
decompose complex systems and ad-hoc research projects into reusable
sub-components with unified APIs, CLI, and JSON meta description. Such
components can be connected into portable workflows using DevOps principles
combined with reusable automation actions, software detection plugins, meta
packages, and exposed optimization parameters. CK workflows can automatically
plug in different models, data and tools from different vendors while building,
running and benchmarking research code in a unified way across diverse
platforms and environments. Such workflows also help to perform whole system
optimization, reproduce results, and compare them using public or private
scoreboards on the CK platform (https://cKnowledge.io). For example, the
modular CK approach was successfully validated with industrial partners to
automatically co-design and optimize software, hardware, and machine learning
models for reproducible and efficient object detection in terms of speed,
accuracy, energy, size, and other characteristics. The long-term goal is to
simplify and accelerate the development and deployment of ML models and systems
by helping researchers and practitioners to share and reuse their knowledge,
experience, best practices, artifacts, and techniques using open CK APIs.
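As a rough illustration of the unified API described in the abstract, the following sketch shows what calling one CK automation action could look like from Python. It is not code from the paper: it assumes the legacy ck package is installed (e.g. via pip install ck) and that a CK repository with program components has been pulled; the action, module, and tag names are illustrative.

```python
# Minimal sketch (illustrative, not from the paper) of invoking a CK automation
# action through the unified Python API. Assumes the ck package is installed
# and a repository with 'program' components is available locally.
import ck.kernel as ck

# Every CK action takes a JSON-like dict and returns one; a non-zero
# 'return' code signals an error carrying a human-readable 'error' string.
r = ck.access({'action': 'search',
               'module_uoa': 'program',
               'tags': 'image-classification'})   # tag chosen for illustration
if r['return'] > 0:
    raise RuntimeError(r.get('error', 'CK action failed'))

# Each matching component is a directory with its own JSON meta description.
for entry in r.get('lst', []):
    print(entry.get('data_uoa'), '->', entry.get('path'))
```

In the CK design the command line maps onto the same actions (here roughly `ck search program --tags=image-classification`), which is the sense in which components share one API across the CLI, Python, and portable workflows.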
Related papers
- LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit [55.73370804397226]
Quantization, a key compression technique, can effectively mitigate the resource demands of large language models by compressing and accelerating them.
We present LLMC, a plug-and-play compression toolkit, to fairly and systematically explore the impact of quantization.
Powered by this versatile toolkit, our benchmark covers three key aspects: calibration data, algorithms (three strategies), and data formats.
arXiv Detail & Related papers (2024-05-09T11:49:05Z) - UltraEval: A Lightweight Platform for Flexible and Comprehensive Evaluation for LLMs [74.1976921342982]
This paper introduces UltraEval, a user-friendly evaluation framework characterized by its lightweight nature, comprehensiveness, modularity, and efficiency.
The resulting composability allows for the free combination of different models, tasks, prompts, benchmarks, and metrics within a unified evaluation workflow.
arXiv Detail & Related papers (2024-04-11T09:17:12Z) - CRAFT: Customizing LLMs by Creating and Retrieving from Specialized
Toolsets [75.64181719386497]
We present CRAFT, a tool creation and retrieval framework for large language models (LLMs).
It creates toolsets specifically curated for the tasks and equips LLMs with a component that retrieves tools from these sets to enhance their capability to solve complex tasks.
Our method is designed to be flexible and offers a plug-and-play approach to adapt off-the-shelf LLMs to unseen domains and modalities, without any finetuning.
arXiv Detail & Related papers (2023-09-29T17:40:26Z) - Model Share AI: An Integrated Toolkit for Collaborative Machine Learning
Model Development, Provenance Tracking, and Deployment in Python [0.0]
We introduce Model Share AI (AIMS), an easy-to-use MLOps platform designed to streamline collaborative model development, model provenance tracking, and model deployment.
AIMS features collaborative project spaces and a standardized model evaluation process that ranks model submissions based on their performance on unseen evaluation data.
AIMS allows users to deploy ML models built in Scikit-Learn, Keras, PyTorch, and ONNX into live REST APIs and automatically generated web apps.
arXiv Detail & Related papers (2023-09-27T15:24:39Z) - CodeTF: One-stop Transformer Library for State-of-the-art Code LLM [72.1638273937025]
We present CodeTF, an open-source Transformer-based library for state-of-the-art Code LLMs and code intelligence.
Our library supports a collection of pretrained Code LLMs and popular code benchmarks.
We hope CodeTF is able to bridge the gap between machine learning/generative AI and software engineering.
arXiv Detail & Related papers (2023-05-31T05:24:48Z) - MLOps: A Step Forward to Enterprise Machine Learning [0.0]
This research presents a detailed review of MLOps, its benefits, difficulties, evolutions, and important underlying technologies.
The MLOps workflow is explained in detail along with the various tools necessary for both model and data exploration and deployment.
This article also sheds light on the end-to-end production of ML projects using various maturity levels of automated pipelines.
arXiv Detail & Related papers (2023-05-27T20:44:14Z) - TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations.
We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z) - Efficient Data-specific Model Search for Collaborative Filtering [56.60519991956558]
Collaborative filtering (CF) is a fundamental approach for recommender systems.
In this paper, motivated by the recent advances in automated machine learning (AutoML), we propose to design a data-specific CF model.
The key is a new framework that unifies state-of-the-art (SOTA) CF methods and splits them into the disjoint stages of input encoding, embedding function, interaction function, and prediction function (a sketch of this staged decomposition follows this list).
arXiv Detail & Related papers (2021-06-14T14:30:32Z) - Collective Knowledge: organizing research projects as a database of
reusable components and portable workflows with common APIs [0.2538209532048866]
This article provides the motivation and overview of the Collective Knowledge framework (CK or cKnowledge).
The CK concept is to decompose research projects into reusable components that encapsulate research artifacts.
The long-term goal is to accelerate innovation by connecting researchers and practitioners to share and reuse all their knowledge.
arXiv Detail & Related papers (2020-11-02T17:42:59Z) - MLModelCI: An Automatic Cloud Platform for Efficient MLaaS [15.029094196394862]
We release the platform as an open-source project on GitHub under the Apache 2.0 license.
Our system bridges the gap between current ML training and serving systems and thus frees developers from the manual and tedious work often associated with service deployment.
arXiv Detail & Related papers (2020-06-09T07:48:20Z) - CodeReef: an open platform for portable MLOps, reusable automation
actions and reproducible benchmarking [0.2148535041822524]
We present CodeReef - an open platform to share all the components necessary to enable cross-platform MLOps (MLSysOps).
We also introduce the CodeReef solution - a way to package and share models as non-virtualized, portable, customizable archive files.
arXiv Detail & Related papers (2020-01-22T09:52:51Z)
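For the data-specific collaborative filtering entry above, the staged decomposition it describes can be made concrete with a small sketch. The code below is not from that paper; it is a minimal, self-contained illustration of splitting a CF scoring function into input encoding, embedding, interaction, and prediction stages, with randomly initialized parameters standing in for learned ones.

```python
# Illustrative-only decomposition of a collaborative filtering model into the
# four disjoint stages named in the entry above: input encoding, embedding
# function, interaction function, and prediction function.
import numpy as np

rng = np.random.default_rng(0)

def encode(user_id: int, item_id: int, n_users: int, n_items: int):
    """Input encoding: map raw IDs to one-hot vectors."""
    u = np.zeros(n_users); u[user_id] = 1.0
    i = np.zeros(n_items); i[item_id] = 1.0
    return u, i

def embed(onehot: np.ndarray, table: np.ndarray) -> np.ndarray:
    """Embedding function: look up a dense vector (here via a matrix product)."""
    return onehot @ table

def interact(p: np.ndarray, q: np.ndarray) -> np.ndarray:
    """Interaction function: element-wise product, as in generalized MF."""
    return p * q

def predict(h: np.ndarray, w: np.ndarray) -> float:
    """Prediction function: linear readout of the interaction vector."""
    return float(h @ w)

# Hypothetical sizes and randomly initialized parameters, for illustration only.
n_users, n_items, dim = 100, 50, 8
P = rng.normal(size=(n_users, dim))   # user embedding table
Q = rng.normal(size=(n_items, dim))   # item embedding table
w = rng.normal(size=dim)              # prediction weights

u, i = encode(user_id=3, item_id=7, n_users=n_users, n_items=n_items)
score = predict(interact(embed(u, P), embed(i, Q)), w)
print(f"predicted preference score: {score:.3f}")
```

Because each stage is a separate, swappable function, choices for every stage can be searched independently, which is what makes such a decomposition amenable to AutoML-style model search.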
This list is automatically generated from the titles and abstracts of the papers indexed on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.