Collective Knowledge: organizing research projects as a database of
reusable components and portable workflows with common APIs
- URL: http://arxiv.org/abs/2011.01149v2
- Date: Sat, 30 Jan 2021 15:01:14 GMT
- Authors: Grigori Fursin
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This article provides the motivation and overview of the Collective Knowledge
framework (CK or cKnowledge). The CK concept is to decompose research projects
into reusable components that encapsulate research artifacts and provide
unified application programming interfaces (APIs), command-line interfaces
(CLIs), meta descriptions and common automation actions for related artifacts.
The CK framework is used to organize and manage research projects as a database
of such components.
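The component-database idea described above can be sketched in a few lines of Python. This is a hypothetical illustration only: the names `COMPONENTS` and `access`, and the meta-description layout, are invented for this sketch and are not the actual CK API, although CK similarly exposes every component through one unified entry point keyed by an identifier and an action name.

```python
import json

# A toy "database" of research components, each carrying a meta
# description and named automation actions, in the spirit of CK.
COMPONENTS = {
    "dataset:imagenet-subset": {
        "meta": {"type": "dataset", "format": "jpeg", "tags": ["vision"]},
        "actions": {"download": lambda: "downloaded imagenet-subset"},
    },
    "model:resnet-18": {
        "meta": {"type": "model", "framework": "any", "tags": ["vision"]},
        "actions": {"benchmark": lambda: "benchmarked resnet-18"},
    },
}

def access(request):
    """Unified API: every component is reached the same way,
    via a component identifier plus an action name."""
    component = COMPONENTS.get(request["component"])
    if component is None:
        return {"return": 1, "error": "component not found"}
    action = component["actions"].get(request["action"])
    if action is None:
        return {"return": 1, "error": "action not supported"}
    return {"return": 0, "output": action(), "meta": component["meta"]}

result = access({"component": "model:resnet-18", "action": "benchmark"})
print(json.dumps(result["meta"]))
```

Because callers never touch a component's internals, components from different users can be swapped in as long as their meta descriptions and actions match.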
Inspired by the USB "plug and play" approach for hardware, CK also helps to
assemble portable workflows that can automatically plug in compatible
components from different users and vendors (models, datasets, frameworks,
compilers, tools). Such workflows can build and run algorithms on different
platforms and environments in a unified way using the universal CK program
pipeline with software detection plugins and the automatic installation of
missing packages.
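The detect-or-install pattern behind the CK program pipeline can be sketched as follows. This is a minimal sketch under stated assumptions: `detect` and `ensure` are hypothetical names, Python packages stand in for arbitrary software dependencies, and the real CK plugin interface is richer than this.

```python
import importlib.util
import subprocess
import sys

def detect(module_name):
    """Software-detection step: report whether a Python package
    is already present in the current environment."""
    return importlib.util.find_spec(module_name) is not None

def ensure(package, module_name=None):
    """Plug-and-play resolution: reuse the package if detected,
    otherwise install it automatically before the workflow runs."""
    module_name = module_name or package
    if detect(module_name):
        return "reused"
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])
    return "installed"

# A portable workflow declares its dependencies; the pipeline
# resolves each one the same way on any platform. (stdlib modules
# are used here so detection always succeeds in this sketch)
for dep in ["json", "csv"]:
    print(dep, ensure(dep))
```

The same loop runs unchanged on any machine: dependencies already present are reused, and missing ones are fetched, which is what makes the workflow portable.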
This article presents a number of industrial projects in which the modular CK
approach was successfully validated in order to automate benchmarking,
auto-tuning and co-design of efficient software and hardware for machine
learning (ML) and artificial intelligence (AI) in terms of speed, accuracy,
energy, size and various costs. The CK framework also helped to automate the
artifact evaluation process at several computer science conferences as well as
to make it easier to reproduce, compare and reuse research techniques from
published papers, deploy them in production, and automatically adapt them to
continuously changing datasets, models and systems.
The long-term goal is to accelerate innovation by connecting researchers and
practitioners to share and reuse all their knowledge, best practices,
artifacts, workflows and experimental results in a common, portable and
reproducible format at https://cKnowledge.io .
Related papers
- CARLOS: An Open, Modular, and Scalable Simulation Framework for the Development and Testing of Software for C-ITS [0.0]
We propose CARLOS - an open, modular, and scalable simulation framework for the development and testing of software in C-ITS.
We provide core building blocks for this framework and explain how it can be used and extended by the community.
In our paper, we motivate the architecture by describing important design principles and showcasing three major use cases.
arXiv Detail & Related papers (2024-04-02T10:48:36Z)
- Automated User Story Generation with Test Case Specification Using Large Language Model [0.0]
We developed a tool "GeneUS" to automatically create user stories from requirements documents.
The output is provided in a structured format, leaving the possibilities open for downstream integration with popular project management tools.
arXiv Detail & Related papers (2024-04-02T01:45:57Z)
- DevBench: A Comprehensive Benchmark for Software Development [72.24266814625685]
DevBench is a benchmark that evaluates large language models (LLMs) across various stages of the software development lifecycle.
Empirical studies show that current LLMs, including GPT-4-Turbo, fail to solve the challenges presented within DevBench.
Our findings offer actionable insights for the future development of LLMs toward real-world programming applications.
arXiv Detail & Related papers (2024-03-13T15:13:44Z)
- CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets [75.64181719386497]
We present CRAFT, a tool creation and retrieval framework for large language models (LLMs)
It creates toolsets specifically curated for the tasks and equips LLMs with a component that retrieves tools from these sets to enhance their capability to solve complex tasks.
Our method is designed to be flexible and offers a plug-and-play approach to adapt off-the-shelf LLMs to unseen domains and modalities, without any finetuning.
arXiv Detail & Related papers (2023-09-29T17:40:26Z)
- CodeTF: One-stop Transformer Library for State-of-the-art Code LLM [72.1638273937025]
We present CodeTF, an open-source Transformer-based library for state-of-the-art Code LLMs and code intelligence.
Our library supports a collection of pretrained Code LLM models and popular code benchmarks.
We hope CodeTF is able to bridge the gap between machine learning/generative AI and software engineering.
arXiv Detail & Related papers (2023-05-31T05:24:48Z)
- TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations.
We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z)
- ConvLab-3: A Flexible Dialogue System Toolkit Based on a Unified Data Format [88.33443450434521]
Task-oriented dialogue (TOD) systems function as digital assistants, guiding users through various tasks such as booking flights or finding restaurants.
Existing toolkits for building TOD systems often fall short in delivering comprehensive arrays of data, models, and experimental environments.
We introduce ConvLab-3: a multifaceted dialogue system toolkit crafted to bridge this gap.
arXiv Detail & Related papers (2022-11-30T16:37:42Z)
- Modular approach to data preprocessing in ALOHA and application to a smart industry use case [0.0]
The paper addresses a modular approach, integrated into the ALOHA tool flow, to support the data preprocessing and transformation pipeline.
To demonstrate the effectiveness of the approach, we present some experimental results related to a keyword spotting use case.
arXiv Detail & Related papers (2021-02-02T06:48:51Z)
- The Collective Knowledge project: making ML models more portable and reproducible with open APIs, reusable best practices and MLOps [0.2538209532048866]
This article provides an overview of the Collective Knowledge technology (CK or cKnowledge).
CK attempts to make it easier to reproduce ML & systems research, deploy ML models in production, and adapt them to changing data sets, models, research techniques, software, and hardware.
arXiv Detail & Related papers (2020-06-12T13:18:52Z)
- FastReID: A Pytorch Toolbox for General Instance Re-identification [70.10996607445725]
General Instance Re-identification is an important task in computer vision.
We present FastReID as a widely used software system in JD AI Research.
We have implemented some state-of-the-art projects, including person re-id, partial re-id, cross-domain re-id and vehicle re-id.
arXiv Detail & Related papers (2020-06-04T03:51:43Z)
- SciWING -- A Software Toolkit for Scientific Document Processing [21.394568145639894]
SciWING provides access to pre-trained models for scientific document processing tasks.
It includes ready-to-use web and terminal-based applications and demonstrations.
arXiv Detail & Related papers (2020-04-08T04:43:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.