Emerging Platforms Meet Emerging LLMs: A Year-Long Journey of Top-Down Development
- URL: http://arxiv.org/abs/2404.09151v2
- Date: Tue, 16 Apr 2024 09:29:02 GMT
- Title: Emerging Platforms Meet Emerging LLMs: A Year-Long Journey of Top-Down Development
- Authors: Siyuan Feng, Jiawei Liu, Ruihang Lai, Charlie F. Ruan, Yong Yu, Lingming Zhang, Tianqi Chen
- Abstract summary: We introduce TapML, a top-down approach and tooling designed to streamline the deployment of machine learning systems on diverse platforms.
Unlike traditional bottom-up methods, TapML automates unit testing and adopts a migration-based strategy for gradually offloading model computations.
TapML was developed and applied through a year-long, real-world effort that successfully deployed significant emerging models and platforms.
- Score: 20.873143073842705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deploying machine learning (ML) on diverse computing platforms is crucial to accelerate and broaden its applications. However, it presents significant software engineering challenges due to the fast evolution of models, especially the recent Large Language Models (LLMs), and the emergence of new computing platforms. Current ML frameworks are primarily engineered for CPU and CUDA platforms, leaving a big gap in enabling emerging ones like Metal, Vulkan, and WebGPU. While a traditional bottom-up development pipeline fails to close the gap in a timely manner, we introduce TapML, a top-down approach and tooling designed to streamline the deployment of ML systems on diverse platforms, optimized for developer productivity. Unlike traditional bottom-up methods, which involve extensive manual testing and debugging, TapML automates unit testing through test carving and adopts a migration-based strategy for gradually offloading model computations from mature source platforms to emerging target platforms. By leveraging realistic inputs and remote connections for gradual target offloading, TapML accelerates the validation and minimizes debugging scopes, significantly optimizing development efforts. TapML was developed and applied through a year-long, real-world effort that successfully deployed significant emerging models and platforms. Through serious deployments of 82 emerging models in 17 distinct architectures across 5 emerging platforms, we showcase the effectiveness of TapML in enhancing developer productivity while ensuring model reliability and efficiency. Furthermore, we summarize comprehensive case studies from our real-world development, offering best practices for developing emerging ML systems.
Related papers
- Adversarial Reasoning at Jailbreaking Time [49.70772424278124]
We develop an adversarial reasoning approach to automatic jailbreaking via test-time computation.
Our approach introduces a new paradigm in understanding LLM vulnerabilities, laying the foundation for the development of more robust and trustworthy AI systems.
arXiv Detail & Related papers (2025-02-03T18:59:01Z)
- Consolidating TinyML Lifecycle with Large Language Models: Reality, Illusion, or Opportunity? [3.1471494780647795]
This paper explores whether Large Language Models (LLMs) could help automate and streamline the TinyML lifecycle.
We develop a framework that leverages the natural language processing (NLP) and code generation capabilities of LLMs to reduce development time and lower the barriers to entry for TinyML deployment.
arXiv Detail & Related papers (2025-01-20T22:20:57Z)
- SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding [66.74446220401296]
We propose SynerGen-VL, a simple yet powerful encoder-free MLLM capable of both image understanding and generation.
We introduce the token folding mechanism and the vision-expert-based progressive alignment pretraining strategy, which effectively support high-resolution image understanding.
Our code and models shall be released.
arXiv Detail & Related papers (2024-12-12T18:59:26Z)
- Towards the interoperability of low-code platforms [1.7450893625541586]
Low-code platforms (LCPs) are becoming popular across various industries.
Among them, vendor lock-in is a major concern, especially considering the lack of interoperability between these platforms.
This work proposes an approach to improve the interoperability of LCPs by (semi)automatically migrating models specified in one platform to another one.
arXiv Detail & Related papers (2024-12-06T14:33:34Z)
- Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance [78.48606021719206]
Mini-InternVL is a series of MLLMs with parameters ranging from 1B to 4B, which achieves 90% of the performance with only 5% of the parameters.
We develop a unified adaptation framework for Mini-InternVL, which enables our models to transfer and outperform specialized models in downstream tasks.
arXiv Detail & Related papers (2024-10-21T17:58:20Z)
- Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting LLMs to downstream tasks.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z)
- Model Share AI: An Integrated Toolkit for Collaborative Machine Learning Model Development, Provenance Tracking, and Deployment in Python [0.0]
We introduce Model Share AI (AIMS), an easy-to-use MLOps platform designed to streamline collaborative model development, model provenance tracking, and model deployment.
AIMS features collaborative project spaces and a standardized model evaluation process that ranks model submissions based on their performance on unseen evaluation data.
AIMS allows users to deploy ML models built in Scikit-Learn, Keras, PyTorch, and ONNX into live REST APIs and automatically generated web apps.
arXiv Detail & Related papers (2023-09-27T15:24:39Z)
- SeLoC-ML: Semantic Low-Code Engineering for Machine Learning Applications in Industrial IoT [9.477629856092218]
This paper presents a framework called Semantic Low-Code Engineering for ML Applications (SeLoC-ML).
SeLoC-ML enables non-experts to model, discover, reuse, and matchmake ML models and devices at scale.
Developers can benefit from semantic application templates, called recipes, to rapidly prototype end-user applications.
arXiv Detail & Related papers (2022-07-18T13:06:21Z)
- YMIR: A Rapid Data-centric Development Platform for Vision Applications [82.67319997259622]
This paper introduces an open source platform for rapid development of computer vision applications.
The platform puts efficient data development at the center of the machine learning development process.
arXiv Detail & Related papers (2021-11-19T05:02:55Z)
- Low-Precision Hardware Architectures Meet Recommendation Model Inference at Scale [11.121380180647769]
We share in this paper our search strategies to adapt reference recommendation models to low-precision hardware.
We also discuss the design and development of a tool chain to maintain our models' accuracy throughout their lifespan.
We believe these lessons from the trenches promote better co-design between hardware architecture and software engineering.
arXiv Detail & Related papers (2021-05-26T16:42:33Z)
- Quantitatively Assessing the Benefits of Model-driven Development in Agent-based Modeling and Simulation [80.49040344355431]
This paper compares the use of MDD and ABMS platforms in terms of effort and developer mistakes.
The obtained results show that MDD4ABMS requires less effort to develop simulations with similar (sometimes better) design quality than NetLogo.
arXiv Detail & Related papers (2020-06-15T23:29:04Z)