Related papers: Computing in the Era of Large Generative Models: From Cloud-Native to AI-Native

Computing in the Era of Large Generative Models: From Cloud-Native to AI-Native

URL: http://arxiv.org/abs/2401.12230v1
Date: Wed, 17 Jan 2024 20:34:11 GMT
Title: Computing in the Era of Large Generative Models: From Cloud-Native to AI-Native
Authors: Yao Lu, Song Bian, Lequn Chen, Yongjun He, Yulong Hui, Matthew Lentz, Beibin Li, Fei Liu, Jialin Li, Qi Liu, Rui Liu, Xiaoxuan Liu, Lin Ma, Kexin Rong, Jianguo Wang, Yingjun Wu, Yongji Wu, Huanchen Zhang, Minjia Zhang, Qizhen Zhang, Tianyi Zhou, Danyang Zhuo
Abstract summary: We describe an AI-native computing paradigm that harnesses the power of both cloudnative technologies and advanced machine learning inference. These joint efforts aim to optimize costs-of-goods-sold (COGS) and improve resource accessibility.
Score: 46.7766555589807
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we investigate the intersection of large generative AI models and cloud-native computing architectures. Recent large models such as ChatGPT, while revolutionary in their capabilities, face challenges like escalating costs and demand for high-end GPUs. Drawing analogies between large-model-as-a-service (LMaaS) and cloud database-as-a-service (DBaaS), we describe an AI-native computing paradigm that harnesses the power of both cloud-native technologies (e.g., multi-tenancy and serverless computing) and advanced machine learning runtime (e.g., batched LoRA inference). These joint efforts aim to optimize costs-of-goods-sold (COGS) and improve resource accessibility. The journey of merging these two domains is just at the beginning and we hope to stimulate future research and development in this area.

Related papers

AI Flow: Perspectives, Scenarios, and Approaches [51.38621621775711]
We introduce AI Flow, a framework that integrates cutting-edge IT and CT advancements.<n>First, device-edge-cloud framework serves as the foundation, which integrates end devices, edge servers, and cloud clusters.<n>Second, we introduce the concept of familial models, which refers to a series of different-sized models with aligned hidden features.<n>Third, connectivity- and interaction-based intelligence emergence is a novel paradigm of AI Flow.
arXiv Detail & Related papers (2025-06-14T12:43:07Z)
Cloud Platforms for Developing Generative AI Solutions: A Scoping Review of Tools and Services [0.27649989102029926]
Generative AI is transforming enterprise application development by enabling machines to create content, code, and designs. Cloud computing addresses these needs by offering infrastructure to train, deploy, and scale generative AI models. This review examines cloud services for generative AI, focusing on key providers like Amazon Web Services (AWS), Microsoft Azure, Google Cloud, IBM Cloud, Oracle Cloud, and Alibaba Cloud.
arXiv Detail & Related papers (2024-12-08T19:49:07Z)
Seamless Optical Cloud Computing across Edge-Metro Network for Generative AI [11.50609298355243]
We propose and experimentally demonstrate an optical cloud computing system that can be seamlessly deployed across edge-metro network. By modulating inputs and models into light, a wide range of edge nodes can directly access the optical computing center via the edge-metro network. The experimental validations show an energy efficiency of 118.6 mW/TOPs (tera operations per second), reducing energy consumption by two orders of magnitude compared to traditional electronic-based cloud computing solutions.
arXiv Detail & Related papers (2024-12-04T11:49:13Z)
Two-Timescale Model Caching and Resource Allocation for Edge-Enabled AI-Generated Content Services [55.0337199834612]
Generative AI (GenAI) has emerged as a transformative technology, enabling customized and personalized AI-generated content (AIGC) services. These services require executing GenAI models with billions of parameters, posing significant obstacles to resource-limited wireless edge. We introduce the formulation of joint model caching and resource allocation for AIGC services to balance a trade-off between AIGC quality and latency metrics.
arXiv Detail & Related papers (2024-11-03T07:01:13Z)
AI-Generated Images as Data Source: The Dawn of Synthetic Era [61.879821573066216]
generative AI has unlocked the potential to create synthetic images that closely resemble real-world photographs. This paper explores the innovative concept of harnessing these AI-generated images as new data sources. In contrast to real data, AI-generated data exhibit remarkable advantages, including unmatched abundance and scalability.
arXiv Detail & Related papers (2023-10-03T06:55:19Z)
SoTaNa: The Open-Source Software Development Assistant [81.86136560157266]
SoTaNa is an open-source software development assistant. It generates high-quality instruction-based data for the domain of software engineering. It employs a parameter-efficient fine-tuning approach to enhance the open-source foundation model, LLaMA.
arXiv Detail & Related papers (2023-08-25T14:56:21Z)
An Overview on Generative AI at Scale with Edge-Cloud Computing [28.98486923400986]
generative artificial intelligence (GenAI) generates new content that resembles what is created by humans. The rapid development of GenAI systems has created a huge amount of new data on the Internet. It is attractive to build GenAI systems at scale by leveraging the edge-cloud computing paradigm.
arXiv Detail & Related papers (2023-06-02T06:24:15Z)
Scalable, Distributed AI Frameworks: Leveraging Cloud Computing for Enhanced Deep Learning Performance and Efficiency [0.0]
In recent years, the integration of artificial intelligence (AI) and cloud computing has emerged as a promising avenue for addressing the growing computational demands of AI applications. This paper presents a comprehensive study of scalable, distributed AI frameworks leveraging cloud computing for enhanced deep learning performance and efficiency.
arXiv Detail & Related papers (2023-04-26T15:38:00Z)
Molecular Dynamics Simulations on Cloud Computing and Machine Learning Platforms [0.8093262393618671]
We see a paradigm shift in the computational structure, design, and requirements of scientific computing applications. Data-driven and machine learning approaches are being used to support, speed-up, and enhance scientific computing applications. Cloud computing platforms are increasingly appealing for scientific computing.
arXiv Detail & Related papers (2021-11-11T21:20:26Z)
Edge-Cloud Polarization and Collaboration: A Comprehensive Survey [61.05059817550049]
We conduct a systematic review for both cloud and edge AI. We are the first to set up the collaborative learning mechanism for cloud and edge modeling. We discuss potentials and practical experiences of some on-going advanced edge AI topics.
arXiv Detail & Related papers (2021-11-11T05:58:23Z)
Auto-Split: A General Framework of Collaborative Edge-Cloud AI [49.750972428032355]
This paper describes the techniques and engineering practice behind Auto-Split, an edge-cloud collaborative prototype of Huawei Cloud. To the best of our knowledge, there is no existing industry product that provides the capability of Deep Neural Network (DNN) splitting.
arXiv Detail & Related papers (2021-08-30T08:03:29Z)
The MIT Supercloud Dataset [3.375826083518709]
We introduce the MIT Supercloud dataset which aims to foster innovative AI/ML approaches to the analysis of large scale HPC and datacenter/cloud operations. We provide detailed monitoring logs from the MIT Supercloud system, which include CPU and GPU usage by jobs, memory usage, file system logs, and physical monitoring data. This paper discusses the details of the dataset, collection methodology, data availability, and discusses potential challenge problems being developed using this data.
arXiv Detail & Related papers (2021-08-04T13:06:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.