Related papers: Enabling Secure and Ephemeral AI Workloads in Data Mesh Environments

URL: http://arxiv.org/abs/2506.00352v1
Date: Sat, 31 May 2025 02:30:22 GMT
Title: Enabling Secure and Ephemeral AI Workloads in Data Mesh Environments
Authors: Chinkit Patel, Kee Siong Ng
Abstract summary: Many large enterprises have no efficient and effective way to support their Data and AI teams. This paper proposes a key piece of the solution to the overall problem, in the form of an on-demand self-service data-platform infrastructure.
Score: 3.322555975389833
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Many large enterprises that operate highly governed and complex ICT environments have no efficient and effective way to support their Data and AI teams in rapidly spinning up and tearing down self-service data and compute infrastructure, to experiment with new data analytic tools, and deploy data products into operational use. This paper proposes a key piece of the solution to the overall problem, in the form of an on-demand self-service data-platform infrastructure to empower de-centralised data teams to build data products on top of centralised templates, policies and governance. The core innovation is an efficient method to leverage immutable container operating systems and infrastructure-as-code methodologies for creating, from scratch, vendor-neutral and short-lived Kubernetes clusters on-premises and in any cloud environment. Our proposed approach can serve as a repeatable, portable and cost-efficient alternative or complement to commercial Platform-as-a-Service (PaaS) offerings, and this is particularly important in supporting interoperability in complex data mesh environments with a mix of modern and legacy compute infrastructure.

Related papers

Toward Data Systems That Are Business Semantic Centric and AI Agents Assisted [0.0]
Business Semantics Centric, AI Agents Assisted Data System (BSDS). BSDS redefines data systems as dynamic enablers of business success. The system includes curated data linked to business entities, a knowledge base for context-aware AI agents, and efficient data pipelines.
arXiv Detail & Related papers (2025-06-05T19:06:06Z)
Unlocking the Value of Decentralized Data: A Federated Dual Learning Approach for Model Aggregation [20.023295646723312]
Federated Learning (FL) offers a promising alternative by enabling AI models to be trained on decentralized data. Existing FL approaches struggle to match the performance of centralized training due to challenges such as heterogeneous data distribution and communication delays. We propose a dual learning approach that leverages centralized data at the server to guide the merging of model updates from clients.
arXiv Detail & Related papers (2025-03-26T01:00:35Z)
Towards Human-Guided, Data-Centric LLM Co-Pilots [53.35493881390917]
CliMB-DC is a human-guided, data-centric framework for machine learning co-pilots. It combines advanced data-centric tools with LLM-driven reasoning to enable robust, context-aware data processing. We show how CliMB-DC can transform uncurated datasets into ML-ready formats.
arXiv Detail & Related papers (2025-01-17T17:51:22Z)
Data-Juicer 2.0: Cloud-Scale Adaptive Data Processing for and with Foundation Models [64.28420991770382]
Data-Juicer 2.0 is a data processing system backed by data processing operators spanning text, image, video, and audio modalities. It supports more critical tasks including data analysis, annotation, and foundation model post-training. It has been widely adopted in diverse research fields and real-world products such as Alibaba Cloud PAI.
arXiv Detail & Related papers (2024-12-23T08:29:57Z)
A Blueprint Architecture of Compound AI Systems for Enterprise [18.109450556443782]
We introduce a blueprint architecture for compound AI systems to operate in enterprise settings cost-effectively and feasibly. Our proposed architecture aims for seamless integration with existing compute and data infrastructure, with "stream" serving as the key orchestration concept.
arXiv Detail & Related papers (2024-06-02T01:16:32Z)
Blockchain-enabled Trustworthy Federated Unlearning [50.01101423318312]
Federated unlearning is a promising paradigm for protecting the data ownership of distributed clients. Existing works require central servers to retain the historical model parameters from distributed clients. This paper proposes a new blockchain-enabled trustworthy federated unlearning framework.
arXiv Detail & Related papers (2024-01-29T07:04:48Z)
Bringing AI to the edge: A formal M&S specification to deploy effective IoT architectures [0.0]
The Internet of Things is transforming our society, providing new services that improve the quality of life and resource management. These applications are based on ubiquitous networks of multiple distributed devices, with limited computing resources and power. New architectures such as fog computing are emerging to bring computing infrastructure closer to data sources.
arXiv Detail & Related papers (2023-05-11T21:29:58Z)
Outsourcing Training without Uploading Data via Efficient Collaborative Open-Source Sampling [49.87637449243698]
Traditional outsourcing requires uploading device data to the cloud server. We propose to leverage widely available open-source data, which is a massive dataset collected from public and heterogeneous sources. We develop a novel strategy called Efficient Collaborative Open-source Sampling (ECOS) to construct a proximal proxy dataset from open-source data for cloud training.
arXiv Detail & Related papers (2022-10-23T00:12:18Z)
Data augmentation through multivariate scenario forecasting in Data Centers using Generative Adversarial Networks [0.18416014644193063]
The main challenge in achieving a global energy efficiency strategy based on Artificial Intelligence is that we need massive amounts of data to feed the algorithms. This paper proposes a time-series data augmentation methodology based on synthetic scenario forecasting within the Data Center. Our research will help to optimize the energy consumed in Data Centers, although the proposed methodology can be employed in any similar time-series-like problem.
arXiv Detail & Related papers (2022-01-12T15:09:10Z)
SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines. This approach however does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
CateCom: a practical data-centric approach to categorization of computational models [77.34726150561087]
We present an effort aimed at organizing the landscape of physics-based and data-driven computational models. We apply object-oriented design concepts and outline the foundations of an open-source collaborative framework.
arXiv Detail & Related papers (2021-09-28T02:59:40Z)
The MIT Supercloud Dataset [3.375826083518709]
We introduce the MIT Supercloud dataset which aims to foster innovative AI/ML approaches to the analysis of large scale HPC and datacenter/cloud operations. We provide detailed monitoring logs from the MIT Supercloud system, which include CPU and GPU usage by jobs, memory usage, file system logs, and physical monitoring data. This paper discusses the details of the dataset, collection methodology, data availability, and discusses potential challenge problems being developed using this data.
arXiv Detail & Related papers (2021-08-04T13:06:17Z)
A Privacy-Preserving Distributed Architecture for Deep-Learning-as-a-Service [68.84245063902908]
This paper introduces a novel distributed architecture for deep-learning-as-a-service. It is able to preserve the user sensitive data while providing Cloud-based machine and deep learning services.
arXiv Detail & Related papers (2020-03-30T15:12:03Z)

This list is automatically generated from the titles and abstracts of the papers in this site.