Supporting the development of Machine Learning for fundamental science in a federated Cloud with the AI_INFN platform
- URL: http://arxiv.org/abs/2502.21266v1
- Date: Fri, 28 Feb 2025 17:42:58 GMT
- Title: Supporting the development of Machine Learning for fundamental science in a federated Cloud with the AI_INFN platform
- Authors: Lucio Anderlini, Matteo Barbetti, Giulio Bianchini, Diego Ciangottini, Stefano Dal Pra, Diego Michelotto, Carmelo Pellegrino, Rosa Petrini, Alessandro Pascolini, Daniele Spiga,
- Abstract summary: Machine Learning (ML) is driving a revolution in the way scientists design, develop, and deploy data-intensive software.<n>The adoption of ML presents new challenges for the computing infrastructure, particularly in terms of provisioning and orchestrating access to hardware accelerators for development, testing, and production.<n>The INFN-funded project AI_INFN ("Artificial Intelligence at INFN") aims at fostering the adoption of ML techniques within INFN use cases by providing support on multiple aspects, including the provision of AI-native computing resources.
- Score: 32.73124984242397
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine Learning (ML) is driving a revolution in the way scientists design, develop, and deploy data-intensive software. However, the adoption of ML presents new challenges for the computing infrastructure, particularly in terms of provisioning and orchestrating access to hardware accelerators for development, testing, and production. The INFN-funded project AI_INFN ("Artificial Intelligence at INFN") aims at fostering the adoption of ML techniques within INFN use cases by providing support on multiple aspects, including the provision of AI-tailored computing resources. It leverages cloud-native solutions in the context of INFN Cloud, to share hardware accelerators as effectively as possible, ensuring the diversity of the Institute's research activities is not compromised. In this contribution, we provide an update on the commissioning of a Kubernetes platform designed to ease the development of GPU-powered data analysis workflows and their scalability on heterogeneous, distributed computing resources, possibly federated as Virtual Kubelets with the interLink provider.
Related papers
- Transforming the Hybrid Cloud for Emerging AI Workloads [81.15269563290326]
This white paper envisions transforming hybrid cloud systems to meet the growing complexity of AI workloads.
The proposed framework addresses critical challenges in energy efficiency, performance, and cost-effectiveness.
This joint initiative aims to establish hybrid clouds as secure, efficient, and sustainable platforms.
arXiv Detail & Related papers (2024-11-20T11:57:43Z) - Enhancing Dropout-based Bayesian Neural Networks with Multi-Exit on FPGA [20.629635991749808]
This paper proposes an algorithm and hardware co-design framework that can generate field-programmable gate array (FPGA)-based accelerators for efficient BayesNNs.
At the algorithm level, we propose novel multi-exit dropout-based BayesNNs with reduced computational and memory overheads.
At the hardware level, this paper introduces a transformation framework that can generate FPGA-based accelerators for the proposed efficient BayesNNs.
arXiv Detail & Related papers (2024-06-20T17:08:42Z) - Uncertainty Estimation in Multi-Agent Distributed Learning for AI-Enabled Edge Devices [0.0]
Edge IoT devices have seen a paradigm shift with the introduction of FPGAs and AI accelerators.
This advancement has vastly amplified their computational capabilities, emphasizing the practicality of edge AI.
Our study explores methods that enable distributed data processing through AI-enabled edge devices, enhancing collaborative learning capabilities.
arXiv Detail & Related papers (2024-03-14T07:40:32Z) - Uncertainty Estimation in Multi-Agent Distributed Learning [0.0]
KDT NEUROKIT2E project aims to establish a new open-source framework to facilitate AI applications on edge devices.
Our research focuses on the mechanisms and methodologies that allow edge network-enabled agents to engage in collaborative learning in distributed environments.
arXiv Detail & Related papers (2023-11-22T12:48:20Z) - Evaluating Emerging AI/ML Accelerators: IPU, RDU, and NVIDIA/AMD GPUs [14.397623940689487]
Graphcore Intelligence Processing Unit (IPU), Sambanova Reconfigurable Dataflow Unit (RDU), and enhanced GPU platforms are reviewed.
This research provides a preliminary evaluation and comparison of these commercial AI/ML accelerators.
arXiv Detail & Related papers (2023-11-08T01:06:25Z) - Federated Learning-Empowered AI-Generated Content in Wireless Networks [58.48381827268331]
Federated learning (FL) can be leveraged to improve learning efficiency and achieve privacy protection for AIGC.
We present FL-based techniques for empowering AIGC, and aim to enable users to generate diverse, personalized, and high-quality content.
arXiv Detail & Related papers (2023-07-14T04:13:11Z) - Distributed intelligence on the Edge-to-Cloud Continuum: A systematic
literature review [62.997667081978825]
This review aims at providing a comprehensive vision of the main state-of-the-art libraries and frameworks for machine learning and data analytics available today.
The main simulation, emulation, deployment systems, and testbeds for experimental research on the Edge-to-Cloud Continuum available today are also surveyed.
arXiv Detail & Related papers (2022-04-29T08:06:05Z) - Computational Intelligence and Deep Learning for Next-Generation
Edge-Enabled Industrial IoT [51.68933585002123]
We investigate how to deploy computational intelligence and deep learning (DL) in edge-enabled industrial IoT networks.
In this paper, we propose a novel multi-exit-based federated edge learning (ME-FEEL) framework.
In particular, the proposed ME-FEEL can achieve an accuracy gain up to 32.7% in the industrial IoT networks with the severely limited resources.
arXiv Detail & Related papers (2021-10-28T08:14:57Z) - Towards AIOps in Edge Computing Environments [60.27785717687999]
This paper describes the system design of an AIOps platform which is applicable in heterogeneous, distributed environments.
It is feasible to collect metrics with a high frequency and simultaneously run specific anomaly detection algorithms directly on edge devices.
arXiv Detail & Related papers (2021-02-12T09:33:00Z) - Reliable Fleet Analytics for Edge IoT Solutions [0.0]
We propose a framework for facilitating machine learning at the edge for AIoT applications.
The contribution is an architecture that includes services, tools, and methods for delivering fleet analytics at scale.
We present a preliminary validation of the framework by performing experiments with IoT devices on a university campus's rooms.
arXiv Detail & Related papers (2021-01-12T11:28:43Z) - Reconfigurable Intelligent Surface Assisted Mobile Edge Computing with
Heterogeneous Learning Tasks [53.1636151439562]
Mobile edge computing (MEC) provides a natural platform for AI applications.
We present an infrastructure to perform machine learning tasks at an MEC with the assistance of a reconfigurable intelligent surface (RIS)
Specifically, we minimize the learning error of all participating users by jointly optimizing transmit power of mobile users, beamforming vectors of the base station, and the phase-shift matrix of the RIS.
arXiv Detail & Related papers (2020-12-25T07:08:50Z) - Convergence of Artificial Intelligence and High Performance Computing on
NSF-supported Cyberinfrastructure [3.4291439418246177]
Artificial Intelligence (AI) applications have powered transformational solutions for big data challenges in industry and technology.
As AI continues to evolve into a computing paradigm endowed with statistical and mathematical rigor, it has become apparent that single- GPU solutions for training, validation, and testing are no longer sufficient.
This realization has been driving the confluence of AI and high performance computing to reduce time-to-insight.
arXiv Detail & Related papers (2020-03-18T18:00:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.