The AI_INFN Platform: Artificial Intelligence Development in the Cloud
- URL: http://arxiv.org/abs/2509.22117v2
- Date: Wed, 29 Oct 2025 14:33:07 GMT
- Title: The AI_INFN Platform: Artificial Intelligence Development in the Cloud
- Authors: Lucio Anderlini, Giulio Bianchini, Diego Ciangottini, Stefano Dal Pra, Diego Michelotto, Rosa Petrini, Daniele Spiga
- Abstract summary: The INFN initiative AI_INFN (Artificial Intelligence at INFN) seeks to promote the use of ML methods across various INFN research scenarios. We will present preliminary benchmarks, functional tests, and case studies, demonstrating both performance and integration outcomes.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine Learning (ML) is profoundly reshaping the way researchers create, implement, and operate data-intensive software. Its adoption, however, introduces notable challenges for computing infrastructures, particularly when it comes to coordinating access to hardware accelerators across development, testing, and production environments. The INFN initiative AI_INFN (Artificial Intelligence at INFN) seeks to promote the use of ML methods across various INFN research scenarios by offering comprehensive technical support, including access to AI-focused computational resources. Leveraging the INFN Cloud ecosystem and cloud-native technologies, the project emphasizes efficient sharing of accelerator hardware while maintaining the breadth of the Institute's research activities. This contribution describes the deployment and commissioning of a Kubernetes-based platform designed to simplify GPU-powered data analysis workflows and enable their scalable execution on heterogeneous distributed resources. By integrating offloading mechanisms through Virtual Kubelet and the InterLink API, the platform allows workflows to span multiple resource providers, from Worldwide LHC Computing Grid sites to high-performance computing centers like CINECA Leonardo. We will present preliminary benchmarks, functional tests, and case studies, demonstrating both performance and integration outcomes.
Related papers
- High-Performance Serverless Computing: A Systematic Literature Review on Serverless for HPC, AI, and Big Data [0.8199696350352799]
This paper presents a systematic literature review of 122 research articles published between 2018 and early 2025. It explores the use of the serverless paradigm to develop, deploy, and orchestrate compute-intensive applications across cloud, high-performance computing, and hybrid environments.
arXiv Detail & Related papers (2026-01-14T10:10:20Z)
- A Survey on Cloud-Edge-Terminal Collaborative Intelligence in AIoT Networks [49.90474228895655]
Cloud-edge-terminal collaborative intelligence (CETCI) is a fundamental paradigm within the artificial intelligence of things (AIoT) community. CETCI has made significant progress with emerging AIoT applications, moving beyond isolated layer optimization to deployable collaborative intelligence systems. This survey describes foundational architectures, enabling technologies, and scenarios of CETCI paradigms, offering a tutorial-style review for CISAIOT beginners.
arXiv Detail & Related papers (2025-08-26T08:38:01Z)
- Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey [58.50944604905037]
Edge-cloud collaborative computing (ECCC) has emerged as a pivotal paradigm for addressing the computational demands of modern intelligent applications. Recent advancements in AI, particularly deep learning and large language models (LLMs), have dramatically enhanced the capabilities of these distributed systems. This survey provides a structured tutorial on fundamental architectures, enabling technologies, and emerging applications.
arXiv Detail & Related papers (2025-05-03T13:55:38Z)
- Supporting the development of Machine Learning for fundamental science in a federated Cloud with the AI_INFN platform [32.73124984242397]
Machine Learning (ML) is driving a revolution in the way scientists design, develop, and deploy data-intensive software. The adoption of ML presents new challenges for the computing infrastructure, particularly in terms of provisioning and orchestrating access to hardware accelerators for development, testing, and production. The INFN-funded project AI_INFN ("Artificial Intelligence at INFN") aims at fostering the adoption of ML techniques within INFN use cases by providing support on multiple aspects, including the provision of AI-native computing resources.
arXiv Detail & Related papers (2025-02-28T17:42:58Z)
- Transforming the Hybrid Cloud for Emerging AI Workloads [82.21522417363666]
This white paper envisions transforming hybrid cloud systems to meet the growing complexity of AI workloads. The proposed framework addresses critical challenges in energy efficiency, performance, and cost-effectiveness. This joint initiative aims to establish hybrid clouds as secure, efficient, and sustainable platforms.
arXiv Detail & Related papers (2024-11-20T11:57:43Z)
- Uncertainty Estimation in Multi-Agent Distributed Learning for AI-Enabled Edge Devices [0.0]
Edge IoT devices have seen a paradigm shift with the introduction of FPGAs and AI accelerators.
This advancement has vastly amplified their computational capabilities, emphasizing the practicality of edge AI.
Our study explores methods that enable distributed data processing through AI-enabled edge devices, enhancing collaborative learning capabilities.
arXiv Detail & Related papers (2024-03-14T07:40:32Z)
- Machine Learning Insides OptVerse AI Solver: Design Principles and Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver.
We showcase our methods for generating complex SAT and MILP instances utilizing generative models that mirror multifaceted structures of real-world problems.
We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z)
- Distributed intelligence on the Edge-to-Cloud Continuum: A systematic literature review [62.997667081978825]
This review aims at providing a comprehensive vision of the main state-of-the-art libraries and frameworks for machine learning and data analytics available today.
The main simulation, emulation, deployment systems, and testbeds for experimental research on the Edge-to-Cloud Continuum available today are also surveyed.
arXiv Detail & Related papers (2022-04-29T08:06:05Z)
- The MIT Supercloud Workload Classification Challenge [10.458111248130944]
In this paper, we present a workload classification challenge based on the MIT Supercloud dataset.
The goal of this challenge is to foster algorithmic innovations in the analysis of compute workloads.
arXiv Detail & Related papers (2022-04-12T14:28:04Z)
- Towards AIOps in Edge Computing Environments [60.27785717687999]
This paper describes the system design of an AIOps platform which is applicable in heterogeneous, distributed environments.
It is feasible to collect metrics with a high frequency and simultaneously run specific anomaly detection algorithms directly on edge devices.
arXiv Detail & Related papers (2021-02-12T09:33:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.