FIRST: Federated Inference Resource Scheduling Toolkit for Scientific AI Model Access
- URL: http://arxiv.org/abs/2510.13724v1
- Date: Wed, 15 Oct 2025 16:28:34 GMT
- Title: FIRST: Federated Inference Resource Scheduling Toolkit for Scientific AI Model Access
- Authors: Aditya Tanikanti, Benoit Côté, Yanfei Guo, Le Chen, Nickolaus Saint, Ryan Chard, Ken Raffenetti, Rajeev Thakur, Thomas Uram, Ian Foster, Michael E. Papka, Venkatram Vishwanath
- Abstract summary: FIRST provides cloud-like access to diverse AI models, such as Large Language Models (LLMs), on existing HPC infrastructure. The system allows researchers to run parallel inference workloads via an OpenAI-compliant API in private, secure environments.
- Score: 7.480885391518904
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present the Federated Inference Resource Scheduling Toolkit (FIRST), a framework enabling Inference-as-a-Service across distributed High-Performance Computing (HPC) clusters. FIRST provides cloud-like access to diverse AI models, like Large Language Models (LLMs), on existing HPC infrastructure. Leveraging Globus Auth and Globus Compute, the system allows researchers to run parallel inference workloads via an OpenAI-compliant API on private, secure environments. This cluster-agnostic API allows requests to be distributed across federated clusters, targeting numerous hosted models. FIRST supports multiple inference backends (e.g., vLLM), auto-scales resources, maintains "hot" nodes for low-latency execution, and offers both high-throughput batch and interactive modes. The framework addresses the growing demand for private, secure, and scalable AI inference in scientific workflows, allowing researchers to generate billions of tokens daily on-premises without relying on commercial cloud infrastructure.
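Since FIRST exposes an OpenAI-compliant API, a client can target it with a standard chat-completions request. The sketch below builds such a payload and posts it with a Globus Auth bearer token; the gateway URL, model name, and token handling are illustrative placeholders, not details taken from the paper.

```python
import json
from urllib import request

# Hypothetical gateway URL and model name -- placeholders, not from the paper.
FIRST_BASE_URL = "https://first.example.org/v1"
MODEL = "meta-llama/Llama-3-70B-Instruct"

def build_chat_request(prompt: str, model: str = MODEL) -> dict:
    """Build an OpenAI-compliant chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }

def send(payload: dict, token: str) -> dict:
    """POST the payload to the gateway, authenticated with a bearer token."""
    req = request.Request(
        f"{FIRST_BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

payload = build_chat_request("Summarize the FIRST framework in one sentence.")
```

Because the request shape follows the OpenAI specification, any existing OpenAI-compatible client library should also work by pointing its base URL at the FIRST gateway.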
Related papers
- OFA-MAS: One-for-All Multi-Agent System Topology Design based on Mixture-of-Experts Graph Generative Models [57.94189874119267]
Multi-Agent Systems (MAS) offer a powerful paradigm for solving complex problems. Current graph learning-based design methodologies often adhere to a "one-for-one" paradigm. We propose OFA-TAD, a one-for-all framework that generates adaptive collaboration graphs for any task described in natural language.
arXiv Detail & Related papers (2026-01-19T12:23:44Z) - One-Shot Hierarchical Federated Clustering [51.490181220883905]
This paper introduces an efficient one-shot hierarchical Federated Clustering framework. It performs client-end distribution exploration and server-end distribution aggregation, enabling the complex cluster distributions across clients to be explored efficiently.
arXiv Detail & Related papers (2026-01-10T02:58:33Z) - Federated Learning Framework for Scalable AI in Heterogeneous HPC and Cloud Environments [0.1805840413757548]
We present a federated learning framework built to run efficiently across mixed HPC and cloud environments. Our system addresses key challenges such as system heterogeneity, communication overhead, and resource scheduling, while maintaining model accuracy and data privacy.
arXiv Detail & Related papers (2025-11-22T18:39:25Z) - LLM-based Multi-Agent Blackboard System for Information Discovery in Data Science [69.1690891731311]
We propose a novel multi-agent communication paradigm inspired by the blackboard architecture for traditional AI models. In this framework, a central agent posts requests to a shared blackboard, and autonomous subordinate agents respond based on their capabilities. We evaluate our method on three benchmarks that require explicit data discovery.
arXiv Detail & Related papers (2025-09-30T22:34:23Z) - OpenCUA: Open Foundations for Computer-Use Agents [74.61449905487565]
Vision-language models have demonstrated impressive capabilities as computer-use agents (CUAs). We propose OpenCUA, a comprehensive open-source framework for scaling CUA data and foundation models. Our end-to-end agent models demonstrate strong performance across CUA benchmarks.
arXiv Detail & Related papers (2025-08-12T17:52:32Z) - Edge-Assisted Collaborative Fine-Tuning for Multi-User Personalized Artificial Intelligence Generated Content (AIGC) [38.59865959433328]
Cloud-based solutions aid in computation but often fall short in addressing privacy risks, personalization efficiency, and communication costs. We propose a novel cluster-aware hierarchical federated aggregation framework. We show that the framework achieves accelerated convergence while maintaining practical viability for scalable multi-user personalized AIGC services.
arXiv Detail & Related papers (2025-08-06T06:07:24Z) - Tackling Data Heterogeneity in Federated Time Series Forecasting [61.021413959988216]
Time series forecasting plays a critical role in various real-world applications, including energy consumption prediction, disease transmission monitoring, and weather forecasting.
Most existing methods rely on a centralized training paradigm, where large amounts of data are collected from distributed devices to a central cloud server.
We propose a novel framework, Fed-TREND, to address data heterogeneity by generating informative synthetic data as auxiliary knowledge carriers.
arXiv Detail & Related papers (2024-11-24T04:56:45Z) - Clustered FedStack: Intermediate Global Models with Bayesian Information Criterion [8.478300563501035]
We propose a novel Clustered FedStack framework based on the Stacked Federated Learning (FedStack) framework.
The local clients send their model predictions and output layer weights to a server, which then builds a robust global model.
This global model clusters the local clients based on their output layer weights using a clustering mechanism.
arXiv Detail & Related papers (2023-09-20T03:47:53Z) - The MIT Supercloud Workload Classification Challenge [10.458111248130944]
In this paper, we present a workload classification challenge based on the MIT Supercloud dataset.
The goal of this challenge is to foster algorithmic innovations in the analysis of compute workloads.
arXiv Detail & Related papers (2022-04-12T14:28:04Z) - Edge-assisted Democratized Learning Towards Federated Analytics [67.44078999945722]
We show the hierarchical learning structure of the proposed edge-assisted democratized learning mechanism, namely Edge-DemLearn.
We also validate Edge-DemLearn as a flexible model training mechanism to build a distributed control and aggregation methodology in regions.
arXiv Detail & Related papers (2020-12-01T11:46:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.