MUSE: Multi-Tenant Model Serving With Seamless Model Updates
- URL: http://arxiv.org/abs/2602.11776v1
- Date: Thu, 12 Feb 2026 09:54:23 GMT
- Title: MUSE: Multi-Tenant Model Serving With Seamless Model Updates
- Authors: Cláudio Correia, Alberto E. A. Ferreira, Lucas Martins, Miguel P. Bento, Sofia Guerreiro, Ricardo Ribeiro Pereira, Ana Sofia Gomes, Jacopo Bono, Hugo Ferreira, Pedro Bizarro
- Abstract summary: MUSE enables seamless model updates by decoupling model scores from client decision boundaries. MUSE processes over a thousand events per second, and over 55 billion events in the last 12 months, across several dozen tenants.
- Score: 5.4431781060518425
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In binary classification systems, decision thresholds translate model scores into actions. Choosing suitable thresholds depends not only on the specific distribution of the underlying model scores but also on the specific business decisions of each client using that model. However, retraining models inevitably shifts score distributions, invalidating existing thresholds. In multi-tenant Score-as-a-Service environments, where decision boundaries reside in client-managed infrastructure, this creates a severe bottleneck: recalibration requires coordinating threshold updates across hundreds of clients, consuming excessive human hours and leading to model stagnation. We introduce MUSE, a model serving framework that enables seamless model updates by decoupling model scores from client decision boundaries. Designed for multi-tenancy, MUSE optimizes infrastructure re-use by sharing models via dynamic intent-based routing, combined with a two-level score transformation that maps model outputs to a stable, reference distribution. Deployed at scale by Feedzai, MUSE processes over a thousand events per second, and over 55 billion events in the last 12 months, across several dozen tenants, while maintaining high-availability and low-latency guarantees. By reducing model lead time from weeks to minutes, MUSE promotes model resilience against shifting attacks, saving millions of dollars in fraud losses and operational costs.
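The abstract does not publish the exact form of MUSE's two-level score transformation, but a common way to map a retrained model's scores onto a stable reference distribution, so that client-side thresholds remain valid, is quantile matching. The sketch below is a minimal illustration of that idea under that assumption; the function names, quantile count, and beta-distributed sample data are all hypothetical and chosen only for the example.

```python
import numpy as np

def fit_score_mapping(new_scores, reference_scores, n_quantiles=1000):
    """Fit a monotonic mapping from a new model's score distribution
    onto a fixed reference distribution via quantile matching."""
    qs = np.linspace(0.0, 1.0, n_quantiles)
    src = np.quantile(new_scores, qs)        # quantiles of the new model's scores
    dst = np.quantile(reference_scores, qs)  # quantiles of the reference distribution

    def transform(scores):
        # Interpolate each score through the matched quantile pairs;
        # scores outside the fitted range are clipped to the endpoints.
        return np.interp(scores, src, dst)

    return transform

# A retrained model shifts its score distribution; map it back to the reference
# so that decision thresholds set against the reference stay meaningful.
rng = np.random.default_rng(0)
reference = rng.beta(2, 5, size=10_000)  # historical (reference) score distribution
retrained = rng.beta(4, 3, size=10_000)  # shifted distribution after retraining
to_reference = fit_score_mapping(retrained, reference)
mapped = to_reference(retrained)
```

Because the mapping is monotonic, the rank order of scored events is preserved: only the score scale changes, which is what lets client decision boundaries stay untouched across model updates.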
Related papers
- RQ-GMM: Residual Quantized Gaussian Mixture Model for Multimodal Semantic Discretization in CTR Prediction [29.97246591569267]
Discretizing embeddings into semantic IDs before feeding them into CTR models offers a more effective solution. We propose RQ-GMM, which introduces probabilistic modeling to better capture the statistical structure of multimodal embedding spaces. RQ-GMM achieves superior codebook utilization and reconstruction accuracy.
arXiv Detail & Related papers (2026-02-13T04:11:24Z) - FedRef: Communication-Efficient Bayesian Fine-Tuning using a Reference Model [0.7100520098029438]
Federated learning (FL) collaboratively trains artificial intelligence (AI) models to ensure user data privacy. Previous studies have proposed model optimization, fine-tuning, and personalization to achieve improved model performance. We propose a reference model-based fine-tuning method for federated learning that overcomes catastrophic forgetting in each round.
arXiv Detail & Related papers (2025-06-29T12:41:11Z) - Intention-Conditioned Flow Occupancy Models [80.42634994902858]
Large-scale pre-training has fundamentally changed how machine learning research is done today. Applying this same framework to reinforcement learning is appealing because it offers compelling avenues for addressing core challenges in RL. Recent advances in generative AI have provided new tools for modeling highly complex distributions.
arXiv Detail & Related papers (2025-06-10T15:27:46Z) - FedMerge: Federated Personalization via Model Merging [51.12769696559237]
One global model might not be sufficient to serve many clients with non-IID tasks and distributions. We propose a novel "FedMerge" approach that can create a personalized model per client by simply merging multiple global models. We evaluate FedMerge on three different non-IID settings applied to different domains with diverse tasks and data types.
arXiv Detail & Related papers (2025-04-09T10:44:14Z) - Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning [78.72226641279863]
Sparse Mixture of Expert (SMoE) models have emerged as a scalable alternative to dense models in language modeling.
Our research explores task-specific model pruning to inform decisions about designing SMoE architectures.
We introduce an adaptive task-aware pruning technique UNCURL to reduce the number of experts per MoE layer in an offline manner post-training.
arXiv Detail & Related papers (2024-09-02T22:35:03Z) - Towards Client Driven Federated Learning [7.528642177161784]
We introduce Client-Driven Federated Learning (CDFL), a novel FL framework that puts clients at the driving role.
In CDFL, each client independently and asynchronously updates its model by uploading the locally trained model to the server and receiving a customized model tailored to its local task.
arXiv Detail & Related papers (2024-05-24T10:17:49Z) - EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) shows outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z) - One-Shot Heterogeneous Federated Learning with Local Model-Guided Diffusion Models [40.83058938096914]
FedLMG is a one-shot Federated learning method with Local Model-Guided diffusion models. Clients do not need access to any foundation models but only train and upload their local models.
arXiv Detail & Related papers (2023-11-15T11:11:25Z) - Federated Topic Model and Model Pruning Based on Variational Autoencoder [14.737942599204064]
Federated topic modeling allows multiple parties to jointly train models while protecting data privacy.
This paper proposes a method to establish a federated topic model while ensuring the privacy of each node, and use neural network model pruning to accelerate the model.
Experimental results show that the federated topic model pruning can greatly accelerate the model training speed while ensuring the model's performance.
arXiv Detail & Related papers (2023-11-01T06:00:14Z) - Billion-user Customer Lifetime Value Prediction: An Industrial-scale Solution from Kuaishou [19.31651596803956]
Customer Life Time Value (LTV) is the expected total revenue that a single user can bring to a business.
Modeling LTV is a challenging problem, due to its complex and mutable data distribution.
We introduce an Order Dependency Monotonic Network (ODMN) that models the ordered dependencies between LTVs of different time spans.
arXiv Detail & Related papers (2022-08-29T04:05:21Z) - Closed-form Continuous-Depth Models [99.40335716948101]
Continuous-depth neural models rely on advanced numerical differential equation solvers.
We present a new family of models, termed Closed-form Continuous-depth (CfC) networks, that are simple to describe and at least one order of magnitude faster.
arXiv Detail & Related papers (2021-06-25T22:08:51Z) - A Bayesian Federated Learning Framework with Online Laplace Approximation [144.7345013348257]
Federated learning allows multiple clients to collaboratively learn a globally shared model.
We propose a novel FL framework that uses online Laplace approximation to approximate posteriors on both the client and server side.
We achieve state-of-the-art results on several benchmarks, clearly demonstrating the advantages of the proposed method.
arXiv Detail & Related papers (2021-02-03T08:36:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.