Ensuring Fair LLM Serving Amid Diverse Applications
- URL: http://arxiv.org/abs/2411.15997v1
- Date: Sun, 24 Nov 2024 22:35:44 GMT
- Title: Ensuring Fair LLM Serving Amid Diverse Applications
- Authors: Redwan Ibne Seraj Khan, Kunal Jain, Haiying Shen, Ankur Mallick, Anjaly Parayil, Anoop Kulkarni, Steve Kofsky, Pankhuri Choudhary, Renèe St. Amant, Rujia Wang, Yue Cheng, Ali R. Butt, Victor Rühle, Chetan Bansal, Saravan Rajmohan
- Abstract summary: This paper analyzes millions of requests from thousands of users on MS CoPilot, a real-world multi-tenant LLM platform hosted by Microsoft.
Our analysis confirms the inadequacy of existing methods and guides the development of FairServe, a system that ensures fair LLM access across diverse applications.
- Score: 13.346272116841288
- License:
- Abstract: In a multi-tenant large language model (LLM) serving platform hosting diverse applications, some users may submit an excessive number of requests, causing the service to become unavailable to other users and creating unfairness. Existing fairness approaches do not account for variations in token lengths across applications and multiple LLM calls, making them unsuitable for such platforms. To address the fairness challenge, this paper analyzes millions of requests from thousands of users on MS CoPilot, a real-world multi-tenant LLM platform hosted by Microsoft. Our analysis confirms the inadequacy of existing methods and guides the development of FairServe, a system that ensures fair LLM access across diverse applications. FairServe couples application-characteristic-aware request throttling with a weighted-service-counter-based scheduling technique to curb abusive behavior and ensure fairness. Our experimental results on real-world traces demonstrate FairServe's superior performance compared to the state-of-the-art method in ensuring fairness. We are actively working on deploying the system in production, expecting to benefit millions of customers worldwide.
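The abstract names the two mechanisms but not their details, so the sketch below is only a plausible illustration of weighted-service-counter scheduling with per-application throttling; the class name, application weights, and threshold are all assumptions, not FairServe's actual interface.

```python
from collections import defaultdict

class WeightedServiceScheduler:
    """Sketch of weighted-service-counter scheduling with throttling.

    Each user accrues a service counter weighted by the characteristics
    of the application they call through; the backlogged user with the
    lowest counter is served next. Weights and the throttle threshold
    are illustrative assumptions, not FairServe's actual parameters.
    """

    def __init__(self, app_weights, throttle_threshold=10_000.0):
        self.app_weights = app_weights        # per-application cost weight (assumed)
        self.counters = defaultdict(float)    # accumulated weighted service per user
        self.queues = defaultdict(list)       # pending requests per user
        self.throttle_threshold = throttle_threshold

    def submit(self, user, app, prompt_tokens):
        # Application-characteristic-aware throttling: reject requests from
        # users whose weighted service already exceeds the threshold.
        if self.counters[user] >= self.throttle_threshold:
            return False
        self.queues[user].append((app, prompt_tokens))
        return True

    def next_request(self):
        # Serve the backlogged user with the smallest weighted counter.
        backlogged = [u for u, q in self.queues.items() if q]
        if not backlogged:
            return None
        user = min(backlogged, key=lambda u: self.counters[u])
        app, prompt_tokens = self.queues[user].pop(0)
        return user, app, prompt_tokens

    def record_completion(self, user, app, prompt_tokens, output_tokens):
        # Charge input and output tokens, scaled by the application's weight,
        # so token-hungry applications cost proportionally more service.
        weight = self.app_weights.get(app, 1.0)
        self.counters[user] += weight * (prompt_tokens + output_tokens)
```

Scaling each user's counter by an application weight makes requests from token-hungry applications cost more "service," which is the intuition the abstract describes.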
Related papers
- Revisiting SLO and Goodput Metrics in LLM Serving [17.777554083636716]
Service level objectives (SLOs) and goodput (the number of requests that meet SLOs per second) are introduced to evaluate the performance of LLM serving.
Existing metrics fail to capture the nature of user experience.
We propose a unified metric framework, smooth goodput, that incorporates SLOs and goodput to reflect the nature of user experience.
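As a concrete reading of that definition (an illustration, not the paper's framework), goodput over a window can be computed from request logs; the log schema here is assumed:

```python
def goodput(requests, window_seconds):
    """Requests that met their SLO, per second, over a time window.

    Assumes each request is a dict with 'latency' and 'slo' fields;
    this schema is illustrative, not the paper's.
    """
    met = sum(1 for r in requests if r["latency"] <= r["slo"])
    return met / window_seconds
```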
arXiv Detail & Related papers (2024-10-18T08:05:37Z)
- Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge [84.34545223897578]
Despite their excellence in many domains, LLM judges have potential issues that remain under-explored, undermining their reliability and the scope of their utility.
We identify 12 key potential biases and propose a new automated bias quantification framework, CALM, which quantifies and analyzes each type of bias in LLM-as-a-Judge.
Our work highlights the need for stakeholders to address these issues and reminds users to exercise caution in LLM-as-a-Judge applications.
arXiv Detail & Related papers (2024-10-03T17:53:30Z)
- FedBiOT: LLM Local Fine-tuning in Federated Learning without Full Model [48.33280660752336]
Large language models (LLMs) show strong performance on many domain-specific tasks after fine-tuning with appropriate data.
However, much domain-specific data is privately distributed across multiple owners.
We introduce FedBiOT, a resource-efficient approach to LLM fine-tuning in federated learning.
arXiv Detail & Related papers (2024-06-25T16:45:47Z)
- Efficient Prompting for LLM-based Generative Internet of Things [88.84327500311464]
Large language models (LLMs) have demonstrated remarkable capabilities on various tasks, and integrating these capabilities into Internet of Things (IoT) applications has drawn much research attention recently.
Due to security concerns, many institutions avoid accessing state-of-the-art commercial LLM services, requiring the deployment and utilization of open-source LLMs in a local network setting.
In this study, we propose an LLM-based Generative IoT (GIoT) system deployed in a local network setting.
arXiv Detail & Related papers (2024-06-14T19:24:00Z)
- VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding? [115.60866817774641]
Multimodal large language models (MLLMs) have shown promise in web-related tasks.
However, evaluating their performance in the web domain remains a challenge due to the lack of comprehensive benchmarks.
VisualWebBench is a multimodal benchmark designed to assess the capabilities of MLLMs across a variety of web tasks.
arXiv Detail & Related papers (2024-04-09T02:29:39Z)
- CHOPS: CHat with custOmer Profile Systems for Customer Service with LLMs [7.888131064071474]
Current customer service models have limited integration with customer profiles.
Existing API integrations emphasize diversity over the precision and error avoidance essential in real-world customer service scenarios.
arXiv Detail & Related papers (2024-03-31T07:11:48Z)
- RouterBench: A Benchmark for Multi-LLM Routing System [25.515453832224804]
No single model can optimally address all tasks and applications, particularly when balancing performance with cost.
This limitation has led to the development of LLM routing systems, which combine the strengths of various models to overcome the constraints of individual LLMs.
We present RouterBench, a novel evaluation framework designed to systematically assess the efficacy of LLM routing systems.
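RouterBench evaluates routers rather than prescribing one, but a minimal cost-aware router makes the idea concrete; the model cost table and quality estimator below are hypothetical, not RouterBench's interface:

```python
def route(request, model_costs, quality_estimator, quality_bar=0.8):
    """Pick the cheapest model predicted to clear a quality bar.

    `model_costs` maps model name to cost per request; `quality_estimator`
    returns a predicted score in [0, 1] for (model, request). Both are
    hypothetical assumptions for illustration.
    """
    viable = [(cost, name) for name, cost in model_costs.items()
              if quality_estimator(name, request) >= quality_bar]
    if not viable:
        # No model clears the bar: fall back to the most capable
        # (here, simply the most expensive) model.
        return max(model_costs, key=model_costs.get)
    return min(viable)[1]
```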
arXiv Detail & Related papers (2024-03-18T17:59:04Z)
- Fairness in Serving Large Language Models [45.81800239353461]
This paper introduces the definition of serving fairness based on a cost function that accounts for the number of input and output tokens processed.
We propose a novel scheduling algorithm, the Virtual Token Counter (VTC), a fair scheduler based on this cost function.
We prove a 2x tight upper bound on the service difference between two backlogged clients, adhering to the requirement of work-conserving.
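The VTC paper targets continuous batching; the sketch below strips that away and shows only the core bookkeeping (serve the backlogged client with the smallest counter, then charge it for tokens). The input/output weights are illustrative, not the paper's constants.

```python
from collections import defaultdict, deque

class VirtualTokenCounter:
    """Sketch of VTC-style fair scheduling (simplified, no batching).

    Serves the backlogged client with the smallest virtual counter,
    then charges that counter with the request's token cost.
    """

    def __init__(self, input_weight=1.0, output_weight=2.0):
        self.input_weight = input_weight      # assumed weight per input token
        self.output_weight = output_weight    # assumed weight per output token
        self.counters = defaultdict(float)
        self.queues = defaultdict(deque)

    def enqueue(self, client, request):
        self.queues[client].append(request)

    def dispatch(self):
        # Work-conserving: serve whenever any client is backlogged.
        backlogged = [c for c, q in self.queues.items() if q]
        if not backlogged:
            return None
        client = min(backlogged, key=lambda c: self.counters[c])
        return client, self.queues[client].popleft()

    def charge(self, client, input_tokens, output_tokens):
        # The cost function counts both input and output tokens.
        self.counters[client] += (self.input_weight * input_tokens
                                  + self.output_weight * output_tokens)
```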
arXiv Detail & Related papers (2023-12-31T21:15:54Z)
- MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria [49.500322937449326]
Multimodal large language models (MLLMs) have broadened the scope of AI applications.
Existing automatic evaluation methodologies for MLLMs are largely limited to evaluating queries without considering user experience.
We propose a new evaluation paradigm for MLLMs: evaluating MLLMs with per-sample criteria, using a potent MLLM as the judge.
arXiv Detail & Related papers (2023-11-23T12:04:25Z)
- LiFT: A Scalable Framework for Measuring Fairness in ML Applications [18.54302159142362]
We present the LinkedIn Fairness Toolkit (LiFT), a framework for scalable computation of fairness metrics as part of large ML systems.
We discuss the challenges encountered in incorporating fairness tools in practice and the lessons learned during deployment at LinkedIn.
arXiv Detail & Related papers (2020-08-14T03:55:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.