Is Intelligence the Right Direction in New OS Scheduling for Multiple Resources in Cloud Environments?
- URL: http://arxiv.org/abs/2504.15021v1
- Date: Mon, 21 Apr 2025 11:09:43 GMT
- Title: Is Intelligence the Right Direction in New OS Scheduling for Multiple Resources in Cloud Environments?
- Authors: Xinglei Dou, Lei Liu, Limin Xiao
- Abstract summary: OSML+ is a new ML-based resource scheduling mechanism for co-located cloud services. We show our design can work well across various cloud servers, including the latest off-the-shelf large-scale servers.
- Score: 4.546118183880352
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Making systems intelligent is a promising direction in System/OS design. This paper proposes OSML+, a new ML-based resource scheduling mechanism for co-located cloud services. OSML+ intelligently schedules the cache and main memory bandwidth resources in the memory hierarchy and the computing core resources simultaneously. OSML+ uses a multi-model collaborative learning approach during scheduling and can therefore handle complicated cases, e.g., avoiding resource cliffs, sharing resources among applications, and enabling different scheduling policies for applications with different priorities. OSML+'s ML models converge faster than those of previous studies. Moreover, OSML+ automatically learns on the fly and handles dynamically changing workloads accordingly. Using transfer learning techniques, we show that our design works well across various cloud servers, including the latest off-the-shelf large-scale servers. Our experimental results show that OSML+ supports higher loads and meets QoS targets with lower overheads than previous studies.
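To make the scheduling idea concrete, below is a minimal Python sketch of an ML-driven multi-resource scheduling loop in the spirit of OSML+. Everything here is an assumption for illustration: the names (`AppMetrics`, `Allocation`, `predict_allocation`, `schedule`) are hypothetical, and a trivial heuristic stands in for the paper's collaborating ML models and for the hardware actuation (e.g., cache partitioning and bandwidth throttling) that a real system would use.

```python
# Minimal sketch of an ML-driven multi-resource scheduling loop in the
# spirit of OSML+. All names are hypothetical; a trivial heuristic stands
# in for the paper's collaborating ML models and hardware actuation.
from dataclasses import dataclass

@dataclass
class AppMetrics:
    name: str
    latency_ms: float      # observed tail latency
    qos_target_ms: float   # QoS target for this service
    high_priority: bool

@dataclass
class Allocation:
    cores: int
    cache_ways: int
    mem_bw_percent: int

def predict_allocation(m: AppMetrics, free_cores: int) -> Allocation:
    """Stand-in for the ML models: map observed metrics to an allocation."""
    pressure = m.latency_ms / m.qos_target_ms  # >1 means QoS is violated
    cores = min(free_cores, max(1, round(2 * pressure)))
    # Give QoS-violating or high-priority apps more cache/bandwidth, but
    # never shrink an allocation to zero (avoids "resource cliffs").
    cache_ways = max(2, round(4 * pressure)) if m.high_priority else 2
    mem_bw = min(100, max(10, round(40 * pressure)))
    return Allocation(cores, cache_ways, mem_bw)

def schedule(apps: list[AppMetrics], total_cores: int) -> dict[str, Allocation]:
    """One scheduling epoch: high-priority services are allocated first."""
    plan: dict[str, Allocation] = {}
    free = total_cores
    for m in sorted(apps, key=lambda a: not a.high_priority):
        alloc = predict_allocation(m, free)
        free -= alloc.cores
        plan[m.name] = alloc
    return plan

if __name__ == "__main__":
    apps = [
        AppMetrics("web-frontend", latency_ms=9.0, qos_target_ms=10.0, high_priority=True),
        AppMetrics("batch-analytics", latency_ms=50.0, qos_target_ms=200.0, high_priority=False),
    ]
    for name, alloc in schedule(apps, total_cores=16).items():
        print(name, alloc)
```

A real deployment would replace `predict_allocation` with trained models and apply the resulting plan through hardware resource controls; this sketch only shows the shape of the control loop.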
Related papers
- Local-Cloud Inference Offloading for LLMs in Multi-Modal, Multi-Task, Multi-Dialogue Settings [25.184186431458862]
Large language models (LLMs) can exhibit multi-task-solving capabilities through multiple dialogues and multi-modal data sources. These unique characteristics of LLMs, together with their large model size, make their deployment more challenging. We design TMO, a local-cloud LLM inference system with Three-M Offloading: Multi-modal, Multi-task, and Multi-dialogue.
arXiv Detail & Related papers (2025-02-16T06:18:28Z)
- Reinforcement Learning for Long-Horizon Interactive LLM Agents [56.9860859585028]
Interactive digital agents (IDAs) leverage APIs of stateful digital environments to perform tasks in response to user requests.
We present a reinforcement learning (RL) approach that trains IDAs directly in their target environments.
We derive LOOP, a data- and memory-efficient variant of proximal policy optimization.
arXiv Detail & Related papers (2025-02-03T18:35:42Z)
- Virgo: A Preliminary Exploration on Reproducing o1-like MLLM [89.50691075011429]
Slow-thinking reasoning systems have garnered widespread attention by scaling the thinking time during inference.
There is also growing interest in adapting this capability to multimodal large language models (MLLMs).
In this paper, we explore a straightforward approach by fine-tuning a capable MLLM with a small amount of textual long-form thought data.
We find that these long-form reasoning processes, expressed in natural language, can be effectively transferred to MLLMs.
arXiv Detail & Related papers (2025-01-03T17:14:16Z)
- Is the GPU Half-Empty or Half-Full? Practical Scheduling Techniques for LLMs [3.7758841366694353]
We survey scheduling techniques from the literature and from practical serving systems. We find that schedulers from the literature often achieve good performance but introduce significant complexity. In contrast, schedulers in practical deployments often leave easy performance gains on the table but are easy to implement, deploy and configure.
arXiv Detail & Related papers (2024-10-23T13:05:46Z)
- Efficient Prompting for LLM-based Generative Internet of Things [88.84327500311464]
Large language models (LLMs) have demonstrated remarkable capabilities on various tasks, and integrating these capabilities into Internet of Things (IoT) applications has drawn much research attention recently.
Due to security concerns, many institutions avoid accessing state-of-the-art commercial LLM services, requiring the deployment and use of open-source LLMs in a local network setting.
In this study, we propose an LLM-based Generative IoT (GIoT) system deployed in a local network setting.
arXiv Detail & Related papers (2024-06-14T19:24:00Z)
- Towards Modular LLMs by Building and Reusing a Library of LoRAs [64.43376695346538]
We study how to best build a library of adapters given multi-task data.
We introduce model-based clustering, MBC, a method that groups tasks based on the similarity of their adapter parameters.
To reuse the library, we present a novel zero-shot routing mechanism, Arrow, which enables dynamic selection of the most relevant adapters (a toy sketch of similarity-based routing follows this entry).
arXiv Detail & Related papers (2024-05-18T03:02:23Z)
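The snippet above names Arrow but not its mechanics. As a rough, hypothetical illustration of zero-shot adapter routing, the sketch below scores stored per-adapter prototype vectors against an input representation by cosine similarity and picks the top-k; the prototype representation and the scoring rule are assumptions, not the paper's actual method.

```python
# Rough illustration of zero-shot adapter routing in the spirit of Arrow.
# Per-adapter prototype vectors and cosine scoring are assumptions made
# for this sketch; the paper's actual routing rule is not reproduced here.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def route(hidden: np.ndarray, prototypes: dict[str, np.ndarray], k: int = 2) -> list[str]:
    """Pick the k adapters whose prototype best matches the input representation."""
    scores = {name: cosine(hidden, p) for name, p in prototypes.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

rng = np.random.default_rng(0)
prototypes = {f"lora_{t}": rng.normal(size=64) for t in ("qa", "summarize", "code")}
hidden = prototypes["lora_qa"] + 0.1 * rng.normal(size=64)  # toy input near "qa"
print(route(hidden, prototypes))  # expect "lora_qa" ranked first
```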
- Efficient Multimodal Large Language Models: A Survey [60.7614299984182]
Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance in tasks such as visual question answering, visual understanding, and reasoning.
However, their large model sizes and high training and inference costs have hindered the widespread application of MLLMs in academia and industry.
This survey provides a comprehensive and systematic review of the current state of efficient MLLMs.
arXiv Detail & Related papers (2024-05-17T12:37:10Z)
- Preble: Efficient Distributed Prompt Scheduling for LLM Serving [8.706905652975554]
This paper proposes Preble, the first distributed LLM serving platform that targets and optimizes prompt sharing.
We designed a distributed scheduling system that co-optimizes KV state reuse and computation load-balancing with a new scheduling algorithm and a hierarchical scheduling mechanism (a toy illustration of the reuse-versus-load trade-off follows this entry).
Our evaluation of Preble with real workloads and request arrival patterns on two open-source LLMs shows that Preble outperforms the SOTA serving systems by 1.5X to 14.5X on average latency and 2X to 10X on p99 latency.
arXiv Detail & Related papers (2024-05-08T06:30:58Z)
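As a hedged illustration of the trade-off the Preble snippet describes, the toy sketch below scores each worker by the longest shared prompt prefix in its cache (a proxy for KV state reuse) minus a load penalty; the weights and data structures are assumptions, not Preble's actual algorithm.

```python
# Toy sketch of prefix-aware scheduling: trade off KV-cache reuse (longest
# shared prompt prefix) against load balance. The scoring weights and data
# structures are assumptions, not Preble's actual algorithm.
from dataclasses import dataclass, field

def shared_prefix_len(a: str, b: str) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

@dataclass
class Worker:
    name: str
    cached_prompts: list[str] = field(default_factory=list)
    load: int = 0  # queued requests

def pick_worker(prompt: str, workers: list[Worker], alpha: float = 0.5) -> Worker:
    """Score = best KV reuse on this worker minus a load penalty."""
    def score(w: Worker) -> float:
        reuse = max((shared_prefix_len(prompt, p) for p in w.cached_prompts), default=0)
        return reuse - alpha * w.load
    best = max(workers, key=score)
    best.cached_prompts.append(prompt)
    best.load += 1
    return best

workers = [Worker("gpu0"), Worker("gpu1")]
for req in ["shared system prompt, question 1", "shared system prompt, question 2", "unrelated prompt"]:
    print(req, "->", pick_worker(req, workers).name)
```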
- Small LLMs Are Weak Tool Learners: A Multi-LLM Agent [73.54562551341454]
Large Language Model (LLM) agents significantly extend the capabilities of standalone LLMs.
We propose a novel approach that decomposes the required capabilities into a planner, a caller, and a summarizer.
This modular framework facilitates individual updates and the potential use of smaller LLMs for building each capability.
arXiv Detail & Related papers (2024-01-14T16:17:07Z)
- Venn: Resource Management for Collaborative Learning Jobs [24.596584073531886]
Collaborative learning (CL) has emerged as a promising approach for machine learning (ML) and data science across distributed edge devices.
In this paper, we present Venn, a CL resource manager that efficiently schedules heterogeneous devices among multiple CL jobs.
Our evaluation shows that, compared to state-of-the-art CL resource managers, Venn improves the average job completion time (JCT) by up to 1.88x.
arXiv Detail & Related papers (2023-12-13T17:13:08Z)
- In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z)
- Machine learning for cloud resources management -- An overview [0.0]
This study explores the most important cloud resource management issues that have been addressed with machine learning.
A large collection of studies is used to draw meaningful comparisons between the ML techniques applied in the different areas of cloud resource management.
We propose the most suitable ML model for each area.
arXiv Detail & Related papers (2021-01-28T13:23:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.