Multi-Objective Adaptive Rate Limiting in Microservices Using Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2511.03279v1
- Date: Wed, 05 Nov 2025 08:22:42 GMT
- Title: Multi-Objective Adaptive Rate Limiting in Microservices Using Deep Reinforcement Learning
- Authors: Ning Lyu, Yuxi Wang, Ziyu Cheng, Qingyuan Zhang, Feng Chen,
- Abstract summary: API rate limiting has emerged as a critical mechanism for ensuring system stability and service quality. Traditional rate limiting algorithms, such as token bucket and sliding window, struggle to adapt to dynamic traffic patterns and varying system loads. This paper proposes an adaptive rate limiting strategy based on deep reinforcement learning that dynamically balances system throughput and service latency.
- Score: 10.766410192517164
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As cloud computing and microservice architectures become increasingly prevalent, API rate limiting has emerged as a critical mechanism for ensuring system stability and service quality. Traditional rate limiting algorithms, such as token bucket and sliding window, while widely adopted, struggle to adapt to dynamic traffic patterns and varying system loads. This paper proposes an adaptive rate limiting strategy based on deep reinforcement learning that dynamically balances system throughput and service latency. We design a hybrid architecture combining Deep Q-Network (DQN) and Asynchronous Advantage Actor-Critic (A3C) algorithms, modeling the rate limiting decision process as a Markov Decision Process. The system continuously monitors microservice states and learns optimal rate limiting policies through environmental interaction. Extensive experiments conducted in a Kubernetes cluster environment demonstrate that our approach achieves 23.7% throughput improvement and 31.4% P99 latency reduction compared to traditional fixed-threshold strategies under high-load scenarios. Results from a 90-day production deployment handling 500 million daily requests validate the practical effectiveness of the proposed method, with 82% reduction in service degradation incidents and 68% decrease in manual interventions.
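The abstract frames rate limiting as a Markov Decision Process but does not give its state, action, or reward definitions. The following is a minimal toy sketch under stated assumptions, not the authors' method. The state is a discretized incoming-load level. The action is one of several hypothetical token-bucket refill rates. The reward is scalarized as throughput minus a weighted latency term. Tabular Q-learning stands in for the paper's DQN/A3C hybrid. The service model, the constants, and the names `ToyService`, `RATE_LEVELS`, `LOAD_LEVELS`, and `LATENCY_WEIGHT` are all hypothetical.

```python
import random


class TokenBucket:
    """Classic token bucket: `rate` tokens are added per tick, capped at `capacity`."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity

    def refill(self) -> None:
        self.tokens = min(self.capacity, self.tokens + self.rate)

    def try_acquire(self) -> bool:
        # Admit one request if a whole token is available; otherwise reject it.
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical constants for a toy single-service environment.
RATE_LEVELS = [20, 40, 60, 80, 100]  # actions: candidate refill rates per tick
LOAD_LEVELS = [30, 60, 100]          # states: incoming requests per tick
SERVICE_CAPACITY = 70                # requests the service can process per tick
LATENCY_WEIGHT = 5.0                 # trade-off weight between the two objectives


class ToyService:
    """Hypothetical environment: a token bucket in front of a single service."""

    def __init__(self) -> None:
        self.bucket = TokenBucket(rate=RATE_LEVELS[0], capacity=RATE_LEVELS[0])

    def step(self, rate: int, incoming: int) -> tuple[int, float]:
        # The agent's action sets the refill rate; burst capacity equals the rate.
        self.bucket.rate = rate
        self.bucket.capacity = rate
        self.bucket.refill()
        admitted = sum(self.bucket.try_acquire() for _ in range(incoming))
        # Crude M/M/1-style latency that blows up as utilization nears 1.
        utilization = min(admitted / SERVICE_CAPACITY, 0.99)
        latency = 1.0 / (1.0 - utilization)
        served = min(admitted, SERVICE_CAPACITY)
        return served, latency


def reward(served: int, latency: float) -> float:
    # Scalarized multi-objective reward: throughput minus weighted latency.
    return served - LATENCY_WEIGHT * latency


def train(steps: int = 5000, alpha: float = 0.1, gamma: float = 0.9,
          epsilon: float = 0.1, seed: int = 0) -> dict[int, list[float]]:
    """Tabular Q-learning, used here only as a stand-in for DQN/A3C."""
    rng = random.Random(seed)
    env = ToyService()
    n_actions = len(RATE_LEVELS)
    q = {s: [0.0] * n_actions for s in range(len(LOAD_LEVELS))}
    state = rng.randrange(len(LOAD_LEVELS))
    for _ in range(steps):
        # Epsilon-greedy choice among candidate rate limits.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=q[state].__getitem__)
        served, latency = env.step(RATE_LEVELS[action], LOAD_LEVELS[state])
        r = reward(served, latency)
        # Toy simplification: incoming load evolves independently of the action.
        next_state = rng.randrange(len(LOAD_LEVELS))
        # One-step Q-learning (off-policy TD) update.
        td_target = r + gamma * max(q[next_state])
        q[state][action] += alpha * (td_target - q[state][action])
        state = next_state
    return q


def greedy_policy(q: dict[int, list[float]]) -> dict[int, int]:
    # Map each load level to the rate limit with the highest learned Q-value.
    return {LOAD_LEVELS[s]: RATE_LEVELS[max(range(len(v)), key=v.__getitem__)]
            for s, v in q.items()}


policy = greedy_policy(train())
print(policy)
```

A larger `LATENCY_WEIGHT` penalizes high utilization more strongly, which tends to favor lower rate limits. A fixed-threshold baseline corresponds to always picking the same entry of `RATE_LEVELS`. In this toy, load evolves independently of the action, so the problem is closer to a contextual bandit than the full MDP the abstract describes. In the full MDP, admission decisions would also shape future service state.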
Related papers
- OptiQKD: A Machine Learning-Optimized Framework for Real-Time Parameter Tuning in Quantum Key Distribution [0.0]
We propose OptiQKD, a protocol-agnostic machine learning framework specifically engineered to maximize the Secure Key Rate (SKR) and minimize the Quantum Bit Error Rate (QBER) for the BB84, E91, and COW protocols. We evaluate the framework by simulating critical environmental stressors, including depolarizing and amplitude-damping noise, under realistic device constraints.
Date: 2026-03-04T15:43:31Z
- Bandwidth-adaptive Cloud-Assisted 360-Degree 3D Perception for Autonomous Vehicles [0.7557499794873328]
A key challenge in autonomous driving is maintaining real-time situational awareness of surrounding obstacles under strict latency constraints. We propose leveraging Vehicle-to-Everything (V2X) communication to partially offload processing to the cloud. Our approach uses transformer-based models to fuse multi-camera sensor data into a comprehensive Bird's-Eye View (BEV) representation.
Date: 2026-02-27T10:12:02Z
- CSGO: Generalized Optimization for Cold Start in Wireless Collaborative Edge LLM Systems [62.24576366776727]
We propose a latency-aware scheduling framework to minimize total inference latency. We show that the proposed method significantly reduces cold-start latency compared to baseline strategies.
Date: 2025-08-15T07:49:22Z
- Scalability Optimization in Cloud-Based AI Inference Services: Strategies for Real-Time Load Balancing and Automated Scaling [1.3689475854650441]
This study proposes a comprehensive scalability optimization framework for cloud AI inference services. The proposed model is a hybrid approach that combines reinforcement learning for adaptive load distribution with deep neural networks for accurate demand forecasting. Experimental results demonstrate that the proposed model improves load balancing efficiency by 35% and reduces response delay by 28%.
Date: 2025-04-16T04:00:04Z
- Benchmarking Dynamic SLO Compliance in Distributed Computing Continuum Systems [9.820223170841219]
Ensuring Service Level Objectives (SLOs) in large-scale architectures is challenging due to their heterogeneous nature and varying service requirements. We benchmark Active Inference, an emerging method from neuroscience, against three established reinforcement learning algorithms. We find that Active Inference is a promising approach for ensuring SLO compliance in Distributed Computing Continuum Systems (DCCS), offering lower memory usage, stable CPU utilization, and fast convergence.
Date: 2025-03-05T08:56:26Z
- SPEQ: Offline Stabilization Phases for Efficient Q-Learning in High Update-To-Data Ratio Reinforcement Learning [51.10866035483686]
High update-to-data (UTD) ratio algorithms in reinforcement learning (RL) improve sample efficiency but incur high computational costs, limiting real-world scalability. We propose Offline Stabilization Phases for Efficient Q-Learning (SPEQ), an RL algorithm that combines low-UTD online training with periodic offline stabilization phases. During these phases, Q-functions are fine-tuned with high UTD ratios on a fixed replay buffer, reducing redundant updates on suboptimal data.
Date: 2025-01-15T09:04:19Z
- Accelerating Energy-Efficient Federated Learning in Cell-Free Networks with Adaptive Quantization [45.99908087352264]
Federated Learning (FL) enables clients to share learning parameters instead of local data, reducing communication overhead. Traditional wireless networks face latency challenges with FL. We propose an energy-efficient, low-latency FL framework featuring optimized uplink power allocation for seamless client-server collaboration.
Date: 2024-12-30T08:10:21Z
- Differentiable Discrete Event Simulation for Queuing Network Control [7.965453961211742]
Queueing network control poses distinct challenges, including high stochasticity, large state and action spaces, and lack of stability.
We propose a scalable framework for policy optimization based on differentiable discrete event simulation.
Our methods can flexibly handle realistic scenarios, including systems operating in non-stationary environments.
Date: 2024-09-05T17:53:54Z
- Neural Horizon Model Predictive Control -- Increasing Computational Efficiency with Neural Networks [0.0]
We propose a machine-learning-supported approach to model predictive control.
We propose approximating part of the problem horizon while maintaining safety guarantees.
The proposed MPC scheme can be applied to a wide range of applications, including those requiring a rapid control response.
Date: 2024-08-19T08:13:37Z
- FedCAda: Adaptive Client-Side Optimization for Accelerated and Stable Federated Learning [57.38427653043984]
Federated learning (FL) has emerged as a prominent approach for collaborative training of machine learning models across distributed clients.
We introduce FedCAda, an innovative federated client adaptive algorithm designed to tackle this challenge.
We demonstrate that FedCAda outperforms the state-of-the-art methods in terms of adaptability, convergence, stability, and overall performance.
Date: 2024-05-20T06:12:33Z
- Networked Online Learning for Control of Safety-Critical Resource-Constrained Systems based on Gaussian Processes [9.544146562919792]
We propose a novel networked online learning approach based on Gaussian process regression.
We propose an effective data transmission scheme between the local system and the cloud taking bandwidth limitations and time delay of the transmission channel into account.
Date: 2022-02-23T13:12:12Z
- Distributed Reinforcement Learning for Privacy-Preserving Dynamic Edge Caching [91.50631418179331]
A privacy-preserving distributed deep policy gradient (P2D3PG) is proposed to maximize the cache hit rates of devices in the MEC networks.
We convert the distributed optimizations into model-free Markov decision process problems and then introduce a privacy-preserving federated learning method for popularity prediction.
Date: 2021-10-20T02:48:27Z
- Tailored Learning-Based Scheduling for Kubernetes-Oriented Edge-Cloud System [54.588242387136376]
We introduce KaiS, a learning-based scheduling framework for edge-cloud systems.
First, we design a coordinated multi-agent actor-critic algorithm to cater to decentralized request dispatch.
Second, for diverse system scales and structures, we use graph neural networks to embed system state information.
Third, we adopt a two-time-scale scheduling mechanism to harmonize request dispatch and service orchestration.
Date: 2021-01-17T03:45:25Z
This list is automatically generated from the titles and abstracts of the papers on this site.