Reclaimer: A Reinforcement Learning Approach to Dynamic Resource
Allocation for Cloud Microservices
- URL: http://arxiv.org/abs/2304.07941v1
- Date: Mon, 17 Apr 2023 01:44:05 GMT
- Title: Reclaimer: A Reinforcement Learning Approach to Dynamic Resource
Allocation for Cloud Microservices
- Authors: Quintin Fettes, Avinash Karanth, Razvan Bunescu, Brandon Beckwith,
Sreenivas Subramoney
- Abstract summary: We introduce Reclaimer, a deep reinforcement learning model that adapts to runtime changes in the number and behavior of microservices in order to minimize CPU core allocation while meeting QoS requirements.
When evaluated with two microservice-based applications, Reclaimer reduces the mean CPU core allocation by 38.4% to 74.4% relative to the industry-standard scaling solution.
- Score: 4.397680391942813
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Many cloud applications are migrated from the monolithic model to a
microservices framework in which hundreds of loosely-coupled microservices run
concurrently, with significant benefits in terms of scalability, rapid
development, modularity, and isolation. However, dependencies among
microservices with uneven execution time may result in longer queues, idle
resources, or Quality-of-Service (QoS) violations.
In this paper we introduce Reclaimer, a deep reinforcement learning model
that adapts to runtime changes in the number and behavior of microservices in
order to minimize CPU core allocation while meeting QoS requirements. When
evaluated with two benchmark microservice-based applications, Reclaimer reduces
the mean CPU core allocation by 38.4% to 74.4% relative to the
industry-standard scaling solution, and by 27.5% to 58.1% relative to a current
state-of-the-art method.
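The setting the abstract describes can be pictured as a control loop: a learned policy observes per-microservice runtime metrics and chooses core allocations, trading cores spent against QoS risk. The Python sketch below is a minimal illustration of that loop, not the paper's implementation; the metric names, the heuristic stand-in policy, the reward shaping, and the QOS_LATENCY_MS target are all assumptions made for the example.

```python
"""Illustrative control loop for RL-driven core allocation.

A minimal sketch, assuming hypothetical per-service metrics and a
hypothetical QoS latency target; Reclaimer's actual state, action,
and reward definitions are given in the paper itself.
"""
from dataclasses import dataclass

QOS_LATENCY_MS = 200.0  # assumed end-to-end latency target


@dataclass
class ServiceMetrics:
    cpu_util: float        # utilization of currently allocated cores, 0..1
    rps: float             # incoming requests per second
    p99_latency_ms: float  # tail latency observed at this service


def reward(total_cores: float, e2e_latency_ms: float) -> float:
    """Penalize core usage; penalize QoS violations much more heavily."""
    qos_penalty = 10.0 if e2e_latency_ms > QOS_LATENCY_MS else 0.0
    return -total_cores - qos_penalty


def policy(m: ServiceMetrics) -> float:
    """Heuristic stand-in for the learned policy: scale cores with load.

    A trained deep RL agent would replace this, observing all services
    at once so it can adapt as services are added, removed, or change
    behavior at runtime.
    """
    return max(0.5, m.rps / 100.0 * (0.5 + m.cpu_util))


# One step of the allocation loop over a hypothetical service set.
services = {
    "frontend": ServiceMetrics(cpu_util=0.7, rps=450.0, p99_latency_ms=35.0),
    "cart":     ServiceMetrics(cpu_util=0.4, rps=120.0, p99_latency_ms=12.0),
}
allocation = {name: policy(m) for name, m in services.items()}
print(allocation, reward(sum(allocation.values()), e2e_latency_ms=150.0))
```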
Related papers
- STaleX: A Spatiotemporal-Aware Adaptive Auto-scaling Framework for Microservices [3.0846824529023382]
This paper presents a combination of control theory, machine learning, and spatiotemporal features to address these challenges.
We propose an adaptive auto-scaling framework, STaleX, that integrates these features, enabling real-time resource adjustments.
Our framework accounts for features including service specifications and dependencies among services, as well as temporal variations in workload; a sketch of the control-theoretic ingredient appears after this entry.
arXiv Detail & Related papers (2025-01-30T20:19:13Z)
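STaleX pairs control theory with machine learning. As a generic illustration of the control-theoretic half, the sketch below implements a PI controller that nudges a replica count toward a utilization setpoint; the gains, setpoint, and utilization samples are assumptions, not values from the paper.

```python
"""Generic PI controller for replica scaling.

A sketch of the control-theoretic building block that frameworks
like STaleX combine with learned components; all constants here
are assumptions for illustration.
"""


class PIScaler:
    def __init__(self, setpoint: float, kp: float = 0.8, ki: float = 0.1):
        self.setpoint = setpoint  # target CPU utilization, e.g. 0.6
        self.kp, self.ki = kp, ki
        self.integral = 0.0

    def step(self, measured_util: float, replicas: int) -> int:
        # Positive error (overload) -> scale out; negative -> scale in.
        error = measured_util - self.setpoint
        self.integral += error
        adjustment = self.kp * error + self.ki * self.integral
        return max(1, round(replicas * (1.0 + adjustment)))


scaler = PIScaler(setpoint=0.6)
replicas = 4
for util in [0.9, 0.8, 0.65, 0.55]:  # hypothetical utilization samples
    replicas = scaler.step(util, replicas)
    print(replicas)
```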
- FlexiGPT: Pruning and Extending Large Language Models with Low-Rank Weight Sharing [59.12511498024836]
We present a method for pruning large language models (LLMs) that selectively removes model blocks based on an importance score.
We propose a principled metric to replace each pruned block using a weight-sharing mechanism.
Empirical evaluations demonstrate substantial performance gains over existing methods.
arXiv Detail & Related papers (2025-01-24T18:46:37Z)
- Microservice Deployment in Space Computing Power Networks via Robust Reinforcement Learning [43.96374556275842]
It is important to provide reliable, real-time remote sensing inference services that meet low-latency requirements.
This paper presents a deployment framework for remote sensing artificial intelligence applications, designed for Low Earth Orbit satellite constellations.
arXiv Detail & Related papers (2025-01-08T16:55:04Z)
- Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark [62.58869921806019]
We propose a task decomposition evaluation framework based on GPT-4o to automatically construct a new training dataset.
We design innovative training strategies to effectively distill GPT-4o's evaluation capabilities into a 7B open-source MLLM, MiniCPM-V-2.6.
Experimental results demonstrate that our distilled open-source MLLM significantly outperforms the current state-of-the-art GPT-4o-based baseline.
arXiv Detail & Related papers (2024-11-23T08:06:06Z)
- Leveraging Interpretability in the Transformer to Automate the Proactive Scaling of Cloud Resources [1.1470070927586018]
We develop a model that captures the relationship between end-to-end latency, requests at the front-end level, and resource utilization.
We then use the developed model to predict the end-to-end latency.
We demonstrate the merit of our approach on a microservice-based application and provide a roadmap to deployment; a sketch of this predict-then-scale idea appears after this entry.
arXiv Detail & Related papers (2024-09-04T22:03:07Z)
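The idea above, predicting end-to-end latency from front-end requests and resource utilization and acting before an SLO breach, can be illustrated with any regression model. The sketch below uses a plain least-squares fit as a stand-in for the paper's Transformer; the SLO value, the training data, and the forecast inputs are hypothetical.

```python
"""Proactive scaling from a learned latency model.

A minimal sketch: fit latency ~ f(requests, utilization), then
scale out before a predicted SLO breach. The linear fit is a
stand-in for the paper's Transformer model.
"""
import numpy as np

SLO_MS = 250.0  # assumed end-to-end latency objective

# Hypothetical history: columns = [requests/s, cpu_util], target = e2e ms.
X = np.array([[100, 0.3], [300, 0.5], [500, 0.7], [700, 0.85]], dtype=float)
y = np.array([80.0, 120.0, 190.0, 260.0])

# Least-squares fit with a bias term.
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)


def predict_latency(rps: float, util: float) -> float:
    return float(np.array([rps, util, 1.0]) @ coef)


# Act proactively on a (hypothetical) workload forecast.
forecast_rps, forecast_util = 650.0, 0.8
if predict_latency(forecast_rps, forecast_util) > SLO_MS:
    print("scale out ahead of the predicted SLO breach")
```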
- Entanglement Distribution Delay Optimization in Quantum Networks with Distillation [51.53291671169632]
Quantum networks (QNs) distribute entangled states to enable distributed quantum computing and sensing applications.
A quantum switch (QS) resource allocation framework is proposed to enhance end-to-end (e2e) fidelity and satisfy minimum rate and fidelity requirements.
arXiv Detail & Related papers (2024-05-15T02:04:22Z)
- Democratizing LLMs: An Exploration of Cost-Performance Trade-offs in Self-Refined Open-Source Models [53.859446823312126]
SoTA open-source models of varying sizes, from 7B to 65B, improve by 8.2% on average over their baseline performance.
Strikingly, even models with extremely small memory footprints, such as Vicuna-7B, show an 11.74% improvement overall and up to a 25.39% improvement in high-creativity, open-ended tasks.
arXiv Detail & Related papers (2023-10-11T15:56:00Z)
- DeepScaler: Holistic Autoscaling for Microservices Based on Spatiotemporal GNN with Adaptive Graph Learning [4.128665560397244]
This paper presents DeepScaler, a deep learning-based holistic autoscaling approach.
It focuses on coping with service dependencies to optimize service-level agreements (SLA) assurance and cost efficiency.
Experimental results demonstrate that our method implements a more effective autoscaling mechanism for microservices; a sketch of the dependency-aware intuition appears after this entry.
arXiv Detail & Related papers (2023-09-02T08:22:21Z)
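DeepScaler's premise is that load propagates along service dependencies, so scaling decisions should be graph-aware. The sketch below hand-codes one such propagation over a fixed call graph; DeepScaler instead learns the graph adaptively and forecasts with a spatiotemporal GNN. The graph weights and per-replica capacity here are assumptions.

```python
"""Dependency-aware load propagation, the intuition behind
graph-based autoscalers such as DeepScaler.

A sketch only: repeated message passing over a hand-written call
graph; the real system learns the graph and the dynamics.
"""
import numpy as np

services = ["frontend", "cart", "payment"]
# adj[i][j] = fraction of service i's requests that fan out to service j
# (hypothetical call graph; DeepScaler learns this adaptively).
adj = np.array([
    [0.0, 0.6, 0.4],  # frontend -> cart, payment
    [0.0, 0.0, 0.5],  # cart -> payment
    [0.0, 0.0, 0.0],
])

external_rps = np.array([500.0, 0.0, 0.0])  # only the frontend is user-facing
# Each iteration propagates load one hop further downstream.
load = external_rps.copy()
for _ in range(len(services)):
    load = external_rps + adj.T @ load

REQS_PER_REPLICA = 100.0  # assumed per-replica capacity
replicas = np.maximum(1, np.ceil(load / REQS_PER_REPLICA)).astype(int)
print(dict(zip(services, replicas)))
```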
- MicroRes: Versatile Resilience Profiling in Microservices via Degradation Dissemination Indexing [29.456286275972474]
Microservice resilience, the ability to recover from failures and continue providing reliable and responsive services, is crucial for cloud vendors.
Current practice relies on rules manually configured for a specific microservice system, resulting in labor-intensity and inflexibility.
Our insight is that resilient deployment can effectively prevent the dissemination of degradation from system performance metrics to user-aware metrics, and the latter directly affect service quality.
arXiv Detail & Related papers (2022-12-25T03:56:42Z)
- Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially on resource-limited devices.
Previous unstructured or structured weight pruning methods can hardly truly accelerate inference.
We propose a generalized weight unification framework at a hardware-compatible micro-structured level to achieve a high degree of compression and acceleration; a block-pruning sketch appears after this entry.
arXiv Detail & Related papers (2021-06-15T17:22:59Z)
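The micro-structured level referenced above sits between unstructured (per-weight) and structured (per-channel) pruning: whole small blocks are kept or dropped so hardware can exploit the sparsity. The sketch below prunes a weight matrix block-by-block by Frobenius norm; the block size, keep ratio, and scoring rule are simplified assumptions, and the weight-unification step is omitted.

```python
"""Block-wise (micro-structured) weight pruning.

A sketch of the general idea only; the paper's exact algorithm,
including weight unification across blocks, is more involved.
"""
import numpy as np


def prune_blocks(w: np.ndarray, block: int = 4, keep_ratio: float = 0.5):
    rows, cols = w.shape
    assert rows % block == 0 and cols % block == 0
    # View the matrix as a grid of block x block tiles and score each tile.
    tiles = w.reshape(rows // block, block, cols // block, block)
    norms = np.linalg.norm(tiles, axis=(1, 3))     # one score per tile
    cutoff = np.quantile(norms, 1.0 - keep_ratio)  # keep the top tiles
    mask = (norms >= cutoff)[:, None, :, None]     # broadcast over tiles
    return (tiles * mask).reshape(rows, cols)


rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))
pruned = prune_blocks(w, block=4, keep_ratio=0.5)
print(np.count_nonzero(pruned), "of", w.size, "weights kept")
```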
This list is automatically generated from the titles and abstracts of the papers on this site.