Related papers: SafeLoad: Efficient Admission Control Framework for Identifying Memory-Overloading Queries in Cloud Data Warehouses

URL: http://arxiv.org/abs/2601.01888v1
Date: Mon, 05 Jan 2026 08:29:51 GMT
Title: SafeLoad: Efficient Admission Control Framework for Identifying Memory-Overloading Queries in Cloud Data Warehouses
Authors: Yifan Wu, Yuhan Li, Zhenhua Wang, Zhongle Xie, Dingyu Yang, Ke Chen, Lidan Shou, Bo Tang, Liang Lin, Huan Li, Gang Chen
Abstract summary: Memory overload is a common form of resource exhaustion in cloud data warehouses. We propose SafeLoad, the first query admission control framework specifically designed to identify memory-overloading (MO) queries. We show that SafeLoad achieves state-of-the-art prediction performance with low online and offline time overhead.
Score: 59.68732483257323
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Memory overload is a common form of resource exhaustion in cloud data warehouses. When database queries fail due to memory overload, they not only waste critical resources such as CPU time but also disrupt the execution of core business processes, as memory-overloading (MO) queries are typically part of complex workflows. Identifying such queries in advance and scheduling them on memory-rich serverless clusters can prevent both resource waste and query execution failures. Therefore, cloud data warehouses need an admission control framework with high prediction precision, interpretability, efficiency, and adaptability to effectively identify MO queries. However, existing admission control frameworks primarily focus on scenarios such as SLA satisfaction and resource isolation, and offer limited precision in identifying MO queries. Moreover, there is a lack of publicly available MO-labeled datasets with workloads for training and benchmarking. To tackle these challenges, we propose SafeLoad, the first query admission control framework specifically designed to identify MO queries. Alongside it, we release SafeBench, an open-source, industrial-scale benchmark for this task, which includes 150 million real queries. SafeLoad first filters out memory-safe queries using an interpretable discriminative rule. It then applies a hybrid architecture that integrates both a global model and cluster-level models, supplemented by a misprediction correction module, to identify MO queries. Additionally, a self-tuning quota management mechanism dynamically adjusts prediction quotas per cluster to improve precision. Experimental results show that SafeLoad achieves state-of-the-art prediction performance with low online and offline time overhead. Specifically, SafeLoad improves precision by up to 66% over the best baseline and reduces wasted CPU time by up to 8.09x compared to scenarios without SafeLoad.
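
The staged flow in the abstract can be sketched as a minimal admission controller. The stages are a rule-based filter, then combined global and cluster-level models, then per-cluster quotas. This is an illustration under stated assumptions, not SafeLoad's implementation. The `est_rows` rule, the averaging of global and cluster scores, the 0.5 threshold, and the quota update in `tune_quota` are all hypothetical. The misprediction correction module is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Query:
    cluster: str            # cluster the query is submitted to
    est_rows: float         # hypothetical optimizer cardinality estimate
    features: List[float]   # hypothetical plan features fed to the models


Scorer = Callable[[Query], float]  # returns an estimated MO probability in [0, 1]


@dataclass
class AdmissionController:
    safe_row_limit: float              # hypothetical rule: below this, treat as memory-safe
    global_model: Scorer               # model trained across all clusters
    cluster_models: Dict[str, Scorer]  # optional per-cluster models
    quota: Dict[str, int]              # hypothetical max MO flags per cluster per window
    threshold: float = 0.5
    flagged: Dict[str, int] = field(default_factory=dict)

    def is_memory_overloading(self, q: Query) -> bool:
        # Stage 1: an interpretable rule clears obviously memory-safe queries cheaply.
        if q.est_rows < self.safe_row_limit:
            return False
        # Stage 2: combine the global score with a cluster-level score when one exists.
        score = self.global_model(q)
        local = self.cluster_models.get(q.cluster)
        if local is not None:
            score = 0.5 * (score + local(q))
        if score < self.threshold:
            return False
        # Stage 3: only flag while the cluster's quota is not exhausted.
        used = self.flagged.get(q.cluster, 0)
        if used >= self.quota.get(q.cluster, 0):
            return False
        self.flagged[q.cluster] = used + 1
        return True

    def tune_quota(self, cluster: str, precision: float, target: float = 0.9) -> None:
        # Hypothetical feedback rule: shrink the quota when observed precision is
        # below target, grow it otherwise, then start a new counting window.
        current = self.quota.get(cluster, 0)
        self.quota[cluster] = max(0, current - 1) if precision < target else current + 1
        self.flagged[cluster] = 0


ctl = AdmissionController(
    safe_row_limit=1e6,
    global_model=lambda q: q.features[0],
    cluster_models={"c1": lambda q: q.features[1]},
    quota={"c1": 1},
)
print(ctl.is_memory_overloading(Query("c1", 1e7, [0.9, 0.7])))  # True: score 0.8, quota free
```

Capping flags per cluster trades recall for precision. That is the direction the abstract attributes to quota management, but the update rule here is only a placeholder.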

Related papers

MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning [78.46301394559903]
Large Language Models (LLMs) are increasingly used for long-duration tasks. Current methods face a trade-off between cost and accuracy. MemSifter is a novel framework that offloads the memory retrieval process to a small-scale proxy model.
arXiv (2026-03-03T02:57:38Z)
AMA: Adaptive Memory via Multi-Agent Collaboration [54.490349689939166]
We propose Adaptive Memory via Multi-Agent Collaboration (AMA), a novel framework that leverages coordinated agents to manage memory across multiple granularities. AMA significantly outperforms state-of-the-art baselines while reducing token consumption by approximately 80% compared to full-context methods.
arXiv (2026-01-28T08:09:49Z)
MALLOC: Benchmarking the Memory-aware Long Sequence Compression for Large Sequential Recommendation [84.53415999381203]
MALLOC is a benchmark for memory-aware long sequence compression. It is integrated into state-of-the-art recommenders, enabling a reproducible evaluation platform.
arXiv (2026-01-28T04:11:50Z)
MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning [73.27233666920618]
We propose MemSearcher, an agent workflow that iteratively maintains a compact memory and combines the current turn with it. At each turn, MemSearcher fuses the user's question with the memory to generate reasoning traces, perform search actions, and update memory to retain only information essential for solving the task. We introduce multi-context GRPO, an end-to-end RL framework that jointly optimizes the reasoning, search strategies, and memory management of MemSearcher agents.
arXiv (2025-11-04T18:27:39Z)
StorageXTuner: An LLM Agent-Driven Automatic Tuning Framework for Heterogeneous Storage Systems [9.148071923560414]
Heuristic and ML-based tuning methods are often system-specific, require manual glue, and degrade under changes. Recent LLM-based approaches help but usually treat tuning as a single-shot, system-specific task. We present StorageXTuner, an LLM agent-driven auto-tuning framework for heterogeneous storage engines. We implement a prototype and evaluate it on RocksDB, LevelDB, CacheLib, and InnoDB with YCSB, MixGraph, and TPC-H/C.
arXiv (2025-10-28T22:33:14Z)
Accelerating LLM Inference with Precomputed Query Storage [0.13048920509133805]
StorInfer is a storage-assisted large language model (LLM) inference system. When a user query semantically matches a precomputed query, StorInfer bypasses expensive GPU inference and instantly returns the stored response.
arXiv (2025-09-30T08:14:04Z)
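
StorInfer's hit path reduces to a nearest-neighbor lookup with a similarity threshold. It serves the stored response when an incoming query semantically matches a precomputed one, and otherwise runs inference. The sketch below makes three assumptions that do not come from the summary: a bag-of-words `embed` stands in for a real sentence encoder, a linear scan replaces a vector index, and the 0.8 threshold is hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List, Optional, Tuple


def embed(text: str) -> Counter:
    # Hypothetical stand-in for a sentence encoder: lowercase bag-of-words counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class PrecomputedStore:
    def __init__(self, pairs: List[Tuple[str, str]], threshold: float = 0.8):
        # Embed every precomputed query once, keeping its stored response alongside.
        self.entries = [(embed(q), r) for q, r in pairs]
        self.threshold = threshold

    def answer(self, query: str, generate: Callable[[str], str]) -> Tuple[str, bool]:
        qv = embed(query)
        best_sim, best_resp = 0.0, None  # type: float, Optional[str]
        for vec, resp in self.entries:
            s = cosine(qv, vec)
            if s > best_sim:
                best_sim, best_resp = s, resp
        if best_resp is not None and best_sim >= self.threshold:
            return best_resp, True       # hit: skip model inference
        return generate(query), False    # miss: fall back to the LLM
```

The returned flag makes it easy to measure the hit rate, which is what determines how much inference is saved.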
SEDM: Scalable Self-Evolving Distributed Memory for Agents [23.182291416527764]
SEDM is a verifiable and adaptive framework that transforms memory from a passive repository into an active, self-optimizing component. We show that SEDM improves reasoning accuracy while reducing token overhead compared with strong memory baselines. Results highlight SEDM as a scalable and sustainable memory mechanism for open-ended multi-agent collaboration.
arXiv (2025-09-11T14:37:37Z)
Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning [89.55738101744657]
Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of NLP tasks, but they remain fundamentally stateless. We present Memory-R1, a reinforcement learning framework that equips LLMs with the ability to actively manage and utilize external memory.
arXiv (2025-08-27T12:26:55Z)
DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents [52.92354372596197]
Large Language Models (LLMs) are increasingly central to agentic systems due to their strong reasoning and planning capabilities. Interaction with external sources also introduces the risk of prompt injection attacks, where malicious inputs from those sources can mislead the agent's behavior. We propose a Dynamic Rule-based Isolation Framework for Trustworthy agentic systems, which enforces both control-level and data-level constraints.
arXiv (2025-06-13T05:01:09Z)
How Robust Are Router-LLMs? Analysis of the Fragility of LLM Routing Capabilities [62.474732677086855]
Large language model (LLM) routing has emerged as a crucial strategy for balancing computational costs with performance. We propose the DSC (Diverse, Simple, and Categorized) benchmark, an evaluation framework that categorizes router performance across a broad spectrum of query types.
arXiv (2025-03-20T19:52:30Z)
Leveraging Approximate Caching for Faster Retrieval-Augmented Generation [6.674782158041247]
We introduce Proximity, an approximate key-value cache that optimizes the RAG workflow by leveraging similarities in user queries. Instead of treating each query independently, Proximity reuses previously retrieved documents when similar queries appear. Our experiments demonstrate that, with our LSH scheme and a realistically skewed MedRAG workload, Proximity reduces database calls by 77.2% while maintaining database recall and test accuracy.
arXiv (2025-03-07T15:54:04Z)
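
Proximity reuses retrieved documents for similar queries. That idea can be sketched as an approximate cache keyed by locality-sensitive hashing. The summary does not describe the actual LSH scheme, so this sketch makes its own assumptions:

- random-hyperplane (sign) LSH over query embeddings;
- a hypothetical cosine threshold `tau`;
- a caller-supplied `search` function standing in for the vector-database call.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


class ApproxRetrievalCache:
    """Approximate key-value cache: query embedding -> previously retrieved documents."""

    def __init__(self, dim: int, n_planes: int = 8, tau: float = 0.95, seed: int = 0):
        rng = random.Random(seed)
        # Random hyperplanes for sign-based LSH, which approximates cosine similarity.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
        self.tau = tau  # hypothetical similarity threshold for reusing a cached result
        self.buckets: Dict[Tuple[int, ...], List[Tuple[List[float], List[str]]]] = {}
        self.hits = 0
        self.misses = 0

    def _key(self, v: Vector) -> Tuple[int, ...]:
        # One bit per hyperplane: which side of the plane the vector falls on.
        return tuple(int(sum(p * x for p, x in zip(plane, v)) >= 0.0) for plane in self.planes)

    @staticmethod
    def _cosine(a: Vector, b: Vector) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    def retrieve(self, v: Vector, search: Callable[[Vector], List[str]]) -> List[str]:
        bucket = self.buckets.setdefault(self._key(v), [])
        # Reuse documents from a sufficiently similar earlier query in the same bucket.
        for cached_v, docs in bucket:
            if self._cosine(v, cached_v) >= self.tau:
                self.hits += 1
                return docs
        # Miss: query the vector database and remember the result in this bucket.
        self.misses += 1
        docs = search(v)
        bucket.append((list(v), docs))
        return docs
```

On a miss the cache calls `search` and stores the result in the query's bucket. Later near-duplicate queries that hash to the same bucket then skip the database call. The threshold check guards against reusing results for unrelated queries that collide in a bucket.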
Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models [51.20476412037321]
We propose Safe LoRA, a simple one-liner patch to the original LoRA implementation that projects LoRA weights from selected layers onto the safety-aligned subspace. Our experiments demonstrate that when fine-tuning on purely malicious data, Safe LoRA retains safety performance similar to that of the original aligned model.
arXiv (2024-05-27T05:04:05Z)
LearnedWMP: Workload Memory Prediction Using Distribution of Query Templates [2.803890673782225]
We propose Learned Workload Memory Prediction (LearnedWMP) to improve and simplify the estimation of workloads' working memory demands. We show that LearnedWMP reduces the memory estimation error of the state-of-the-practice method by up to 47.6%.
arXiv (2024-01-22T16:38:33Z)
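
LearnedWMP predicts a workload's memory demand from its distribution of query templates. That idea can be sketched in three steps: map each query to a template, summarize a workload as a template histogram, and regress memory on that histogram. Three parts of this sketch are assumptions for illustration: the literal-masking `template_of`, the fixed template vocabulary, and the ordinary least-squares regressor. The summary does not specify how templates are derived or which learner is used.

```python
import re
from typing import Dict, List, Sequence


def template_of(sql: str) -> str:
    # Hypothetical templating: mask quoted and numeric literals, normalize case and spaces.
    sql = re.sub(r"'[^']*'", "?", sql)
    sql = re.sub(r"\b\d+(\.\d+)?\b", "?", sql)
    return " ".join(sql.lower().split())


def histogram(workload: Sequence[str], vocab: Dict[str, int]) -> List[float]:
    # Count how many queries of each known template appear in the workload.
    h = [0.0] * len(vocab)
    for q in workload:
        t = template_of(q)
        if t in vocab:
            h[vocab[t]] += 1.0
    return h


def fit_least_squares(X: List[List[float]], y: List[float]) -> List[float]:
    # Solve the normal equations (X^T X) w = X^T y by Gauss-Jordan elimination
    # with partial pivoting; assumes X^T X is invertible (no regularization).
    n = len(X[0])
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(n)] + [sum(r[i] * t for r, t in zip(X, y))]
        for i in range(n)
    ]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(n):
            if r != c:
                f = A[r][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][n] for i in range(n)]


def predict(w: Sequence[float], h: Sequence[float]) -> float:
    # Predicted memory demand of a workload from its template histogram.
    return sum(a * b for a, b in zip(w, h))
```

With a linear model, each learned weight reads as an approximate per-template memory cost. A nonlinear learner could replace `fit_least_squares` without changing the histogram featurization.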

This list is automatically generated from the titles and abstracts of the papers on this site.