Optimizing SSD Caches for Cloud Block Storage Systems Using Machine Learning Approaches
- URL: http://arxiv.org/abs/2501.14770v2
- Date: Tue, 28 Jan 2025 20:35:23 GMT
- Title: Optimizing SSD Caches for Cloud Block Storage Systems Using Machine Learning Approaches
- Authors: Chiyu Cheng, Chang Zhou, Yang Zhao, Jin Cao
- Abstract summary: This paper proposes a novel approach to dynamically optimize the write policy in cloud-based storage systems.
The proposed method identifies write-only data and selectively filters it out in real-time, thereby minimizing the number of unnecessary write operations.
- Score: 40.13303683102544
- Abstract: The growing demand for efficient cloud storage solutions has led to the widespread adoption of Solid-State Drives (SSDs) for caching in cloud block storage systems. The management of data writes to SSD caches plays a crucial role in improving overall system performance, reducing latency, and extending the lifespan of storage devices. A critical challenge arises from the large volume of write-only data, which significantly impacts the performance of SSD caches when handled inefficiently. Specifically, writes that have not been read for a certain period may introduce unnecessary write traffic to the SSD cache without offering substantial benefits for cache performance. This paper proposes a novel approach to mitigate this issue by leveraging machine learning techniques to dynamically optimize the write policy in cloud-based storage systems. The proposed method identifies write-only data and selectively filters it out in real-time, thereby minimizing the number of unnecessary write operations and improving the overall performance of the cache system. Experimental results demonstrate that the proposed machine learning-based policy significantly outperforms traditional approaches by reducing the number of harmful writes and optimizing cache utilization. This solution is particularly suitable for cloud environments with varying and unpredictable workloads, where traditional cache management strategies often fall short.
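The abstract does not give implementation details, but its core mechanism, predicting whether an incoming write will be read back soon and bypassing the SSD cache otherwise, can be sketched as a simple admission filter. A minimal sketch follows; the per-request features, the logistic-regression classifier, and the should_admit() API are illustrative assumptions, not the authors' actual design.

```python
# Sketch of an ML-driven write-admission filter for an SSD cache.
# Assumptions (not from the paper): the per-request features, the
# logistic-regression classifier, and the should_admit() API.
from dataclasses import dataclass

import numpy as np
from sklearn.linear_model import LogisticRegression


@dataclass
class WriteRequest:
    lba: int                 # logical block address
    size: int                # request size in bytes
    writes_in_window: int    # recent writes to this block
    reads_in_window: int     # recent reads of this block
    secs_since_read: float   # time since the block was last read


def features(req: WriteRequest) -> np.ndarray:
    return np.array([req.size, req.writes_in_window,
                     req.reads_in_window, req.secs_since_read], dtype=float)


class WriteAdmissionFilter:
    """Admit a write into the SSD cache only if the model predicts the
    block is likely to be read back soon; otherwise bypass to the backend."""

    def __init__(self, threshold: float = 0.5):
        self.model = LogisticRegression(max_iter=1000)
        self.threshold = threshold

    def fit(self, requests, was_read_back):
        X = np.stack([features(r) for r in requests])
        self.model.fit(X, np.asarray(was_read_back))

    def should_admit(self, req: WriteRequest) -> bool:
        p_read_back = self.model.predict_proba(features(req).reshape(1, -1))[0, 1]
        return p_read_back >= self.threshold


# Train on labelled trace samples (label 1 = block was read again soon),
# then gate incoming writes in real time.
history = [WriteRequest(1, 4096, 3, 5, 2.0), WriteRequest(2, 4096, 9, 0, 900.0)]
filt = WriteAdmissionFilter()
filt.fit(history, [1, 0])
print(filt.should_admit(WriteRequest(3, 4096, 1, 4, 5.0)))
```

In such a setup the labels would come from replaying recent traces and checking whether each admitted block was read again within a fixed window, so the policy can be retrained as the workload drifts.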
Related papers
- Dynamic Optimization of Storage Systems Using Reinforcement Learning Techniques [40.13303683102544]
This paper introduces RL-Storage, a reinforcement learning-based framework designed to dynamically optimize storage system configurations.
RL-Storage learns from real-time I/O patterns and predicts optimal storage parameters, such as cache size, queue depths, and readahead settings.
It achieves throughput gains of up to 2.6x and latency reductions of 43% compared to baselines.
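The summary stops at this level of detail; a minimal sketch of the general idea, an RL agent choosing among discrete cache-size, queue-depth, and readahead settings, might look like the following. The action grid, state bucketing, and IOPS-based reward are assumptions for illustration, not RL-Storage's actual design.

```python
# Minimal tabular Q-learning sketch for tuning storage knobs.
# The discrete action grid, the state bucketing, and the IOPS reward are
# illustrative assumptions, not RL-Storage's actual design.
import random
from collections import defaultdict
from itertools import product

ACTIONS = list(product([256, 512, 1024],   # cache size (MB)
                       [32, 64, 128],      # queue depth
                       [0, 128, 256]))     # readahead (KB)


class StorageTuner:
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)                      # (state, action) -> value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, state):
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)                       # explore
        return max(ACTIONS, key=lambda a: self.q[(state, a)])   # exploit

    def update(self, state, action, reward, next_state):
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


def workload_state(read_ratio, avg_io_kb):
    """Bucket live I/O statistics into a coarse discrete state."""
    return (round(read_ratio, 1), "small" if avg_io_kb <= 16 else "large")


# One interaction step: observe the workload, apply knobs, reward = measured IOPS.
tuner = StorageTuner()
s = workload_state(read_ratio=0.7, avg_io_kb=4)
a = tuner.choose(s)
tuner.update(s, a, reward=12000.0, next_state=s)
```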
arXiv Detail & Related papers (2024-12-29T17:41:40Z)
- CSR: Achieving 1 Bit Key-Value Cache via Sparse Representation [63.65323577445951]
We propose a novel approach called Cache Sparse Representation (CSR).
CSR transforms the dense Key-Value cache tensor into sparse indexes and weights, offering a more memory-efficient representation during LLM inference.
Our experiments demonstrate CSR achieves performance comparable to state-of-the-art KV cache quantization algorithms.
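A rough sketch of the general idea, replacing a dense KV-cache slice with per-token sparse indices and weights, is shown below; the top-k channel selection and float16 weights are illustrative stand-ins for CSR's dictionary-based construction.

```python
# Rough illustration of storing a dense KV-cache slice as sparse indices plus
# weights. The top-k channel selection and float16 weights are illustrative;
# CSR's actual dictionary-based construction is more involved.
import numpy as np


def to_sparse(kv: np.ndarray, k: int = 8):
    """kv: (tokens, dim) dense cache slice -> per-token (indices, weights)."""
    idx = np.argsort(-np.abs(kv), axis=1)[:, :k]     # k largest-magnitude channels
    w = np.take_along_axis(kv, idx, axis=1)          # their values
    return idx.astype(np.int16), w.astype(np.float16)


def to_dense(idx: np.ndarray, w: np.ndarray, dim: int) -> np.ndarray:
    out = np.zeros((idx.shape[0], dim), dtype=np.float32)
    np.put_along_axis(out, idx.astype(np.int64), w.astype(np.float32), axis=1)
    return out


kv = np.random.randn(4, 128).astype(np.float32)      # toy KV slice
idx, w = to_sparse(kv)
approx = to_dense(idx, w, dim=128)
print(f"kept {idx.shape[1]}/{kv.shape[1]} channels per token, "
      f"mean error = {np.abs(kv - approx).mean():.3f}")
```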
arXiv Detail & Related papers (2024-12-16T13:01:53Z)
- InstInfer: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference [10.115950753431528]
Large Language Models (LLMs) are a significant milestone in generative AI.
The increasing context length and batch size in offline LLM inference escalates the memory requirement of the key-value (KV) cache.
Several cost-effective solutions leverage host memory or SSDs to reduce storage costs for offline inference scenarios.
We propose InstInfer, which offloads the most performance-critical computation (i.e., attention in the decoding phase) and data (i.e., the KV cache) to Computational Storage Drives (CSDs).
InstInfer improves throughput for long-sequence inference.
arXiv Detail & Related papers (2024-09-08T06:06:44Z)
- ThinK: Thinner Key Cache by Query-Driven Pruning [63.13363917871414]
Large Language Models (LLMs) have revolutionized the field of natural language processing, achieving unprecedented performance across a variety of applications.
This paper focuses on the long-context scenario, addressing the inefficiencies in KV cache memory consumption during inference.
We propose ThinK, a novel query-dependent KV cache pruning method designed to minimize attention weight loss while selectively pruning the least significant channels.
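As a reader aid, a simplified version of query-driven channel pruning could look like the sketch below; the per-channel scoring rule and the 50% keep ratio are assumptions, not ThinK's exact criterion.

```python
# Simplified sketch of query-driven key-cache channel pruning. The per-channel
# score (|query| * mean|key|) and the 50% keep ratio are assumptions, not
# ThinK's exact criterion.
import numpy as np


def prune_key_channels(keys: np.ndarray, query: np.ndarray, keep_ratio: float = 0.5):
    """keys: (tokens, dim), query: (dim,). Keep the channels that contribute
    most to the q.k attention logits and drop the rest."""
    scores = np.abs(query) * np.abs(keys).mean(axis=0)   # per-channel importance
    kept = np.argsort(-scores)[: int(keys.shape[1] * keep_ratio)]
    return keys[:, kept], query[kept], kept


keys = np.random.randn(16, 64).astype(np.float32)
query = np.random.randn(64).astype(np.float32)
k_pruned, q_pruned, kept = prune_key_channels(keys, query)
full = keys @ query            # original attention logits
approx = k_pruned @ q_pruned   # logits from the kept channels only
print(f"kept {kept.size} of {keys.shape[1]} channels, "
      f"logit error = {np.abs(full - approx).mean():.3f}")
```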
arXiv Detail & Related papers (2024-07-30T17:59:08Z)
- Efficient Inference of Vision Instruction-Following Models with Elastic Cache [76.44955111634545]
We introduce Elastic Cache, a novel strategy for efficient deployment of instruction-following large vision-language models.
We propose an importance-driven cache merging strategy to prune redundant caches.
For instruction encoding, we use frequency to evaluate the importance of caches.
Results on a range of LVLMs demonstrate that Elastic Cache not only boosts efficiency but also notably outperforms existing pruning methods in language generation.
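A toy sketch of importance-driven cache merging is given below; using raw request frequency as importance and cosine similarity to pick the merge target are simplifying assumptions, not Elastic Cache's exact procedure.

```python
# Toy sketch of importance-driven cache merging: low-importance KV entries are
# folded into their most similar high-importance "anchor". Frequency-as-importance
# and cosine-similarity merging are simplifying assumptions for illustration.
import numpy as np


def merge_cache(kv: np.ndarray, importance: np.ndarray, budget: int) -> np.ndarray:
    """kv: (tokens, dim); importance: (tokens,); keep `budget` merged anchors."""
    anchors = np.argsort(-importance)[:budget]
    others = np.setdiff1d(np.arange(kv.shape[0]), anchors)
    merged = kv[anchors].copy()
    counts = np.ones(budget)
    unit_anchors = kv[anchors] / np.linalg.norm(kv[anchors], axis=1, keepdims=True)
    for t in others:
        v = kv[t] / np.linalg.norm(kv[t])
        j = int(np.argmax(unit_anchors @ v))   # most similar anchor
        merged[j] = (merged[j] * counts[j] + kv[t]) / (counts[j] + 1)  # running mean
        counts[j] += 1
    return merged


kv = np.random.randn(32, 16).astype(np.float32)
frequency = np.random.rand(32)                      # stand-in importance per entry
print(merge_cache(kv, frequency, budget=8).shape)   # -> (8, 16)
```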
arXiv Detail & Related papers (2024-07-25T15:29:05Z)
- Digital Twin-Assisted Data-Driven Optimization for Reliable Edge Caching in Wireless Networks [60.54852710216738]
We introduce a novel digital twin-assisted optimization framework, called D-REC, to ensure reliable caching in nextG wireless networks.
By incorporating reliability modules into a constrained decision process, D-REC can adaptively adjust actions, rewards, and states to comply with advantageous constraints.
arXiv Detail & Related papers (2024-06-29T02:40:28Z)
- A Learning-based Approach Towards Automated Tuning of SSD Configurations [3.8975567119716805]
We present an automated learning-based framework, named LearnedSSD, for tuning hardware configurations of solid-state drives (SSDs).
LearnedSSD automatically extracts the unique access patterns of a new workload from its block I/O traces, maps the workload to previously seen workloads to reuse learned experience, and recommends an optimal SSD configuration based on validated storage performance.
We develop LearnedSSD with simple yet effective learning algorithms that can run efficiently on multi-core CPUs.
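The summary suggests a featurize-the-trace-and-match pipeline; a minimal sketch under that reading follows. The trace features, the nearest-neighbour matching, and the example configurations are hypothetical, not LearnedSSD's actual model.

```python
# Sketch of trace-driven workload matching for SSD configuration tuning.
# The trace features, the nearest-neighbour matching, and the example
# configurations are hypothetical, not LearnedSSD's actual model.
import numpy as np

# Previously profiled workloads: feature vector -> validated best configuration.
# Assumed features: [read ratio, mean request size (KB), randomness].
KNOWN_WORKLOADS = {
    "oltp":      (np.array([0.70,   8.0, 0.9]), {"overprovision": 0.28, "channels": 8}),
    "analytics": (np.array([0.95, 256.0, 0.2]), {"overprovision": 0.07, "channels": 16}),
    "logging":   (np.array([0.05,  64.0, 0.1]), {"overprovision": 0.20, "channels": 4}),
}


def trace_features(trace):
    """trace: list of (is_read, size_kb, lba) tuples from a block I/O trace."""
    reads = np.array([r for r, _, _ in trace], dtype=float)
    sizes = np.array([s for _, s, _ in trace], dtype=float)
    lbas = np.array([l for _, _, l in trace], dtype=float)
    randomness = float(np.mean(np.abs(np.diff(lbas)) > 1)) if len(lbas) > 1 else 0.0
    return np.array([reads.mean(), sizes.mean(), randomness])


def recommend_config(trace):
    f = trace_features(trace)
    name = min(KNOWN_WORKLOADS,
               key=lambda k: np.linalg.norm(KNOWN_WORKLOADS[k][0] - f))
    return name, KNOWN_WORKLOADS[name][1]


print(recommend_config([(1, 8, 100), (0, 8, 5000), (1, 8, 101)]))
```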
arXiv Detail & Related papers (2021-10-17T00:25:21Z)
- Reinforcement Learning for Caching with Space-Time Popularity Dynamics [61.55827760294755]
Caching is envisioned to play a critical role in next-generation networks.
To intelligently prefetch and store contents, a cache node should be able to learn what and when to cache.
This chapter presents a versatile reinforcement learning based approach for near-optimal caching policy design.
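A tiny illustration of learning "what to cache" from hit/miss feedback is sketched below; the per-content value estimate and the epsilon-greedy admission rule are simplifications, not the chapter's exact formulation.

```python
# Tiny illustration of learning "what to cache" from hit/miss feedback.
# The per-content value estimate and the epsilon-greedy admission rule are
# simplifications, not the chapter's exact formulation.
import random
from collections import defaultdict


class RLCache:
    def __init__(self, capacity=3, alpha=0.2, epsilon=0.1):
        self.capacity, self.alpha, self.epsilon = capacity, alpha, epsilon
        self.value = defaultdict(float)   # content id -> estimated value of caching it
        self.cache = set()

    def request(self, content) -> bool:
        hit = content in self.cache
        # Reward the "cache this content" decision: 1 for a hit, 0 for a miss.
        self.value[content] += self.alpha * (float(hit) - self.value[content])
        if not hit:
            self._maybe_admit(content)
        return hit

    def _maybe_admit(self, content):
        if len(self.cache) < self.capacity:
            self.cache.add(content)
            return
        worst = min(self.cache, key=lambda c: self.value[c])
        # Admit if the newcomer looks more valuable, or occasionally to explore.
        if self.value[content] > self.value[worst] or random.random() < self.epsilon:
            self.cache.remove(worst)
            self.cache.add(content)


cache = RLCache()
hits = sum(cache.request(random.choice("AABBC")) for _ in range(200))
print(f"hit ratio over a skewed request stream: {hits / 200:.2f}")
```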
arXiv Detail & Related papers (2020-05-19T01:23:51Z)