Dynamic Optimization of Storage Systems Using Reinforcement Learning Techniques
- URL: http://arxiv.org/abs/2501.00068v2
- Date: Fri, 22 Aug 2025 03:36:44 GMT
- Title: Dynamic Optimization of Storage Systems Using Reinforcement Learning Techniques
- Authors: Chiyu Cheng, Chang Zhou, Yang Zhao
- Abstract summary: This paper introduces RL-Storage, a novel reinforcement learning (RL)-based framework designed to dynamically optimize storage system configurations. By autonomously adapting to workload variations in real time, RL-Storage provides a robust and scalable solution for optimizing storage performance.
- Score: 32.74305371783441
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The exponential growth of data-intensive applications has placed unprecedented demands on modern storage systems, necessitating dynamic and efficient optimization strategies. Traditional heuristics employed for storage performance optimization often fail to adapt to the variability and complexity of contemporary workloads, leading to significant performance bottlenecks and resource inefficiencies. To address these challenges, this paper introduces RL-Storage, a novel reinforcement learning (RL)-based framework designed to dynamically optimize storage system configurations. RL-Storage leverages deep Q-learning algorithms to continuously learn from real-time I/O patterns and predict optimal storage parameters, such as cache size, queue depths, and readahead settings [1]. This work underscores the transformative potential of reinforcement learning techniques in addressing the dynamic nature of modern storage systems. By autonomously adapting to workload variations in real time, RL-Storage provides a robust and scalable solution for optimizing storage performance, paving the way for next-generation intelligent storage infrastructures.
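The tuning loop the abstract describes can be caricatured in a few lines. This is a minimal sketch under stated assumptions, not the paper's method: a discretized action space over cache size, queue depth, and readahead is assumed, and a tabular epsilon-greedy Q-style update with a toy synthetic reward stands in for the paper's deep Q-network and real I/O feedback. All names and numbers are illustrative.

```python
import random

# Illustrative sketch only: a tabular bandit-style learner stands in for
# the paper's deep Q-network, and a synthetic reward stands in for real
# I/O measurements. Parameter names and values are hypothetical.
ACTIONS = [
    ("cache_mb", 64), ("cache_mb", 256),          # candidate cache sizes
    ("queue_depth", 8), ("queue_depth", 32),      # candidate queue depths
    ("readahead_kb", 128), ("readahead_kb", 512), # candidate readahead sizes
]

def simulated_throughput(workload, param, value):
    """Toy reward model: random workloads benefit from deep queues,
    sequential workloads from large readahead."""
    if workload == "random":
        return value / 32.0 if param == "queue_depth" else 0.5
    return value / 512.0 if param == "readahead_kb" else 0.5

def train(episodes=2000, eps=0.1, alpha=0.5, seed=0):
    rng = random.Random(seed)
    q = {}  # (workload, action index) -> estimated reward
    for _ in range(episodes):
        workload = rng.choice(["random", "sequential"])
        if rng.random() < eps:  # epsilon-greedy exploration
            a = rng.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: q.get((workload, i), 0.0))
        param, value = ACTIONS[a]
        reward = simulated_throughput(workload, param, value)
        old = q.get((workload, a), 0.0)
        q[(workload, a)] = old + alpha * (reward - old)  # incremental update
    return q

def best_setting(q, workload):
    """Return the highest-value parameter setting for a workload."""
    a = max(range(len(ACTIONS)), key=lambda i: q.get((workload, i), 0.0))
    return ACTIONS[a]

q = train()
print(best_setting(q, "random"))
print(best_setting(q, "sequential"))
```

In this toy setup the learner settles on deep queues for the random workload and large readahead for the sequential one; the real framework would instead observe live I/O patterns and act on an actual device's tunables.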
Related papers
- Rethinking Autoregressive Models for Lossless Image Compression via Hierarchical Parallelism and Progressive Adaptation [75.58269386927076]
Autoregressive (AR) models are often dismissed as impractical due to prohibitive computational cost. This work re-thinks this paradigm, introducing a framework built on hierarchical parallelism and progressive adaptation. Experiments on diverse datasets (natural, satellite, medical) validate that our method achieves new state-of-the-art compression.
arXiv Detail & Related papers (2025-11-14T06:27:58Z)
- Intent-Driven Storage Systems: From Low-Level Tuning to High-Level Understanding [9.203282718014021]
Existing storage systems lack visibility into workload intent, limiting their ability to adapt to modern, large-scale applications. We propose Intent-Driven Storage Systems (IDSS), a vision for a new paradigm where large language models (LLMs) infer workload and system intent from unstructured signals. IDSS provides holistic reasoning for competing demands, synthesizing safe and efficient decisions within policy guardrails.
arXiv Detail & Related papers (2025-09-29T12:07:40Z)
- DAF: An Efficient End-to-End Dynamic Activation Framework for on-Device DNN Training [41.09085549544767]
We introduce a Dynamic Activation Framework (DAF) that enables scalable and efficient on-device training through system-level optimizations. DAF achieves both memory- and time-efficient dynamic quantization training by addressing key system bottlenecks. Evaluations on various deep learning models across embedded and mobile platforms demonstrate up to a $22.9\times$ reduction in memory usage and a $3.2\times$ speedup.
arXiv Detail & Related papers (2025-07-09T08:59:30Z)
- MemOS: A Memory OS for AI System [116.87568350346537]
Large Language Models (LLMs) have become an essential infrastructure for Artificial General Intelligence (AGI). Existing models mainly rely on static parameters and short-lived contextual states, limiting their ability to track user preferences or update knowledge over extended periods. MemOS is a memory operating system that treats memory as a manageable system resource.
arXiv Detail & Related papers (2025-07-04T17:21:46Z)
- Compress, Gather, and Recompute: REFORMing Long-Context Processing in Transformers [58.98923344096319]
REFORM is a novel inference framework that efficiently handles long contexts through a two-phase approach. It achieves over 50% and 27% performance gains on RULER and BABILong respectively at 1M context length. It also outperforms baselines on Infinite-Bench and MM-NIAH, demonstrating flexibility across diverse tasks and domains.
arXiv Detail & Related papers (2025-06-01T23:49:14Z)
- Towards Efficient Key-Value Cache Management for Prefix Prefilling in LLM Inference [10.499422091699918]
Inference workloads exhibit high cache reusability, making efficient caching critical to reducing redundancy and improving speed. We analyze real-world KVC access patterns using publicly available traces and evaluate commercial key-value stores like Redis and state-of-the-art RDMA-based systems for KVC metadata management.
arXiv Detail & Related papers (2025-05-28T03:05:55Z)
- Self-Balancing, Memory Efficient, Dynamic Metric Space Data Maintenance, for Rapid Multi-Kernel Estimation [2.6756996523251964]
We present a dynamic self-balancing octree data structure that enables efficient neighborhood maintenance in evolving metric spaces.
Our approach yields exponential speedups while preserving accuracy, particularly in high-dimensional spaces.
arXiv Detail & Related papers (2025-04-25T01:15:53Z)
- Scaling Test-Time Inference with Policy-Optimized, Dynamic Retrieval-Augmented Generation via KV Caching and Decoding [0.0]
We present a framework for enhancing Retrieval-Augmented Generation (RAG) systems through dynamic retrieval strategies and reinforcement fine-tuning.
Our framework integrates two complementary techniques: Policy-Optimized Retrieval-Augmented Generation (PORAG) and Adaptive Token-Layer Attention Scoring (ATLAS).
Our framework reduces hallucinations, strengthens domain-specific reasoning, and achieves significant efficiency and scalability gains over traditional RAG systems.
arXiv Detail & Related papers (2025-04-02T01:16:10Z)
- Dynamic Adaptation in Data Storage: Real-Time Machine Learning for Enhanced Prefetching [40.13303683102544]
This study explores the application of streaming machine learning to revolutionize data prefetching within multi-tiered storage systems.
Unlike traditional batch-trained models, streaming machine learning offers adaptability, real-time insights, and computational efficiency.
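The contrast with batch-trained models can be made concrete with a toy online predictor. This is a hedged sketch, not the paper's model: a Markov-style successor counter that updates in constant time per access stands in for the streaming learners the abstract describes.

```python
from collections import defaultdict

# Illustrative stand-in for a streaming prefetch model: per-block successor
# counts are updated on every access, so predictions adapt without any
# batch retraining pass.
class StreamingPrefetcher:
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))
        self.prev = None

    def observe(self, block):
        """Fold one access into the transition statistics (O(1) per event)."""
        if self.prev is not None:
            self.counts[self.prev][block] += 1
        self.prev = block

    def prefetch(self, block):
        """Suggest the most frequently seen successor of `block`, if any."""
        succ = self.counts.get(block)
        if not succ:
            return None
        return max(succ, key=succ.get)

p = StreamingPrefetcher()
for b in [1, 2, 3, 1, 2, 3, 1, 2, 4]:
    p.observe(b)
print(p.prefetch(1))  # 2: block 2 has always followed block 1 so far
```

A real multi-tiered system would hang richer features and eviction policy off the same incremental-update skeleton.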
arXiv Detail & Related papers (2024-12-29T17:39:37Z)
- Stable Hadamard Memory: Revitalizing Memory-Augmented Agents for Reinforcement Learning [64.93848182403116]
Current deep-learning memory models struggle in reinforcement learning environments that are partially observable and long-term.
We introduce the Stable Hadamard Memory, a novel memory model for reinforcement learning agents.
Our approach significantly outperforms state-of-the-art memory-based methods on challenging partially observable benchmarks.
arXiv Detail & Related papers (2024-10-14T03:50:17Z)
- Digital Twin-Assisted Data-Driven Optimization for Reliable Edge Caching in Wireless Networks [60.54852710216738]
We introduce a novel digital twin-assisted optimization framework, called D-REC, to ensure reliable caching in nextG wireless networks.
By incorporating reliability modules into a constrained decision process, D-REC can adaptively adjust actions, rewards, and states to comply with advantageous constraints.
arXiv Detail & Related papers (2024-06-29T02:40:28Z)
- Training-Free Exponential Context Extension via Cascading KV Cache [49.608367376911694]
We introduce a novel mechanism that leverages cascading sub-cache buffers to selectively retain the most relevant tokens.
Our method reduces prefill stage latency by a factor of 6.8 when compared to flash attention on 1M tokens.
arXiv Detail & Related papers (2024-06-24T03:59:17Z)
- Memory-Efficient Optimization with Factorized Hamiltonian Descent [11.01832755213396]
We introduce H-Fac, a novel adaptive optimizer that incorporates a memory-efficient factorization approach to address this challenge.
By employing a rank-1 parameterization for both momentum and scaling parameter estimators, H-Fac reduces memory costs to a sublinear level.
We develop our algorithms based on principles derived from Hamiltonian dynamics, providing robust theoretical underpinnings in optimization dynamics and convergence guarantees.
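The memory argument behind the rank-1 parameterization can be illustrated in a few lines: for an m x n statistic matrix, storing row and column aggregates costs m + n numbers instead of m * n. The Adafactor-style reconstruction below illustrates that general idea only; it is not H-Fac's actual estimator.

```python
# Illustration of why rank-1 factorization is sublinear in memory:
# an m x n nonnegative matrix V is approximated from its row sums r,
# column sums c, and grand total as V ~= r c^T / total, so only
# m + n + 1 numbers need to be stored instead of m * n.
def rank1_factors(v):
    """Return row sums, column sums, and the grand total of matrix v."""
    r = [sum(row) for row in v]          # m numbers
    c = [sum(col) for col in zip(*v)]    # n numbers
    return r, c, sum(r)

def rank1_reconstruct(r, c, total):
    """Rebuild the rank-1 approximation r c^T / total."""
    return [[ri * cj / total for cj in c] for ri in r]

v = [[3.0, 4.0], [6.0, 8.0]]  # exactly rank-1: outer([1, 2], [3, 4])
r, c, total = rank1_factors(v)
print(rank1_reconstruct(r, c, total))  # [[3.0, 4.0], [6.0, 8.0]]
```

For a genuinely rank-1 input the reconstruction is exact; for general second-moment statistics it is the approximation that trades accuracy for sublinear memory.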
arXiv Detail & Related papers (2024-06-14T12:05:17Z)
- Bullion: A Column Store for Machine Learning [4.096087402737292]
This paper presents Bullion, a columnar storage system tailored for machine learning workloads.
Bullion addresses the complexities of data compliance, optimizes the encoding of long-sequence sparse features, efficiently manages wide-table projections, introduces feature quantization in storage, and provides a comprehensive cascading encoding framework.
Preliminary experimental results and theoretical analysis demonstrate Bullion's improved ability to deliver strong performance in the face of the unique demands of machine learning workloads.
arXiv Detail & Related papers (2024-04-13T05:01:54Z)
- TranDRL: A Transformer-Driven Deep Reinforcement Learning Enabled Prescriptive Maintenance Framework [58.474610046294856]
Industrial systems demand reliable predictive maintenance strategies to enhance operational efficiency and reduce downtime.
This paper introduces an integrated framework that leverages the capabilities of the Transformer model-based neural networks and deep reinforcement learning (DRL) algorithms to optimize system maintenance actions.
arXiv Detail & Related papers (2023-09-29T02:27:54Z)
- R^3: On-device Real-Time Deep Reinforcement Learning for Autonomous Robotics [9.2327813168753]
This paper presents R3, a holistic solution for managing timing, memory, and algorithm performance in on-device real-time DRL training.
R3 employs (i) a deadline-driven feedback loop with dynamic batch sizing for optimizing timing, (ii) efficient memory management to reduce memory footprint and allow larger replay buffer sizes, and (iii) a runtime coordinator guided by runtime analysis and a runtime profiler for adjusting memory resource reservations.
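Component (i), the deadline-driven feedback loop with dynamic batch sizing, can be sketched as a simple controller. The constants, cost model, and back-off rule below are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch of a deadline-driven batch-sizing loop in the spirit of
# R3's timing component. The thresholds and step sizes are made up for
# illustration.
def adjust_batch(batch, last_ms, deadline_ms, min_b=8, max_b=256):
    """Shrink the batch after a deadline miss; grow it when there is slack."""
    if last_ms > deadline_ms:
        batch = max(min_b, batch // 2)     # missed the deadline: back off fast
    elif last_ms < 0.7 * deadline_ms:
        batch = min(max_b, batch + min_b)  # comfortable slack: grow gently
    return batch

# Simulated steps where cost scales linearly with batch size (toy model).
batch, deadline = 128, 50.0
for _ in range(5):
    step_ms = 0.5 * batch  # hypothetical per-item training cost
    batch = adjust_batch(batch, step_ms, deadline)
print(batch)  # 72: settles inside the deadline-feasible range
```

The controller halves on misses and grows additively otherwise, so the batch size converges to a band where each step fits the deadline with some margin.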
arXiv Detail & Related papers (2023-08-29T05:48:28Z)
- Neural Network Compression for Noisy Storage Devices [71.4102472611862]
Conventionally, model compression and physical storage are decoupled.
This approach forces the storage to treat each bit of the compressed model equally, and to dedicate the same amount of resources to each bit.
We propose a radically different approach that: (i) employs analog memories to maximize the capacity of each memory cell, and (ii) jointly optimizes model compression and physical storage to maximize memory utility.
arXiv Detail & Related papers (2021-02-15T18:19:07Z)
- SmartDeal: Re-Modeling Deep Network Weights for Efficient Inference and Training [82.35376405568975]
Deep neural networks (DNNs) come with heavy parameterization, leading to reliance on external dynamic random-access memory (DRAM) for storage.
We present SmartDeal (SD), an algorithm framework to trade higher-cost memory storage/access for lower-cost computation.
We show that SD leads to 10.56x and 4.48x reduction in the storage and training energy, with negligible accuracy loss compared to state-of-the-art training baselines.
arXiv Detail & Related papers (2021-01-04T18:54:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.