Reinforcement Learning Based Approaches to Adaptive Context Caching in
Distributed Context Management Systems
- URL: http://arxiv.org/abs/2212.11709v1
- Date: Thu, 22 Dec 2022 13:52:53 GMT
- Title: Reinforcement Learning Based Approaches to Adaptive Context Caching in
Distributed Context Management Systems
- Authors: Shakthi Weerasinghe, Arkady Zaslavsky, Seng W. Loke, Amin Abken,
Alireza Hassani
- Abstract summary: Performance metrics-driven context caching has a profound impact on throughput and response time in distributed context management systems.
This paper proposes a reinforcement learning based approach to adaptively cache context.
Our novel algorithms enable context queries and sub-queries to reuse and repurpose cached context in an efficient manner.
- Score: 0.7559720049837457
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Performance metrics-driven context caching has a profound impact on
throughput and response time in distributed context management systems for
real-time context queries. This paper proposes a reinforcement learning based
approach to adaptively cache context with the objective of minimizing the cost
incurred by context management systems in responding to context queries. Our
novel algorithms enable context queries and sub-queries to reuse and repurpose
cached context in an efficient manner. This approach is distinguished from
traditional data caching approaches by three main features. First, we make
selective context cache admissions with no prior knowledge of the context or
the context query load. Secondly, we develop and incorporate innovative
heuristic models to calculate expected performance of caching an item when
making the decisions. Thirdly, our strategy defines a time-aware continuous
cache action space. We present two reinforcement learning agents, a value
function estimating actor-critic agent and a policy search agent using deep
deterministic policy gradient method. The paper also proposes adaptive policies
such as eviction and cache memory scaling to complement our objective. Our
method is evaluated using a synthetically generated load of context sub-queries
and a synthetic data set inspired from real world data and query samples. We
further investigate optimal adaptive caching configurations under different
settings. This paper presents, compares, and discusses our findings that the
proposed selective caching methods achieve short- and long-term cost and
performance efficiency. The paper demonstrates that the proposed methods
outperform other modes of context management, such as redirector mode and
database mode, and the cache-all policy by up to 60% in cost efficiency.
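The abstract's core idea can be illustrated with a minimal sketch. This is not the paper's algorithm: the class name, the heuristic expected-performance model, and the mapping from estimated savings to caching duration are all illustrative assumptions. It shows the three distinctive features in miniature: admission with no prior knowledge (a value estimate learned online), a heuristic model of expected caching benefit, and a time-aware continuous action (how long to cache).

```python
import random

class SelectiveCacheAdmission:
    """Toy sketch of cost-driven selective context cache admission.

    Per context item, the agent decides whether to cache it and for how
    long (a continuous, time-aware action), using a running estimate of
    the net cost saved by serving queries from the cache.
    """

    def __init__(self, epsilon=0.1, lr=0.5):
        self.epsilon = epsilon  # exploration rate
        self.lr = lr            # learning rate for value updates
        self.value = {}         # item -> estimated net cost saving

    def decide(self, item, expected_hit_rate, retrieval_cost, caching_cost):
        """Return (admit, cache_seconds).

        Heuristic expected-performance model (an assumption, not the
        paper's): savings = expected hits * retrieval cost - caching cost.
        """
        est = self.value.get(
            item, expected_hit_rate * retrieval_cost - caching_cost)
        if random.random() < self.epsilon:   # explore
            admit = random.random() < 0.5
        else:                                # exploit
            admit = est > 0
        # Continuous action: cache longer when expected savings are higher.
        cache_seconds = max(0.0, est) * 10.0
        return admit, cache_seconds

    def update(self, item, observed_saving):
        """Move the value estimate toward the observed cost saving."""
        old = self.value.get(item, 0.0)
        self.value[item] = old + self.lr * (observed_saving - old)
```

In the paper this decision is made by an actor-critic or DDPG agent over a learned value function; the tabular estimate above only stands in for that machinery to make the admission-plus-duration action concrete.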
Related papers
- Learning to Route: A Rule-Driven Agent Framework for Hybrid-Source Retrieval-Augmented Generation [55.47971671635531]
Large Language Models (LLMs) have shown remarkable performance on general Question Answering (QA). Retrieval-Augmented Generation (RAG) addresses this limitation by enriching LLMs with external knowledge. Existing systems primarily rely on unstructured documents, while largely overlooking relational databases.
arXiv Detail & Related papers (2025-09-30T22:19:44Z) - Semantic Caching for Low-Cost LLM Serving: From Offline Learning to Online Adaptation [54.61034867177997]
Caching inference responses allows them to be retrieved without another forward pass through the Large Language Model. Traditional exact-match caching overlooks the semantic similarity between queries, leading to unnecessary recomputation. We present a principled, learning-based framework for semantic cache eviction under unknown query and cost distributions.
arXiv Detail & Related papers (2025-08-11T06:53:27Z) - An Ensemble Embedding Approach for Improving Semantic Caching Performance in LLM-based Systems [4.364576564103288]
This paper presents an ensemble embedding approach that combines multiple embedding models through a trained meta-encoder to improve semantic similarity detection. We evaluate our method using the Quora Question Pairs dataset, measuring cache hit ratios, cache miss ratios, token savings, and response times.
arXiv Detail & Related papers (2025-07-08T09:20:12Z) - ContextCache: Context-Aware Semantic Cache for Multi-Turn Queries in Large Language Models [33.729482204460815]
This demonstration introduces ContextCache, a context-aware semantic caching system for multi-turn dialogues. ContextCache employs a two-stage retrieval architecture that first executes vector-based retrieval on the current query to identify potential matches and then integrates current and historical dialogue representations through self-attention mechanisms for precise contextual matching. Cached responses exhibit approximately 10 times lower latency than direct LLM invocation, enabling significant computational cost reductions for conversational applications.
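The two-stage architecture described above can be sketched as follows. This is a simplified stand-in, not ContextCache's implementation: the function names, the cosine-similarity scoring, and the fixed 50/50 mix of query and history similarity in stage two (replacing the paper's self-attention matcher) are all assumptions for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def two_stage_lookup(query_vec, history_vec, cache,
                     stage1_k=3, threshold=0.9):
    """Stage 1: vector retrieval on the current query alone.
    Stage 2: re-score candidates with the dialogue-history vector
    (a crude stand-in for self-attention-based contextual matching).
    Returns the cached response on a confident match, else None."""
    # Stage 1: top-k candidate entries by query similarity.
    scored = sorted(cache,
                    key=lambda e: cosine(query_vec, e["query_vec"]),
                    reverse=True)[:stage1_k]
    # Stage 2: contextual score mixes query and history similarity.
    best, best_score = None, -1.0
    for entry in scored:
        score = (0.5 * cosine(query_vec, entry["query_vec"])
                 + 0.5 * cosine(history_vec, entry["history_vec"]))
        if score > best_score:
            best, best_score = entry, score
    return best["response"] if best and best_score >= threshold else None
```

The point of the second stage is that two identical queries can demand different answers in different conversations; re-scoring against dialogue history prevents such false cache hits.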
arXiv Detail & Related papers (2025-06-28T07:25:12Z) - Scalable In-Context Q-Learning [68.9917436397079]
We propose Scalable In-Context Q-Learning (SICQL) to steer in-context reinforcement learning. SICQL harnesses dynamic programming and world modeling to steer ICRL toward efficient reward and task generalization.
arXiv Detail & Related papers (2025-06-02T04:21:56Z) - Semantic Caching of Contextual Summaries for Efficient Question-Answering with Language Models [11.012474205717178]
Large Language Models (LLMs) are increasingly deployed across edge and cloud platforms for real-time question-answering and retrieval-augmented generation. This paper introduces a novel semantic caching approach for storing and reusing contextual summaries. Our method reduces redundant computations by up to 50-60% while maintaining answer accuracy comparable to full document processing.
arXiv Detail & Related papers (2025-05-16T14:04:31Z) - Learning Task Representations from In-Context Learning [73.72066284711462]
Large language models (LLMs) have demonstrated remarkable proficiency in in-context learning.
We introduce an automated formulation for encoding task information in ICL prompts as a function of attention heads.
We show that our method's effectiveness stems from aligning the distribution of the last hidden state with that of an optimally performing in-context-learned model.
arXiv Detail & Related papers (2025-02-08T00:16:44Z) - Context-DPO: Aligning Language Models for Context-Faithfulness [80.62221491884353]
We propose the first alignment method specifically designed to enhance large language models' context-faithfulness.
By leveraging faithful and stubborn responses to questions with provided context from ConFiQA, our Context-DPO aligns LLMs through direct preference optimization.
Extensive experiments demonstrate that our Context-DPO significantly improves context-faithfulness, achieving 35% to 280% improvements on popular open-source models.
arXiv Detail & Related papers (2024-12-18T04:08:18Z) - Edge Caching Optimization with PPO and Transfer Learning for Dynamic Environments [3.720975664058743]
In dynamic environments, changes in content popularity and variations in request rates frequently occur, making previously learned policies less effective as they were optimized for earlier conditions.
We develop a mechanism that detects changes in content popularity and request rates, ensuring timely adjustments to the caching strategy.
We also propose a transfer learning-based PPO algorithm that accelerates convergence in new environments by leveraging prior knowledge.
arXiv Detail & Related papers (2024-11-14T21:01:29Z) - Improving Retrieval in Sponsored Search by Leveraging Query Context Signals [6.152499434499752]
We propose an approach to enhance query understanding by augmenting queries with rich contextual signals.
We use web search titles and snippets to ground queries in real-world information and utilize GPT-4 to generate query rewrites and explanations.
Our context-aware approach substantially outperforms context-free models.
arXiv Detail & Related papers (2024-07-19T14:28:53Z) - Text-Video Retrieval with Global-Local Semantic Consistent Learning [122.15339128463715]
We propose a simple yet effective method, Global-Local Semantic Consistent Learning (GLSCL)
GLSCL capitalizes on latent shared semantics across modalities for text-video retrieval.
Our method achieves comparable performance with SOTA as well as being nearly 220 times faster in terms of computational cost.
arXiv Detail & Related papers (2024-05-21T11:59:36Z) - Switchable Decision: Dynamic Neural Generation Networks [98.61113699324429]
We propose a switchable decision to accelerate inference by dynamically assigning resources for each data instance.
Our method benefits from less cost during inference while keeping the same accuracy.
arXiv Detail & Related papers (2024-05-07T17:44:54Z) - Attention-Enhanced Prioritized Proximal Policy Optimization for Adaptive Edge Caching [4.2579244769567675]
We introduce a Proximal Policy Optimization (PPO)-based caching strategy that fully considers file attributes like lifetime, size, and priority.
Our method outperforms a recent Deep Reinforcement Learning-based technique.
arXiv Detail & Related papers (2024-02-08T17:17:46Z) - From Contextual Data to Newsvendor Decisions: On the Actual Performance
of Data-Driven Algorithms [2.9603743540540357]
We study how the relevance and quantity of past data affects the performance of a data-driven policy.
We consider a setting in which past demands observed under "close by" contexts come from close by distributions.
arXiv Detail & Related papers (2023-02-16T17:03:39Z) - Multi-Task Off-Policy Learning from Bandit Feedback [54.96011624223482]
We propose a hierarchical off-policy optimization algorithm (HierOPO), which estimates the parameters of the hierarchical model and then acts pessimistically with respect to them.
We prove per-task bounds on the suboptimality of the learned policies, which show a clear improvement over not using the hierarchical model.
Our theoretical and empirical results show a clear advantage of using the hierarchy over solving each task independently.
arXiv Detail & Related papers (2022-12-09T08:26:27Z) - From Traditional Adaptive Data Caching to Adaptive Context Caching: A
Survey [0.7046417074932255]
One of the challenges is improving performance when responding to large number of context queries.
Although caching is a proven way to improve transiency of context and features such as variability, heterogeneity of context queries pose an additional real-time cost management problem.
This paper presents a critical survey of state-of-the-art in adaptive data caching with the objective of developing a body of knowledge in cost- and performance-efficient adaptive caching strategies.
arXiv Detail & Related papers (2022-11-21T08:47:51Z) - Accelerating Deep Learning Classification with Error-controlled
Approximate-key Caching [72.50506500576746]
We propose a novel caching paradigm that we name approximate-key caching.
While approximate cache hits alleviate DL inference workload and increase the system throughput, they however introduce an approximation error.
We analytically model our caching system performance for classic LRU and ideal caches, we perform a trace-driven evaluation of the expected performance, and we compare the benefits of our proposed approach with the state-of-the-art similarity caching.
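The approximate-key idea above can be captured in a few lines. This sketch is an assumption-laden simplification, not the paper's system: the function name, the Euclidean distance metric, and the flat-dictionary store are illustrative choices. The single `max_dist` threshold is what controls the approximation error the summary mentions: tightening it trades cache hit rate for fewer wrong matches.

```python
def approx_get(cache, key_vec, max_dist):
    """Approximate-key lookup: return the cached value for the nearest
    stored key vector if it lies within max_dist, else None (a miss).

    cache: dict mapping key tuples (feature vectors) to cached results.
    """
    best_key, best_dist = None, float("inf")
    for stored_vec in cache:  # linear scan; a real system would index
        dist = sum((a - b) ** 2 for a, b in zip(key_vec, stored_vec)) ** 0.5
        if dist < best_dist:
            best_key, best_dist = stored_vec, dist
    if best_key is not None and best_dist <= max_dist:
        return cache[best_key]  # approximate hit
    return None                 # controlled miss: fall back to inference
```

A miss falls through to running the classifier and inserting the new key, so the threshold directly bounds how different an input may be from its matched key.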
arXiv Detail & Related papers (2021-12-13T13:49:11Z) - Reinforcement Learning for Caching with Space-Time Popularity Dynamics [61.55827760294755]
Caching is envisioned to play a critical role in next-generation networks.
To intelligently prefetch and store contents, a cache node should be able to learn what and when to cache.
This chapter presents a versatile reinforcement learning based approach for near-optimal caching policy design.
arXiv Detail & Related papers (2020-05-19T01:23:51Z) - Hierarchical Adaptive Contextual Bandits for Resource Constraint based
Recommendation [49.69139684065241]
Contextual multi-armed bandit (MAB) achieves cutting-edge performance on a variety of problems.
In this paper, we propose a hierarchical adaptive contextual bandit method (HATCH) to conduct the policy learning of contextual bandits with a budget constraint.
arXiv Detail & Related papers (2020-04-02T17:04:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.