Optimizing L1 cache for embedded systems through grammatical evolution
- URL: http://arxiv.org/abs/2303.03338v1
- Date: Mon, 6 Mar 2023 18:10:00 GMT
- Title: Optimizing L1 cache for embedded systems through grammatical evolution
- Authors: Josefa Díaz Álvarez, J. Manuel Colmenar, José L. Risco-Martín, Juan Lanchares and Oscar Garnica
- Abstract summary: Grammatical Evolution (GE) is able to efficiently find the best cache configurations for a given set of benchmark applications.
Our proposal is able to find cache configurations that obtain an average improvement of 62% versus a real-world baseline configuration.
- Score: 1.9371782627708491
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Nowadays, embedded systems are provided with cache memories that are large enough to influence both performance and energy consumption as never before in this kind of system. In addition, the cache memory system has been identified as a component that can improve both metrics by adapting its configuration to the memory access patterns of the applications being run. However, because cache memories have many parameters, each of which may take many different values, designers face a wide and time-consuming exploration space. In this paper we propose an optimization framework based on Grammatical Evolution (GE) which is able to efficiently find the best cache configurations for a given set of benchmark applications. This metaheuristic allows an important reduction of the optimization runtime, obtaining good results in a low number of generations. This reduction is further increased by the efficient storage of already evaluated cache configurations. Moreover, we selected GE because the plasticity of the grammar eases the creation of phenotypes that form the call to the cache simulator required to evaluate the different configurations. Experimental results for the Mediabench suite show that our proposal is able to find cache configurations that achieve an average improvement of $62\%$ over a real-world baseline configuration.
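To make the workflow concrete, below is a minimal, self-contained sketch of how such a grammar-driven search might be structured. The parameter ranges, the grammar, the placeholder `cache_sim` command line, and the random fitness score are illustrative assumptions rather than the paper's actual grammar, simulator interface, or energy/performance model; the `_evaluated` dictionary stands in for the efficient storage of already-evaluated cache configurations mentioned in the abstract.

```python
import random

# Hypothetical L1 cache design space (values are assumptions, not the paper's).
GRAMMAR = {
    "cache_size":    [1024, 2048, 4096, 8192, 16384],   # bytes
    "block_size":    [16, 32, 64, 128],                  # bytes
    "associativity": [1, 2, 4, 8],                       # ways
    "replacement":   ["LRU", "FIFO", "RANDOM"],
}
FIELDS = list(GRAMMAR)


def decode(genome):
    """Map an integer genome to a phenotype: a cache configuration and the
    command-line call that would invoke a cache simulator (placeholder name)."""
    config = {f: GRAMMAR[f][g % len(GRAMMAR[f])]
              for f, g in zip(FIELDS, genome)}
    cmd = ("cache_sim --size {cache_size} --block {block_size} "
           "--assoc {associativity} --repl {replacement}").format(**config)
    return config, cmd


_evaluated = {}  # storage of already-evaluated cache configurations (memoization)


def fitness(genome):
    """Lower is better (e.g., a normalized energy/performance score)."""
    config, cmd = decode(genome)
    key = tuple(sorted(config.items()))
    if key not in _evaluated:
        # In the real framework this would run the simulator via `cmd` and
        # combine the reported metrics; here a random score stands in.
        _evaluated[key] = random.random()
    return _evaluated[key]


def evolve(pop_size=20, generations=10):
    """Toy evolutionary loop: selection, uniform crossover, and mutation."""
    pop = [[random.randrange(32) for _ in FIELDS] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: pop_size // 2]
        children = [[random.choice(pair) if random.random() < 0.9
                     else random.randrange(32)          # mutation
                     for pair in zip(a, b)]             # uniform crossover
                    for a, b in zip(parents, reversed(parents))]
        pop = parents + children
    return decode(min(pop, key=fitness))


if __name__ == "__main__":
    best_config, best_cmd = evolve()
    print("Best configuration:", best_config)
    print("Simulator call:", best_cmd)
```

In the actual framework, the fitness evaluation would execute the generated simulator call and combine the reported execution time and energy into the objective; the memoization table is what avoids re-simulating cache configurations that have already been evaluated.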
Related papers
- Dynamic Optimization of Storage Systems Using Reinforcement Learning Techniques [40.13303683102544]
This paper introduces RL-Storage, a reinforcement learning-based framework designed to dynamically optimize storage system configurations.
RL-Storage learns from real-time I/O patterns and predicts optimal storage parameters, such as cache size, queue depths, and readahead settings.
It achieves throughput gains of up to 2.6x and latency reductions of 43% compared to baselines.
arXiv Detail & Related papers (2024-12-29T17:41:40Z)
- XKV: Personalized KV Cache Memory Reduction for Long-Context LLM Inference [9.65524177141491]
Large Language Model (LLM) inference generates output tokens one-by-one, leading to many redundant computations.
The KV-Cache framework makes a compromise between time and space complexity.
Existing studies reduce memory consumption by evicting cached data that have less impact on inference accuracy.
We show that customizing the cache size for each layer in a personalized manner can yield a significant memory reduction.
arXiv Detail & Related papers (2024-12-08T11:32:08Z)
- PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation [65.36715026409873]
Key-value (KV) cache, necessitated by the lengthy input and output sequences, notably contributes to the high inference cost.
We present PrefixKV, which reframes the challenge of determining KV cache sizes for all layers into the task of searching for the optimal global prefix configuration.
Our method achieves state-of-the-art performance compared with other methods.
arXiv Detail & Related papers (2024-12-04T15:48:59Z)
- Efficient Inference of Vision Instruction-Following Models with Elastic Cache [76.44955111634545]
We introduce Elastic Cache, a novel strategy for efficient deployment of instruction-following large vision-language models.
We propose an importance-driven cache merging strategy to prune redundant caches.
For instruction encoding, we use frequency to evaluate the importance of caches.
Results on a range of LVLMs demonstrate that Elastic Cache not only boosts efficiency but also notably outperforms existing pruning methods in language generation.
arXiv Detail & Related papers (2024-07-25T15:29:05Z)
- CORM: Cache Optimization with Recent Message for Large Language Model Inference [57.109354287786154]
We introduce an innovative method for optimizing the KV cache that considerably reduces its memory footprint.
CORM, a KV cache eviction policy, dynamically retains essential key-value pairs for inference without the need for model fine-tuning.
Our validation shows that CORM reduces the inference memory usage of KV cache by up to 70% with negligible performance degradation across six tasks in LongBench.
arXiv Detail & Related papers (2024-04-24T16:11:54Z)
- RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation [11.321659218769598]
Retrieval-Augmented Generation (RAG) has shown significant improvements in various natural language processing tasks.
RAGCache organizes the intermediate states of retrieved knowledge in a knowledge tree and caches them in the GPU and host memory hierarchy.
RAGCache reduces the time to first token (TTFT) by up to 4x and improves throughput by up to 2.1x compared to vLLM integrated with Faiss.
arXiv Detail & Related papers (2024-04-18T18:32:30Z)
- Online Adaptation of Language Models with a Memory of Amortized Contexts [82.02369596879817]
Memory of Amortized Contexts (MAC) is an efficient and effective online adaptation framework for large language models.
We show how MAC can be combined with and improve the performance of popular alternatives such as retrieval-augmented generation.
arXiv Detail & Related papers (2024-03-07T08:34:57Z)
- Evolutionary Design of the Memory Subsystem [2.378428291297535]
We address the optimization of the whole memory subsystem with three approaches integrated as a single methodology.
To this aim, we apply different evolutionary algorithms in combination with memory simulators and profiling tools.
We also provide an experimental evaluation in which our proposal is assessed using well-known benchmark applications.
arXiv Detail & Related papers (2023-03-07T10:45:51Z)
- Multi-objective optimization of energy consumption and execution time in a single level cache memory for embedded systems [2.378428291297535]
Multi-objective optimization may help to minimize both conflicting metrics in an independent manner.
Our design method reaches average improvements of 64.43% and 91.69% in execution time and energy consumption, respectively.
arXiv Detail & Related papers (2023-02-22T09:35:03Z)
- Accelerating Deep Learning Classification with Error-controlled Approximate-key Caching [72.50506500576746]
We propose a novel caching paradigm that we name approximate-key caching.
While approximate cache hits alleviate the DL inference workload and increase system throughput, they introduce an approximation error.
We analytically model our caching system's performance for classic LRU and ideal caches, perform a trace-driven evaluation of the expected performance, and compare the benefits of our proposed approach with state-of-the-art similarity caching.
arXiv Detail & Related papers (2021-12-13T13:49:11Z)
- Memformer: A Memory-Augmented Transformer for Sequence Modeling [55.780849185884996]
We present Memformer, an efficient neural network for sequence modeling.
Our model achieves linear time complexity and constant memory space complexity when processing long sequences.
arXiv Detail & Related papers (2020-10-14T09:03:36Z)