Optimizing L1 cache for embedded systems through grammatical evolution
- URL: http://arxiv.org/abs/2303.03338v1
- Date: Mon, 6 Mar 2023 18:10:00 GMT
- Title: Optimizing L1 cache for embedded systems through grammatical evolution
- Authors: Josefa Díaz Álvarez, J. Manuel Colmenar, José L. Risco-Martín, Juan Lanchares and Oscar Garnica
- Abstract summary: Grammatical Evolution (GE) is able to efficiently find the best cache configurations for a given set of benchmark applications.
Our proposal is able to find cache configurations that obtain an average improvement of $62\%$ versus a real-world baseline configuration.
- Score: 1.9371782627708491
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Nowadays, embedded systems are provided with cache memories that are
large enough to influence both performance and energy consumption as never
before in this kind of system. In addition, the cache memory system has been
identified as a component that can improve those metrics by adapting its
configuration to the memory access patterns of the applications being run.
However, given that cache memories have many parameters, each of which may be
set to a high number of different values, designers face a wide and
time-consuming exploration space. In this paper we propose an optimization
framework based on Grammatical Evolution (GE) which is able to efficiently find
the best cache configurations for a given set of benchmark applications. This
metaheuristic allows an important reduction of the optimization runtime,
obtaining good results in a low number of generations. This reduction is
further increased by the efficient storage of already evaluated cache
configurations. Moreover, we selected GE because the plasticity of the grammar
eases the creation of phenotypes that form the call to the cache simulator
required for the evaluation of the different configurations. Experimental
results for the Mediabench suite show that our proposal is able to find cache
configurations that obtain an average improvement of $62\%$ versus a
real-world baseline configuration.
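The abstract highlights two implementation details: the grammar maps integer genotypes to phenotypes that are literally the command line passed to a cache simulator, and already evaluated configurations are stored so they are never simulated twice. The minimal Python sketch below shows one way this could look; the grammar symbols, parameter values and the `cache-sim` command are illustrative assumptions, not the authors' actual grammar or tooling.

```python
# Sketch of GE-style phenotype mapping plus memoization of evaluated caches.
# Grammar, parameter ranges and the "cache-sim" command are assumptions.
import random
import subprocess

# Toy BNF-style grammar for an L1 cache configuration.
GRAMMAR = {
    "<config>": [["<size>", " ", "<block>", " ", "<assoc>", " ", "<policy>"]],
    "<size>":   [["--size=2KB"], ["--size=4KB"], ["--size=8KB"], ["--size=16KB"]],
    "<block>":  [["--block=16"], ["--block=32"], ["--block=64"]],
    "<assoc>":  [["--assoc=1"], ["--assoc=2"], ["--assoc=4"], ["--assoc=8"]],
    "<policy>": [["--repl=LRU"], ["--repl=FIFO"], ["--repl=RANDOM"]],
}

def map_genotype(codons, start="<config>"):
    """Standard GE genotype-to-phenotype mapping: each codon selects a
    production for the leftmost non-terminal, modulo the number of choices."""
    phenotype, i, symbols = "", 0, [start]
    while symbols:
        sym = symbols.pop(0)
        if sym in GRAMMAR:
            rules = GRAMMAR[sym]
            choice = rules[codons[i % len(codons)] % len(rules)]
            i += 1
            symbols = list(choice) + symbols
        else:
            phenotype += sym
    return phenotype

# Store of already evaluated configurations (phenotype -> fitness), so that
# repeated individuals do not trigger another simulator run.
evaluated = {}

def fitness(phenotype, trace_file):
    if phenotype in evaluated:
        return evaluated[phenotype]
    # Hypothetical simulator invocation: the phenotype *is* the argument list.
    cmd = ["cache-sim", *phenotype.split(), f"--trace={trace_file}"]
    out = subprocess.run(cmd, capture_output=True, text=True).stdout
    evaluated[phenotype] = float(out.strip())  # assume the tool prints a score
    return evaluated[phenotype]

# Example: map a random genotype to a simulator command line.
genotype = [random.randint(0, 255) for _ in range(8)]
print(map_genotype(genotype))  # e.g. --size=8KB --block=32 --assoc=2 --repl=LRU
```

Because each phenotype is just a command-line string, memoizing on that string is enough to skip repeated simulator runs, which is a plausible reading of the extra runtime reduction the abstract attributes to "efficient storage of evaluated caches".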
Related papers
- Efficient Inference of Vision Instruction-Following Models with Elastic Cache [76.44955111634545]
We introduce Elastic Cache, a novel strategy for efficient deployment of instruction-following large vision-language models.
We propose an importance-driven cache merging strategy to prune redundant caches.
For instruction encoding, we use frequency to evaluate the importance of caches.
Results on a range of LVLMs demonstrate that Elastic Cache not only boosts efficiency but also notably outperforms existing pruning methods in language generation.
arXiv Detail & Related papers (2024-07-25T15:29:05Z) - CORM: Cache Optimization with Recent Message for Large Language Model Inference [57.109354287786154]
We introduce an innovative method for optimizing the KV cache, which considerably minimizes its memory footprint.
CORM, a KV cache eviction policy, dynamically retains essential key-value pairs for inference without the need for model fine-tuning.
Our validation shows that CORM reduces the inference memory usage of KV cache by up to 70% with negligible performance degradation across six tasks in LongBench.
arXiv Detail & Related papers (2024-04-24T16:11:54Z) - RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation [11.321659218769598]
Retrieval-Augmented Generation (RAG) has shown significant improvements in various natural language processing tasks.
RAGCache organizes the intermediate states of retrieved knowledge in a knowledge tree and caches them in the GPU and host memory hierarchy.
RAGCache reduces the time to first token (TTFT) by up to 4x and improves the throughput by up to 2.1x compared to vLLM integrated with Faiss.
arXiv Detail & Related papers (2024-04-18T18:32:30Z) - Online Adaptation of Language Models with a Memory of Amortized Contexts [82.02369596879817]
Memory of Amortized Contexts (MAC) is an efficient and effective online adaptation framework for large language models.
We show how MAC can be combined with and improve the performance of popular alternatives such as retrieval augmented generations.
arXiv Detail & Related papers (2024-03-07T08:34:57Z) - Cached Transformers: Improving Transformers with Differentiable Memory
Cache [71.28188777209034]
This work introduces a new Transformer model called Cached Transformer.
It uses Gated Recurrent Cached (GRC) attention to extend the self-attention mechanism with a differentiable memory cache of tokens.
arXiv Detail & Related papers (2023-12-20T03:30:51Z) - Using Evolutionary Algorithms to Find Cache-Friendly Generalized Morton
Layouts for Arrays [0.3749861135832073]
We show how the Morton layout can be generalized to a very large family of multi-dimensional data layouts.
We propose a chromosomal representation for such layouts as well as a methodology for estimating the fitness of array layouts.
We show that our fitness function correlates with kernel running time on real hardware, and that our evolutionary strategy allows us to find candidates with favorable simulated cache properties (a generic sketch of Morton-order indexing is given after this list).
arXiv Detail & Related papers (2023-09-13T14:54:54Z) - Evolutionary Design of the Memory Subsystem [2.378428291297535]
We address the optimization of the whole memory subsystem with three approaches integrated as a single methodology.
To this aim, we apply different evolutionary algorithms in combination with memory simulators and profiling tools.
We also provide an experimental evaluation in which our proposal is assessed using well-known benchmark applications.
arXiv Detail & Related papers (2023-03-07T10:45:51Z) - Multi-objective optimization of energy consumption and execution time in
a single level cache memory for embedded systems [2.378428291297535]
Multi-objective optimization may help to minimize both conflicting metrics in an independent manner.
Our design method reaches average improvements of 64.43% and 91.69% in execution time and energy consumption, respectively.
arXiv Detail & Related papers (2023-02-22T09:35:03Z) - Accelerating Deep Learning Classification with Error-controlled
Approximate-key Caching [72.50506500576746]
We propose a novel caching paradigm that we name approximate-key caching.
While approximate cache hits alleviate the DL inference workload and increase system throughput, they introduce an approximation error.
We analytically model our caching system performance for classic LRU and ideal caches, we perform a trace-driven evaluation of the expected performance, and we compare the benefits of our proposed approach with the state-of-the-art similarity caching.
arXiv Detail & Related papers (2021-12-13T13:49:11Z) - Memformer: A Memory-Augmented Transformer for Sequence Modeling [55.780849185884996]
We present Memformer, an efficient neural network for sequence modeling.
Our model achieves linear time complexity and constant memory space complexity when processing long sequences.
arXiv Detail & Related papers (2020-10-14T09:03:36Z)
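For context on the "Cache-Friendly Generalized Morton Layouts" entry above, the sketch referenced there compares row-major indexing with the classic Morton (Z-order) index obtained by interleaving coordinate bits. It is a textbook illustration of the layout family that paper generalizes, not its chromosomal representation or fitness function.

```python
# Generic illustration of row-major vs. Morton (Z-order) indexing for a 2-D
# array. Nearby (x, y) pairs tend to land in nearby memory locations under
# the Morton layout, which is what makes it cache-friendly.

def morton_index(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y: x fills even bit positions, y odd."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # even bit positions <- x
        z |= ((y >> i) & 1) << (2 * i + 1)   # odd bit positions  <- y
    return z

def row_major_index(x: int, y: int, width: int) -> int:
    return y * width + x

# A 2x2 tile stays contiguous under Morton order but spans a full row stride
# under row-major order.
for (x, y) in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print((x, y),
          "row-major:", row_major_index(x, y, 1024),
          "morton:", morton_index(x, y))
```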
This list is automatically generated from the titles and abstracts of the papers on this site.