Evolutionary Design of the Memory Subsystem
- URL: http://arxiv.org/abs/2303.16074v1
- Date: Tue, 7 Mar 2023 10:45:51 GMT
- Title: Evolutionary Design of the Memory Subsystem
- Authors: Josefa Díaz Álvarez, José L. Risco-Martín and J. Manuel
Colmenar
- Abstract summary: We address the optimization of the whole memory subsystem with three approaches integrated as a single methodology.
To this end, we apply different evolutionary algorithms in combination with memory simulators and profiling tools.
We also provide an experimental evaluation in which our proposal is assessed using well-known benchmark applications.
- Score: 2.378428291297535
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The memory hierarchy has a high impact on the performance and power
consumption of the system. Moreover, current embedded systems, including
mobile devices, are specifically designed to run multimedia applications, which
are memory intensive. This increases the pressure on the memory subsystem and
affects performance and energy consumption. The resulting thermal problems,
performance degradation and high energy consumption can cause irreversible
damage to the devices. We address the optimization of the whole memory
subsystem with three approaches integrated as a single methodology. First, the
thermal impact of the register file is analyzed and optimized. Second, the
cache memory is addressed by optimizing the cache configuration according to
the running applications, improving both performance and power consumption.
Finally, we simplify the design and evaluation process of general-purpose and
customized dynamic memory managers in main memory. To this end, we apply
different evolutionary algorithms in combination with memory simulators and
profiling tools. This way, we are able to evaluate the quality of each
candidate solution and take advantage of the exploration of solutions given by
the optimization algorithm. We also provide an experimental evaluation in which
our proposal is assessed using well-known benchmark applications.
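The core loop the abstract describes couples an evolutionary search with a memory simulator that scores each candidate configuration. A minimal sketch of that idea, where a toy cost model stands in for the actual memory simulators and profiling tools; the parameter spaces and the cost function below are illustrative assumptions, not the paper's:

```python
import random

# Hypothetical search space for a cache configuration. In the paper's
# methodology, each candidate would be scored by a memory simulator and
# profiling tools; here a toy cost model stands in for that evaluation.
CACHE_SIZES = [4, 8, 16, 32, 64]     # KiB
LINE_SIZES = [16, 32, 64, 128]       # bytes
ASSOCIATIVITIES = [1, 2, 4, 8]       # ways

def random_config():
    return (random.choice(CACHE_SIZES),
            random.choice(LINE_SIZES),
            random.choice(ASSOCIATIVITIES))

def evaluate(cfg):
    """Stand-in for a simulator run: a combined execution-time/energy
    cost (lower is better). Larger, more associative caches miss less
    but cost more energy per access."""
    size, line, ways = cfg
    miss_penalty = 100.0 / (size * ways)
    energy = 0.01 * size * ways + 0.002 * line
    return miss_penalty + energy

def mutate(cfg):
    """Re-draw one randomly chosen field of the configuration."""
    size, line, ways = cfg
    field = random.randrange(3)
    if field == 0:
        size = random.choice(CACHE_SIZES)
    elif field == 1:
        line = random.choice(LINE_SIZES)
    else:
        ways = random.choice(ASSOCIATIVITIES)
    return (size, line, ways)

def evolve(generations=50, pop_size=20, seed=0):
    """Simple elitist evolutionary search: keep the best half of the
    population, refill with mutated survivors."""
    random.seed(seed)
    pop = [random_config() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=evaluate)
        survivors = pop[:pop_size // 2]
        pop = survivors + [mutate(random.choice(survivors))
                           for _ in range(pop_size - len(survivors))]
    return min(pop, key=evaluate)

best = evolve()
print("best config (size KiB, line B, ways):", best, "cost:", evaluate(best))
```

Swapping `evaluate` for a call into a cache simulator, and the tuple for a richer genome (register file, cache hierarchy, dynamic memory manager parameters), yields the kind of integrated exploration the abstract outlines.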
Related papers
- A Survey on Inference Optimization Techniques for Mixture of Experts Models [50.40325411764262]
Large-scale Mixture of Experts (MoE) models offer enhanced model capacity and computational efficiency through conditional computation.
However, deploying and running inference on these models presents significant challenges in computational resources, latency, and energy efficiency.
This survey analyzes optimization techniques for MoE models across the entire system stack.
arXiv Detail & Related papers (2024-12-18T14:11:15Z) - APOLLO: SGD-like Memory, AdamW-level Performance [61.53444035835778]
Large language models (LLMs) are notoriously memory-intensive during training.
Various memory-efficient optimizers have been proposed to reduce memory usage.
They face critical challenges: (i) costly SVD operations; (ii) significant performance trade-offs compared to AdamW; and (iii) still substantial memory overhead to maintain competitive performance.
arXiv Detail & Related papers (2024-12-06T18:55:34Z) - Memory-Efficient Training for Deep Speaker Embedding Learning in Speaker Verification [50.596077598766975]
We explore a memory-efficient training strategy for deep speaker embedding learning in resource-constrained scenarios.
For activations, we design two types of reversible neural networks which eliminate the need to store intermediate activations.
For states, we introduce a dynamic quantization approach that replaces the original 32-bit floating-point values with a dynamic tree-based 8-bit data type.
arXiv Detail & Related papers (2024-12-02T06:57:46Z) - CHIME: Energy-Efficient STT-RAM-based Concurrent Hierarchical In-Memory Processing [1.5566524830295307]
This paper introduces a novel PiC/PiM architecture, Concurrent Hierarchical In-Memory Processing (CHIME)
CHIME strategically incorporates heterogeneous compute units across multiple levels of the memory hierarchy.
Experiments reveal that, compared to the state-of-the-art bit-line computing approaches, CHIME achieves significant speedup and energy savings of 57.95% and 78.23%, respectively.
arXiv Detail & Related papers (2024-07-29T01:17:54Z) - A parallel evolutionary algorithm to optimize dynamic memory managers in embedded systems [4.651702738999686]
We present a novel parallel evolutionary algorithm for DMMs optimization in embedded systems.
Our framework is able to reach a speed-up of 86.40x when compared with other state-of-the-art approaches.
arXiv Detail & Related papers (2024-06-28T15:47:25Z) - Energy-efficient Task Adaptation for NLP Edge Inference Leveraging
Heterogeneous Memory Architectures [68.91874045918112]
The proposed adapter-ALBERT is an efficient model optimization that enables maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z) - Optimizing L1 cache for embedded systems through grammatical evolution [1.9371782627708491]
Grammatical Evolution (GE) is able to efficiently find the best cache configurations for a given set of benchmark applications.
Our proposal is able to find cache configurations that obtain an average improvement of 62% versus a real-world baseline configuration.
arXiv Detail & Related papers (2023-03-06T18:10:00Z) - Multi-objective optimization of energy consumption and execution time in
a single level cache memory for embedded systems [2.378428291297535]
Multi-objective optimization may help to minimize both conflicting metrics in an independent manner.
Our design method reaches average improvements of 64.43% and 91.69% in execution time and energy consumption, respectively.
arXiv Detail & Related papers (2023-02-22T09:35:03Z) - Heterogeneous Data-Centric Architectures for Modern Data-Intensive
Applications: Case Studies in Machine Learning and Databases [9.927754948343326]
Processing-in-memory (PIM) is a promising execution paradigm that alleviates the data movement bottleneck in modern applications.
In this paper, we show how to take advantage of the PIM paradigm for two modern data-intensive applications.
arXiv Detail & Related papers (2022-05-29T13:43:17Z) - Recurrent Dynamic Embedding for Video Object Segmentation [54.52527157232795]
We propose a Recurrent Dynamic Embedding (RDE) to build a memory bank of constant size.
We propose an unbiased guidance loss during the training stage, which makes SAM more robust in long videos.
We also design a novel self-correction strategy so that the network can repair the embeddings of masks with different qualities in the memory bank.
arXiv Detail & Related papers (2022-05-08T02:24:43Z) - Memformer: A Memory-Augmented Transformer for Sequence Modeling [55.780849185884996]
We present Memformer, an efficient neural network for sequence modeling.
Our model achieves linear time complexity and constant memory space complexity when processing long sequences.
arXiv Detail & Related papers (2020-10-14T09:03:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.