A parallel evolutionary algorithm to optimize dynamic memory managers in embedded systems
- URL: http://arxiv.org/abs/2407.09555v1
- Date: Fri, 28 Jun 2024 15:47:25 GMT
- Title: A parallel evolutionary algorithm to optimize dynamic memory managers in embedded systems
- Authors: José L. Risco-Martín, David Atienza, J. Manuel Colmenar, Oscar Garnica,
- Abstract summary: We present a novel parallel evolutionary algorithm for DMMs optimization in embedded systems.
Our framework is able to reach a speed-up of 86.40x when compared with other state-of-the-art approaches.
- Score: 4.651702738999686
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: For the last thirty years, several Dynamic Memory Managers (DMMs) have been proposed. Such DMMs include first fit, best fit, segregated fit and buddy systems. Since the performance, memory usage and energy consumption of each DMM differs, software engineers often face difficult choices in selecting the most suitable approach for their applications. This issue has special impact in the field of portable consumer embedded systems, that must execute a limited amount of multimedia applications (e.g., 3D games, video players and signal processing software, etc.), demanding high performance and extensive memory usage at a low energy consumption. Recently, we have developed a novel methodology based on genetic programming to automatically design custom DMMs, optimizing performance, memory usage and energy consumption. However, although this process is automatic and faster than state-of-the-art optimizations, it demands intensive computation, resulting in a time consuming process. Thus, parallel processing can be very useful to enable to explore more solutions spending the same time, as well as to implement new algorithms. In this paper we present a novel parallel evolutionary algorithm for DMMs optimization in embedded systems, based on the Discrete Event Specification (DEVS) formalism over a Service Oriented Architecture (SOA) framework. Parallelism significantly improves the performance of the sequential exploration algorithm. On the one hand, when the number of generations are the same in both approaches, our parallel optimization framework is able to reach a speed-up of 86.40x when compared with other state-of-the-art approaches. On the other, it improves the global quality (i.e., level of performance, low memory usage and low energy consumption) of the final DMM obtained in a 36.36% with respect to two well-known general-purpose DMMs and two state-of-the-art optimization methodologies.
Related papers
- EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference [49.94169109038806]
This paper introduces EPS-MoE, a novel expert pipeline scheduler for MoE.
Our results demonstrate an average 21% improvement in prefill throughput over existing parallel inference methods.
arXiv Detail & Related papers (2024-10-16T05:17:49Z) - PIM-Opt: Demystifying Distributed Optimization Algorithms on a Real-World Processing-In-Memory System [21.09681871279162]
Modern Machine Learning (ML) training on large-scale datasets is a time-consuming workload.
It relies on the optimization algorithm Gradient Descent (SGD) due to its effectiveness, simplicity, and generalization performance.
processor-centric architectures suffer from low performance and high energy consumption while executing ML training workloads.
Processing-In-Memory (PIM) is a promising solution to alleviate the data movement bottleneck.
arXiv Detail & Related papers (2024-04-10T17:00:04Z) - HEAM : Hashed Embedding Acceleration using Processing-In-Memory [17.66751227197112]
In today's data centers, personalized recommendation systems face challenges such as the need for large memory capacity and high bandwidth.
Previous approaches have relied on DIMM-based near-memory processing techniques or introduced 3D-stacked DRAM to address memory-bound issues.
This paper introduces HEAM, a heterogeneous memory architecture that integrates 3D-stacked DRAM with DIMM to accelerate recommendation systems.
arXiv Detail & Related papers (2024-02-06T14:26:22Z) - Extreme Compression of Large Language Models via Additive Quantization [59.3122859349777]
Our algorithm, called AQLM, generalizes the classic Additive Quantization (AQ) approach for information retrieval.
We provide fast GPU and CPU implementations of AQLM for token generation, which enable us to match or outperform optimized FP16 implementations for speed.
arXiv Detail & Related papers (2024-01-11T18:54:44Z) - Energy-efficient Task Adaptation for NLP Edge Inference Leveraging
Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z) - Evolutionary Design of the Memory Subsystem [2.378428291297535]
We address the optimization of the whole memory subsystem with three approaches integrated as a single methodology.
To this aim, we apply different evolutionary algorithms in combination with memory simulators and profiling tools.
We also provide an experimental experience where our proposal is assessed using well-known benchmark applications.
arXiv Detail & Related papers (2023-03-07T10:45:51Z) - RMM: Reinforced Memory Management for Class-Incremental Learning [102.20140790771265]
Class-Incremental Learning (CIL) trains classifiers under a strict memory budget.
Existing methods use a static and ad hoc strategy for memory allocation, which is often sub-optimal.
We propose a dynamic memory management strategy that is optimized for the incremental phases and different object classes.
arXiv Detail & Related papers (2023-01-14T00:07:47Z) - Multi-Agent Reinforcement Learning for Microprocessor Design Space
Exploration [71.95914457415624]
Microprocessor architects are increasingly resorting to domain-specific customization in the quest for high-performance and energy-efficiency.
We propose an alternative formulation that leverages Multi-Agent RL (MARL) to tackle this problem.
Our evaluation shows that the MARL formulation consistently outperforms single-agent RL baselines.
arXiv Detail & Related papers (2022-11-29T17:10:24Z) - Integrate Lattice-Free MMI into End-to-End Speech Recognition [87.01137882072322]
In automatic speech recognition (ASR) research, discriminative criteria have achieved superior performance in DNN-HMM systems.
With this motivation, the adoption of discriminative criteria is promising to boost the performance of end-to-end (E2E) ASR systems.
Previous works have introduced the minimum Bayesian risk (MBR, one of the discriminative criteria) into E2E ASR systems.
In this work, novel algorithms are proposed in this work to integrate another widely used discriminative criterion, lattice-free maximum mutual information (LF-MMI) into E2E
arXiv Detail & Related papers (2022-03-29T14:32:46Z) - Continual Learning Approach for Improving the Data and Computation
Mapping in Near-Memory Processing System [3.202860612193139]
We propose an artificially intelligent memory mapping scheme, AIMM, that optimize data placement and resource utilization through page and computation remapping.
AIMM uses a neural network to achieve a near-optimal mapping during execution, trained using a reinforcement learning algorithm.
Our experimental evaluation shows that AIMM improves the baseline NMP performance in single and multiple program scenario by up to 70% and 50%, respectively.
arXiv Detail & Related papers (2021-04-28T09:50:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.