Related papers: A parallel evolutionary algorithm to optimize dynamic memory managers in embedded systems

A parallel evolutionary algorithm to optimize dynamic memory managers in embedded systems

URL: http://arxiv.org/abs/2407.09555v1
Date: Fri, 28 Jun 2024 15:47:25 GMT
Title: A parallel evolutionary algorithm to optimize dynamic memory managers in embedded systems
Authors: José L. Risco-Martín, David Atienza, J. Manuel Colmenar, Oscar Garnica,
Abstract summary: We present a novel parallel evolutionary algorithm for DMMs optimization in embedded systems. Our framework is able to reach a speed-up of 86.40x when compared with other state-of-the-art approaches.
Score: 4.651702738999686
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: For the last thirty years, several Dynamic Memory Managers (DMMs) have been proposed. Such DMMs include first fit, best fit, segregated fit and buddy systems. Since the performance, memory usage and energy consumption of each DMM differs, software engineers often face difficult choices in selecting the most suitable approach for their applications. This issue has special impact in the field of portable consumer embedded systems, that must execute a limited amount of multimedia applications (e.g., 3D games, video players and signal processing software, etc.), demanding high performance and extensive memory usage at a low energy consumption. Recently, we have developed a novel methodology based on genetic programming to automatically design custom DMMs, optimizing performance, memory usage and energy consumption. However, although this process is automatic and faster than state-of-the-art optimizations, it demands intensive computation, resulting in a time consuming process. Thus, parallel processing can be very useful to enable to explore more solutions spending the same time, as well as to implement new algorithms. In this paper we present a novel parallel evolutionary algorithm for DMMs optimization in embedded systems, based on the Discrete Event Specification (DEVS) formalism over a Service Oriented Architecture (SOA) framework. Parallelism significantly improves the performance of the sequential exploration algorithm. On the one hand, when the number of generations are the same in both approaches, our parallel optimization framework is able to reach a speed-up of 86.40x when compared with other state-of-the-art approaches. On the other, it improves the global quality (i.e., level of performance, low memory usage and low energy consumption) of the final DMM obtained in a 36.36% with respect to two well-known general-purpose DMMs and two state-of-the-art optimization methodologies.

Related papers

Reinforced Model Merging [53.84354455400038]
We present an innovative framework termed Reinforced Model Merging (RMM), which encompasses an environment and agent tailored for merging tasks. By utilizing data subsets during the evaluation process, we addressed the bottleneck in the reward feedback phase, thereby accelerating RMM by up to 100 times.
arXiv Detail & Related papers (2025-03-27T08:52:41Z)
Tackling the Dynamicity in a Production LLM Serving System with SOTA Optimizations via Hybrid Prefill/Decode/Verify Scheduling on Efficient Meta-kernels [12.77187564450236]
We introduce XY-Serve, a versatile, Ascend native, end-to-end production large language model (LLM) serving system. The core idea is an abstraction mechanism that smooths out the workload variability by decomposing computations into fine-grained meta primitives. For GEMM, we introduce a virtual padding scheme that adapts to dynamic shape changes while using highly efficient GEMM primitives with assorted fixed tile sizes.
arXiv Detail & Related papers (2024-12-24T02:27:44Z)
SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation [92.73405185996315]
Large Multimodal Models (LMMs) have demonstrated impressive capabilities in multimodal understanding and generation. Existing approaches, such as layout planning for multi-step generation and learning from human feedback or AI feedback, depend heavily on prompt engineering. We introduce a model-agnostic iterative self-feedback framework (SILMM) that can enable LMMs to provide helpful and scalable self-improvement and optimize text-image alignment.
arXiv Detail & Related papers (2024-12-08T05:28:08Z)
EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference [49.94169109038806]
This paper introduces EPS-MoE, a novel expert pipeline scheduler for MoE. Our results demonstrate an average 21% improvement in prefill throughput over existing parallel inference methods.
arXiv Detail & Related papers (2024-10-16T05:17:49Z)
PIM-Opt: Demystifying Distributed Optimization Algorithms on a Real-World Processing-In-Memory System [21.09681871279162]
Modern Machine Learning (ML) training on large-scale datasets is a time-consuming workload. It relies on the optimization algorithm Gradient Descent (SGD) due to its effectiveness, simplicity, and generalization performance. processor-centric architectures suffer from low performance and high energy consumption while executing ML training workloads. Processing-In-Memory (PIM) is a promising solution to alleviate the data movement bottleneck.
arXiv Detail & Related papers (2024-04-10T17:00:04Z)
Extreme Compression of Large Language Models via Additive Quantization [59.3122859349777]
Our algorithm, called AQLM, generalizes the classic Additive Quantization (AQ) approach for information retrieval. We provide fast GPU and CPU implementations of AQLM for token generation, which enable us to match or outperform optimized FP16 implementations for speed.
arXiv Detail & Related papers (2024-01-11T18:54:44Z)
Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks. We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
Evolutionary Design of the Memory Subsystem [2.378428291297535]
We address the optimization of the whole memory subsystem with three approaches integrated as a single methodology. To this aim, we apply different evolutionary algorithms in combination with memory simulators and profiling tools. We also provide an experimental experience where our proposal is assessed using well-known benchmark applications.
arXiv Detail & Related papers (2023-03-07T10:45:51Z)
RMM: Reinforced Memory Management for Class-Incremental Learning [102.20140790771265]
Class-Incremental Learning (CIL) trains classifiers under a strict memory budget. Existing methods use a static and ad hoc strategy for memory allocation, which is often sub-optimal. We propose a dynamic memory management strategy that is optimized for the incremental phases and different object classes.
arXiv Detail & Related papers (2023-01-14T00:07:47Z)
Multi-Agent Reinforcement Learning for Microprocessor Design Space Exploration [71.95914457415624]
Microprocessor architects are increasingly resorting to domain-specific customization in the quest for high-performance and energy-efficiency. We propose an alternative formulation that leverages Multi-Agent RL (MARL) to tackle this problem. Our evaluation shows that the MARL formulation consistently outperforms single-agent RL baselines.
arXiv Detail & Related papers (2022-11-29T17:10:24Z)
Heterogeneous Data-Centric Architectures for Modern Data-Intensive Applications: Case Studies in Machine Learning and Databases [9.927754948343326]
processing-in-memory (PIM) is a promising execution paradigm that alleviates the data movement bottleneck in modern applications. In this paper, we show how to take advantage of the PIM paradigm for two modern data-intensive applications.
arXiv Detail & Related papers (2022-05-29T13:43:17Z)
Integrate Lattice-Free MMI into End-to-End Speech Recognition [87.01137882072322]
In automatic speech recognition (ASR) research, discriminative criteria have achieved superior performance in DNN-HMM systems. With this motivation, the adoption of discriminative criteria is promising to boost the performance of end-to-end (E2E) ASR systems. Previous works have introduced the minimum Bayesian risk (MBR, one of the discriminative criteria) into E2E ASR systems. In this work, novel algorithms are proposed in this work to integrate another widely used discriminative criterion, lattice-free maximum mutual information (LF-MMI) into E2E
arXiv Detail & Related papers (2022-03-29T14:32:46Z)
Continual Learning Approach for Improving the Data and Computation Mapping in Near-Memory Processing System [3.202860612193139]
We propose an artificially intelligent memory mapping scheme, AIMM, that optimize data placement and resource utilization through page and computation remapping. AIMM uses a neural network to achieve a near-optimal mapping during execution, trained using a reinforcement learning algorithm. Our experimental evaluation shows that AIMM improves the baseline NMP performance in single and multiple program scenario by up to 70% and 50%, respectively.
arXiv Detail & Related papers (2021-04-28T09:50:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.