GreenMalloc: Allocator Optimisation for Industrial Workloads
- URL: http://arxiv.org/abs/2510.21405v1
- Date: Fri, 24 Oct 2025 12:49:38 GMT
- Title: GreenMalloc: Allocator Optimisation for Industrial Workloads
- Authors: Aidan Dakhama, W. B. Langdon, Hector D. Menendez, Karine Even-Mendoza
- Abstract summary: GreenMalloc is a framework for automatically configuring memory allocators. We show up to a 4.1 percent reduction in average heap usage without loss of efficiency.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present GreenMalloc, a multi-objective search-based framework for automatically configuring memory allocators. Our approach uses NSGA-II and rand_malloc as a lightweight proxy benchmarking tool. We efficiently explore allocator parameters from execution traces and transfer the best configurations to gem5, a large system simulator, in a case study on two allocators: the GNU C/C++ compiler's glibc malloc and Google's TCMalloc. Across diverse workloads, our empirical results show up to a 4.1 percent reduction in average heap usage without loss of runtime efficiency; indeed, we get a 0.25 percent reduction.
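The abstract describes a multi-objective search over allocator parameters that trades heap usage against runtime. A minimal sketch of the Pareto-dominance filter that underlies NSGA-II-style selection is given below. It is not the authors' implementation: the `Candidate` type, the configuration names, and all numbers are hypothetical values chosen only for illustration, and the full NSGA-II algorithm (ranking into successive fronts, crowding distance, crossover, mutation) is omitted.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A hypothetical allocator configuration with two measured objectives."""
    name: str            # illustrative label, not a real allocator setting
    avg_heap_mb: float   # objective 1: average heap usage (lower is better)
    runtime_s: float     # objective 2: runtime (lower is better)


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if `a` is no worse than `b` on both objectives
    and strictly better on at least one."""
    no_worse = a.avg_heap_mb <= b.avg_heap_mb and a.runtime_s <= b.runtime_s
    strictly_better = a.avg_heap_mb < b.avg_heap_mb or a.runtime_s < b.runtime_s
    return no_worse and strictly_better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


# Made-up measurements, for illustration only (not results from the paper).
candidates = [
    Candidate("default", 100.0, 10.0),
    Candidate("cfg_a", 96.0, 10.0),
    Candidate("cfg_b", 98.0, 9.9),
    Candidate("cfg_c", 101.0, 10.5),
]

front = pareto_front(candidates)
front_names = [c.name for c in front]
print(front_names)
```

In this toy data set, `cfg_a` and `cfg_b` survive because each beats the other on one objective, while `default` and `cfg_c` are dominated. A search of this kind would evaluate such candidates on a cheap proxy benchmark and carry only the non-dominated ones forward to more expensive simulation.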
Related papers
- Unlocking the Power of SAM 2 for Few-Shot Segmentation [54.562050590453225]
Few-Shot Segmentation (FSS) aims to learn class-agnostic segmentation on few classes to segment arbitrary classes, but at the risk of overfitting. Recently, SAM 2 has extended SAM by supporting video segmentation, whose class-agnostic matching ability is useful to FSS. We design a Pseudo Prompt Generator to encode pseudo query memory, matching with query features in a compatible way. We further design Iterative Memory Refinement to fuse more query FG features into the memory, and devise a Support-Calibrated Memory Attention to suppress the unexpected query BG features in memory.
arXiv Detail & Related papers (2025-05-20T09:02:53Z)
- SJMalloc: the security-conscious, fast, thread-safe and memory-efficient heap allocator [0.0]
Heap-based exploits pose a significant threat to application security.
Hardened allocators have not been widely adopted in real-world applications.
SJMalloc stores its metadata out-of-band, away from the application's data on the heap.
SJMalloc demonstrates a 6% performance improvement compared to glibc's allocator, while using only 5% more memory.
arXiv Detail & Related papers (2024-10-23T14:47:12Z)
- FUSELOC: Fusing Global and Local Descriptors to Disambiguate 2D-3D Matching in Visual Localization [52.57327385675752]
Direct 2D-3D matching requires significantly less memory but suffers from lower accuracy due to the larger and more ambiguous search space. We address this ambiguity by fusing local and global descriptors using a weighted average operator. We achieve performance close to hierarchical methods while using 43% less memory and running 1.6 times faster.
arXiv Detail & Related papers (2024-08-21T23:42:16Z)
- MOKD: Cross-domain Finetuning for Few-shot Classification via Maximizing Optimized Kernel Dependence [97.93517982908007]
In cross-domain few-shot classification, NCC aims to learn representations to construct a metric space where few-shot classification can be performed.
In this paper, we find that there exist high similarities between NCC-learned representations of two samples from different classes.
We propose a bi-level optimization framework, maximizing optimized kernel dependence (MOKD), to learn a set of class-specific representations that match the cluster structures indicated by labeled data.
arXiv Detail & Related papers (2024-05-29T05:59:52Z)
- Fast Inference with Kronecker-Sparse Matrices [4.387337528923525]
Existing GPU kernels for KS matrix multiplication suffer from high data movement costs. We propose a fused, output-stationary GPU kernel that eliminates these overheads. We demonstrate in FP32 end-to-end latency reductions of up to 22% in ViT-S/16 and 16% in GPT-2 medium.
arXiv Detail & Related papers (2024-05-23T19:36:10Z)
- StarMalloc: A Formally Verified, Concurrent, Performant, and Security-Oriented Memory Allocator [3.7554217682190365]
We show how to specify and verify StarMalloc, relying on dependent types and modular abstractions to enable efficient verification.
We show that StarMalloc can be used with real-world projects, including the Firefox browser, and evaluate it against 10 state-of-the-art memory allocators.
arXiv Detail & Related papers (2024-03-14T14:29:01Z)
- SeMalloc: Semantics-Informed Memory Allocator [18.04397502953383]
Use-after-free (UAF) is a critical and prevalent problem in memory unsafe languages.
We show one way to balance the trinity by passing more semantics about the heap object to the allocator.
In SeMalloc, only heap objects allocated from the same call site and via the same function call stack can possibly share a virtual memory address.
arXiv Detail & Related papers (2024-02-02T21:02:15Z)
- ArchGym: An Open-Source Gymnasium for Machine Learning Assisted Architecture Design [52.57999109204569]
ArchGym is an open-source framework that connects diverse search algorithms to architecture simulators.
We evaluate ArchGym across multiple vanilla and domain-specific search algorithms in designing a custom memory controller, deep neural network accelerators, and a custom SoC for AR/VR workloads.
arXiv Detail & Related papers (2023-06-15T06:41:23Z)
- Memory Planning for Deep Neural Networks [0.0]
We study memory allocation patterns in DNNs during inference.
Latencies incurred due to such mutex contention produce undesirable bottlenecks in user-facing services.
We present an implementation of MemoMalloc in the PyTorch deep learning framework.
arXiv Detail & Related papers (2022-02-23T05:28:18Z)
- Efficient Global-Local Memory for Real-time Instrument Segmentation of Robotic Surgical Video [53.14186293442669]
We identify two important clues for surgical instrument perception, including local temporal dependency from adjacent frames and global semantic correlation in long-range duration.
We propose a novel dual-memory network (DMNet) to relate both global and local-temporal knowledge.
Our method largely outperforms the state-of-the-art works on segmentation accuracy while maintaining a real-time speed.
arXiv Detail & Related papers (2021-09-28T10:10:14Z)
- ROME: Robustifying Memory-Efficient NAS via Topology Disentanglement and Gradient Accumulation [106.04777600352743]
Differentiable architecture search (DARTS) is largely hindered by its substantial memory cost since the entire supernet resides in the memory.
The single-path DARTS comes in, which only chooses a single-path submodel at each step.
While being memory-friendly, it also comes with low computational costs.
We propose a new algorithm called RObustifying Memory-Efficient NAS (ROME) to give a cure.
arXiv Detail & Related papers (2020-11-23T06:34:07Z)