The Hidden Bloat in Machine Learning Systems
- URL: http://arxiv.org/abs/2503.14226v1
- Date: Tue, 18 Mar 2025 13:04:25 GMT
- Title: The Hidden Bloat in Machine Learning Systems
- Authors: Huaifeng Zhang, Ahmed Ali-Eldin
- Abstract summary: Software bloat refers to code and features that are not used by software during runtime. For Machine Learning (ML) systems, bloat is a major contributor to their technical debt, leading to decreased performance and resource wastage. We present Negativa-ML, a novel tool to identify and remove bloat in ML frameworks by analyzing their shared libraries.
- Score: 0.22099217573031676
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Software bloat refers to code and features that are not used by software during runtime. For Machine Learning (ML) systems, bloat is a major contributor to their technical debt, leading to decreased performance and resource wastage. In this work, we present Negativa-ML, a novel tool to identify and remove bloat in ML frameworks by analyzing their shared libraries. Our approach includes novel techniques to detect and locate unnecessary code within device code - a key area overlooked by existing research, which focuses primarily on host code. We evaluate Negativa-ML using four popular ML frameworks across ten workloads over 300 shared libraries. The results demonstrate that the ML frameworks are highly bloated on both the device and host code sides. On average, Negativa-ML reduces the device code size in these frameworks by up to 75% and the host code by up to 72%, resulting in total file size reductions of up to 55%. The device code is a primary source of bloat within ML frameworks. Through debloating, we achieve reductions in peak host memory usage, peak GPU memory usage, and execution time by up to 74.6%, 69.6%, and 44.6%, respectively.
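As a rough illustration of the kind of host-code analysis the abstract describes, the sketch below compares the symbols a shared library exports against the symbols observed in use during a workload and reports the unused fraction. This is a minimal, hypothetical approximation, not Negativa-ML's actual pipeline: the `used_symbols.txt` trace, the helper names, and the reliance on GNU `nm` are assumptions made for illustration, and symbol counts are only a coarse proxy for code size.

```python
# Hypothetical sketch (not Negativa-ML): approximate host-code bloat in a shared
# library by comparing exported symbols with symbols seen in use at runtime.
import subprocess
import sys


def exported_symbols(lib_path: str) -> set:
    """Return the defined dynamic symbols of a shared library via GNU nm."""
    out = subprocess.run(
        ["nm", "-D", "--defined-only", lib_path],
        capture_output=True, text=True, check=True,
    ).stdout
    # Each nm output line ends with the symbol name.
    return {line.split()[-1] for line in out.splitlines() if line.split()}


def unused_fraction(lib_path: str, used_symbols_file: str) -> float:
    """Fraction of exported symbols never observed in the workload trace."""
    exported = exported_symbols(lib_path)
    with open(used_symbols_file) as f:
        used = {line.strip() for line in f if line.strip()}
    if not exported:
        return 0.0
    return len(exported - used) / len(exported)


if __name__ == "__main__":
    lib, trace = sys.argv[1], sys.argv[2]  # e.g. some .so and a used-symbols list
    print(f"{unused_fraction(lib, trace):.1%} of exported symbols unused")
```

Note that this view only covers the host symbol table; the device (GPU) code embedded in the same libraries, which the abstract identifies as a primary source of bloat, would require inspecting the embedded device binaries instead.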
Related papers
- KGym: A Platform and Dataset to Benchmark Large Language Models on Linux Kernel Crash Resolution [59.20933707301566]
Large Language Models (LLMs) are consistently improving at increasingly realistic software engineering (SE) tasks.
In real-world software stacks, significant SE effort is spent developing foundational system software like the Linux kernel.
To evaluate if ML models are useful while developing such large-scale systems-level software, we introduce kGym and kBench.
arXiv Detail & Related papers (2024-07-02T21:44:22Z)
- Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models [107.88745040504887]
We study the harmlessness alignment problem of multimodal large language models (MLLMs).
We propose a novel jailbreak method named HADES, which hides and amplifies the harmfulness of the malicious intent within the text input.
Experimental results show that HADES can effectively jailbreak existing MLLMs, which achieves an average Attack Success Rate (ASR) of 90.26% for LLaVA-1.5 and 71.60% for Gemini Pro Vision.
arXiv Detail & Related papers (2024-03-14T18:24:55Z)
- ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code [76.84199699772903]
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks.
To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z)
- Compressing LLMs: The Truth is Rarely Pure and Never Simple [90.05366363633568]
The Knowledge-Intensive Compressed LLM BenchmarK (LLM-KICK) aims to redefine the evaluation protocol for compressed Large Language Models.
LLM-KICK unveils many favorable merits and unfortunate plights of current SoTA compression methods.
LLM-KICK is designed to holistically assess compressed LLMs' ability for language understanding, reasoning, generation, in-context retrieval, in-context summarization, etc.
arXiv Detail & Related papers (2023-10-02T17:42:37Z)
- Condensing Multilingual Knowledge with Lightweight Language-Specific Modules [52.973832863842546]
We introduce the Language-Specific Matrix Synthesis (LMS) method.
This approach constructs LS modules by generating low-rank matrices from two significantly smaller matrices.
We condense multilingual knowledge from multiple LS modules into a single shared module with the Fuse Distillation (FD) technique.
arXiv Detail & Related papers (2023-05-23T12:21:38Z)
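To make the low-rank construction mentioned in the LMS entry above concrete, the sketch below shows how a full weight matrix can be replaced by the product of two much smaller matrices, with the parameter savings that motivates the approach. This is a generic illustration, not the paper's implementation; the dimensions and rank are assumed values.

```python
# Generic low-rank parameterization sketch (illustrative; not the LMS code).
import numpy as np

d, r = 1024, 16          # hidden size and a small, assumed rank
rng = np.random.default_rng(0)

# Two small matrices whose product plays the role of a language-specific module.
A = rng.standard_normal((d, r)) / np.sqrt(r)
B = rng.standard_normal((r, d)) / np.sqrt(d)
W_ls = A @ B             # rank-r matrix of shape (d, d)

x = rng.standard_normal(d)
y = x @ W_ls             # applying the module to an activation vector

full_params = d * d
low_rank_params = d * r + r * d
print(f"full: {full_params:,} params, low-rank: {low_rank_params:,} "
      f"({low_rank_params / full_params:.1%} of the full matrix)")
```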
- The Cure is in the Cause: A Filesystem for Container Debloating [3.072029094326428]
Over 50% of the top-downloaded containers have more than 60% bloat, and BAFFS reduces container sizes significantly.
For serverless functions, BAFFS reduces cold start latency by up to 68%.
arXiv Detail & Related papers (2023-05-08T11:41:30Z)
- MEMA Runtime Framework: Minimizing External Memory Accesses for TinyML on Microcontrollers [3.1823074562424756]
We present the MEMA framework for efficient inference runtimes that minimize external memory accesses for matrix multiplication on TinyML systems.
We compare the performance of runtimes derived from MEMA to existing state-of-the-art libraries on ARM-based TinyML systems.
arXiv Detail & Related papers (2023-04-12T00:27:11Z)
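The MEMA entry above concerns reducing external memory traffic for matrix multiplication; the generic tiled matmul below illustrates the underlying idea of reusing small blocks that fit in fast local memory. The tile size and the NumPy setting are illustrative assumptions, not MEMA's generated runtimes.

```python
# Illustrative tiled matrix multiplication: operate on small blocks so the
# working set fits in fast local memory (SRAM/cache), reducing external accesses.
import numpy as np

def tiled_matmul(a: np.ndarray, b: np.ndarray, tile: int = 32) -> np.ndarray:
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    c = np.zeros((m, n), dtype=a.dtype)
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            for p in range(0, k, tile):
                # Each block stays small enough to live in on-chip memory.
                c[i:i+tile, j:j+tile] += a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile]
    return c

a = np.random.rand(128, 96)
b = np.random.rand(96, 64)
assert np.allclose(tiled_matmul(a, b), a @ b)
```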
- Machine Learning Systems are Bloated and Vulnerable [2.7023370929727277]
We develop MMLB, a framework for analyzing bloat in software systems.
MMLB measures the amount of bloat at both the container and package levels.
We show that bloat accounts for up to 80% of machine learning container sizes.
arXiv Detail & Related papers (2022-12-16T10:34:27Z)
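In the same spirit as the MMLB container-level measurements above, the sketch below estimates how much of a container's unpacked filesystem is never touched by a workload, given a list of accessed paths. The trace file and the unpacked rootfs directory are hypothetical inputs; this is not MMLB's methodology, only an illustration of a file-level bloat ratio.

```python
# Hypothetical container-level bloat estimate: bytes in the unpacked image that
# were never accessed during a workload, as a fraction of total image bytes.
import os
import sys


def total_and_unused_bytes(rootfs: str, accessed_paths: set) -> tuple:
    total = unused = 0
    for dirpath, _dirnames, filenames in os.walk(rootfs):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip broken symlinks, devices, etc.
            size = os.path.getsize(path)
            total += size
            if os.path.relpath(path, rootfs) not in accessed_paths:
                unused += size
    return total, unused


if __name__ == "__main__":
    rootfs, trace = sys.argv[1], sys.argv[2]  # unpacked image dir, accessed-paths file
    with open(trace) as f:
        accessed = {line.strip().lstrip("/") for line in f if line.strip()}
    total, unused = total_and_unused_bytes(rootfs, accessed)
    print(f"{unused / max(total, 1):.1%} of {total:,} bytes never accessed")
```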
- MinUn: Accurate ML Inference on Microcontrollers [2.2638536653874195]
Running machine learning inference on tiny devices, known as TinyML, is an emerging research area.
We describe MinUn, the first TinyML framework that holistically addresses these issues to generate efficient code for ARM microcontrollers.
arXiv Detail & Related papers (2022-10-29T10:16:12Z)
- A TinyML Platform for On-Device Continual Learning with Quantized Latent Replays [66.62377866022221]
Latent Replay-based Continual Learning (CL) techniques enable online, serverless adaptation in principle.
We introduce a HW/SW platform for end-to-end CL based on a 10-core FP32-enabled parallel ultra-low-power processor.
Our results show that by combining these techniques, continual learning can be achieved in practice using less than 64MB of memory.
arXiv Detail & Related papers (2021-10-20T11:01:23Z)
- MLPerf Tiny Benchmark [1.1178096184080788]
We present MLPerf Tiny, the first industry-standard benchmark suite for ultra-low-power tiny machine learning systems.
MLPerf Tiny measures the accuracy, latency, and energy of machine learning inference to properly evaluate the tradeoffs between systems.
arXiv Detail & Related papers (2021-06-14T17:05:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.