A Broad Comparative Evaluation of Software Debloating Tools
- URL: http://arxiv.org/abs/2312.13274v3
- Date: Wed, 12 Jun 2024 20:23:22 GMT
- Title: A Broad Comparative Evaluation of Software Debloating Tools
- Authors: Michael D. Brown, Adam Meily, Brian Fairservice, Akshay Sood, Jonathan Dorn, Eric Kilmer, Ronald Eytchison, et al.
- Abstract summary: Software debloating tools seek to improve program security and performance by removing unnecessary code, called bloat.
We surveyed 10 years of debloating literature and several tools currently under commercial development to taxonomize knowledge about the debloating ecosystem.
Our evaluation, conducted on a diverse set of 20 benchmark programs, measures tools across 12 performance, security, and correctness metrics.
- Score: 3.0913520619484287
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Software debloating tools seek to improve program security and performance by removing unnecessary code, called bloat. While many techniques have been proposed, several barriers to their adoption have emerged. Namely, debloating tools are highly specialized, making it difficult for adopters to find the right type of tool for their needs. This is further hindered by a lack of established metrics and comparative evaluations between tools. To close this information gap, we surveyed 10 years of debloating literature and several tools currently under commercial development to taxonomize knowledge about the debloating ecosystem. We then conducted a broad comparative evaluation of 10 debloating tools to determine their relative strengths and weaknesses. Our evaluation, conducted on a diverse set of 20 benchmark programs, measures tools across 12 performance, security, and correctness metrics. Our evaluation surfaces several concerning findings that contradict the prevailing narrative in the debloating literature. First, debloating tools lack the maturity required to be used on real-world software, evidenced by a slim 22% overall success rate for creating passable debloated versions of medium- and high-complexity benchmarks. Second, debloating tools struggle to produce sound and robust programs. Using our novel differential fuzzing tool, DIFFER, we discovered that only 13% of our debloating attempts produced a sound and robust debloated program. Finally, our results indicate that debloating tools typically do not improve the performance or security posture of debloated programs by a significant degree according to our evaluation metrics. We believe that our contributions in this paper will help potential adopters better understand the landscape of tools and will motivate future research and development of more capable debloating tools.
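The abstract names DIFFER but does not give its interface here. The sketch below only illustrates the differential-testing idea it describes: run the same inputs through the original and debloated binaries and flag any divergence as a potential soundness violation. The binary paths and the input source are hypothetical.

```python
import subprocess

def run(binary: str, data: bytes) -> tuple[int, bytes]:
    """Run one binary on one input, capturing exit code and stdout."""
    proc = subprocess.run([binary], input=data, capture_output=True, timeout=5)
    return proc.returncode, proc.stdout

def differential_test(original: str, debloated: str, inputs) -> list[bytes]:
    """Return the inputs on which the debloated binary diverges from the original."""
    divergent = []
    for data in inputs:
        if run(original, data) != run(debloated, data):
            divergent.append(data)  # retained behavior changed: unsound debloating
    return divergent
```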
Related papers
- Use as Directed? A Comparison of Software Tools Intended to Check Rigor and Transparency of Published Work [28.252424517077557]
Lack of standardization and transparency in scientific reporting is a major problem.
Several automated tools have been designed to check different rigor criteria.
We have conducted a broad comparison of 11 automated tools across 9 different rigor criteria from the ScreenIT group.
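As a toy illustration of this kind of cross-tool comparison (not the paper's actual method), one can tabulate how often the tools agree on each criterion; all tool and criterion names below are hypothetical.

```python
def per_criterion_agreement(results: dict[str, dict[str, bool]]) -> dict[str, float]:
    """Fraction of tools that flag each rigor criterion as satisfied.

    `results` maps tool name -> {criterion: verdict}.
    """
    criteria = next(iter(results.values())).keys()
    return {
        c: sum(results[t][c] for t in results) / len(results)
        for c in criteria
    }

# Hypothetical verdicts from three tools on two criteria.
verdicts = {
    "tool_a": {"blinding": True, "sample_size": False},
    "tool_b": {"blinding": True, "sample_size": True},
    "tool_c": {"blinding": False, "sample_size": False},
}
print(per_criterion_agreement(verdicts))  # blinding: 2/3, sample_size: 1/3
```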
arXiv Detail & Related papers (2025-07-23T23:49:28Z) - Advancing Tool-Augmented Large Language Models via Meta-Verification and Reflection Learning [63.2198957755528]
We propose Tool-MVR, a novel Tool-Augmented LLM that achieves comprehensive System 2 reasoning through two key innovations.
Specifically, we first introduce Multi-Agent Meta-Verification (MAMV), a systematic pipeline that rigorously validates APIs, queries, and reasoning trajectories.
Second, we propose Exploration-based Reflection Learning (EXPLORE), which enhances tool reflection capabilities by leveraging tool feedback.
arXiv Detail & Related papers (2025-06-05T04:35:49Z) - Acting Less is Reasoning More! Teaching Model to Act Efficiently [87.28134636548705]
Tool-integrated reasoning augments large language models with the ability to invoke external tools to solve tasks.
Current approaches typically optimize only for final correctness without considering the efficiency or necessity of external tool use.
We propose a framework that encourages models to produce accurate answers with minimal tool calls.
Our approach reduces tool calls by up to 68.3% and improves tool productivity by up to 215.4%, while maintaining comparable answer accuracy.
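The abstract does not state the training objective exactly; one plausible way to encode "accurate answers with minimal tool calls" is a reward that trades correctness against tool use. The penalty weight below is an assumption, not a value from the paper.

```python
def reward(correct: bool, num_tool_calls: int, lam: float = 0.1) -> float:
    """Toy shaped reward: credit for a correct answer minus a per-call penalty.

    `lam` is a hypothetical trade-off weight, not taken from the paper.
    """
    return float(correct) - lam * num_tool_calls

print(reward(True, 2))  # 0.8: correct, but paid for two tool calls
print(reward(True, 0))  # 1.0: correct with no external tools
```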
arXiv Detail & Related papers (2025-04-21T05:40:05Z) - Vexed by VEX tools: Consistency evaluation of container vulnerability scanners [0.0]
This paper presents a study that analyzed state-of-the-art vulnerability scanning tools applied to containers.
We have focused the work on tools following the Vulnerability Exploitability eXchange (VEX) format.
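A minimal sketch of one way to quantify consistency between two scanners' findings on the same image, assuming each report reduces to a set of CVE identifiers; the Jaccard measure and the findings below are illustrative choices, not the paper's protocol.

```python
def jaccard(findings_a: set[str], findings_b: set[str]) -> float:
    """Overlap between two scanners' reported CVE IDs for one container image."""
    if not findings_a and not findings_b:
        return 1.0  # both scanners report a clean image
    return len(findings_a & findings_b) / len(findings_a | findings_b)

# Hypothetical scan results for the same image.
scanner_a = {"CVE-2023-1234", "CVE-2024-0001"}
scanner_b = {"CVE-2023-1234"}
print(jaccard(scanner_a, scanner_b))  # 0.5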
arXiv Detail & Related papers (2025-03-18T16:22:43Z) - Does the Tool Matter? Exploring Some Causes of Threats to Validity in Mining Software Repositories [9.539825294372786]
We use two tools to extract and analyse ten large software projects.
Despite similar trends, even simple metrics such as the numbers of commits and developers may differ by up to 500%.
We find that such substantial differences are often caused by minor technical details.
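One such minor technical detail is how a tool walks merge commits. The sketch below (repository path hypothetical) shows two legitimate git invocations that report different commit counts for the same history:

```python
import subprocess

def count(args: list[str], repo: str) -> int:
    out = subprocess.run(args, cwd=repo, capture_output=True, text=True, check=True)
    return int(out.stdout.strip())

repo = "/path/to/some/clone"  # hypothetical checkout
every_commit = count(["git", "rev-list", "--count", "HEAD"], repo)
mainline_only = count(["git", "rev-list", "--count", "--first-parent", "HEAD"], repo)
# A tool that follows only the first parent of each merge reports the
# smaller number; a tool that walks all parents reports the larger one.
print(every_commit, mainline_only)
```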
arXiv Detail & Related papers (2025-01-25T07:42:56Z) - The Impact of SBOM Generators on Vulnerability Assessment in Python: A Comparison and a Novel Approach [56.4040698609393]
Software Bill of Materials (SBOM) has been promoted as a tool to increase transparency and verifiability in software composition.
Current SBOM generation tools often suffer from inaccuracies in identifying components and dependencies.
We propose PIP-sbom, a novel pip-inspired solution that addresses their shortcomings.
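This is not PIP-sbom itself, whose design the abstract does not detail; it is only a sketch of the installed-package inventory that any Python SBOM generator has to start from, using the standard library:

```python
from importlib.metadata import distributions

# Enumerate the distributions visible to this interpreter: the raw
# material for the component list in an SBOM.
components = sorted(
    (dist.metadata["Name"], dist.version) for dist in distributions()
)
for name, version in components:
    print(f"{name}=={version}")
```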
arXiv Detail & Related papers (2024-09-10T10:12:37Z) - SoK: Software Debloating Landscape and Future Directions [3.5609179225884353]
We conceptualize the software debloating workflow, which serves as the basis for developing a multilevel taxonomy.
This framework classifies debloating tools according to their input/output artifacts, debloating strategies, and evaluation criteria.
arXiv Detail & Related papers (2024-07-15T21:52:21Z) - What Are Tools Anyway? A Survey from the Language Model Perspective [67.18843218893416]
Language models (LMs) are powerful yet suited mostly to text generation tasks.
We provide a unified definition of tools as external programs used by LMs.
We empirically study the efficiency of various tooling methods.
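Under that definition, a tool is just an external program the LM can invoke by name; a minimal registry-and-dispatch sketch (tool names and bodies are illustrative):

```python
from typing import Callable

# Registry of external programs the LM may call, keyed by tool name.
TOOLS: dict[str, Callable[[str], str]] = {
    "upper": lambda text: text.upper(),
    "word_count": lambda text: str(len(text.split())),
}

def invoke(tool_name: str, argument: str) -> str:
    """Dispatch an LM-requested tool call to the registered program."""
    return TOOLS[tool_name](argument)

print(invoke("word_count", "tools are external programs"))  # 4
```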
arXiv Detail & Related papers (2024-03-18T17:20:07Z) - StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models [74.88844320554284]
We introduce StableToolBench, a benchmark evolving from ToolBench.
The virtual API server contains a caching system and API simulators which are complementary to alleviate the change in API status.
The stable evaluation system uses solvable pass and win rates, with GPT-4 as the automatic evaluator, to eliminate randomness during evaluation.
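A minimal sketch of the cache-then-simulate idea described above; the class shape and the callables are assumptions, not StableToolBench's actual code:

```python
import hashlib
import json

class CachingAPIServer:
    """Answer from cache first; fall back to a simulator if the live API fails."""

    def __init__(self, real_api, simulator):
        self.real_api = real_api    # callable: request dict -> response dict
        self.simulator = simulator  # callable used when the live API errors
        self.cache: dict[str, dict] = {}

    def call(self, request: dict) -> dict:
        key = hashlib.sha256(json.dumps(request, sort_keys=True).encode()).hexdigest()
        if key not in self.cache:
            try:
                response = self.real_api(request)
            except Exception:
                response = self.simulator(request)  # API changed or went down
            self.cache[key] = response
        return self.cache[key]
```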
arXiv Detail & Related papers (2024-03-12T14:57:40Z) - TOOLVERIFIER: Generalization to New Tools via Self-Verification [69.85190990517184]
We introduce a self-verification method which distinguishes between close candidates by self-asking contrastive questions during tool selection.
Experiments on 4 tasks from the ToolBench benchmark, consisting of 17 unseen tools, demonstrate an average improvement of 22% over few-shot baselines.
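The paper's exact prompts are not given here; below is a sketch of what a contrastive question between two close tool candidates might look like:

```python
def contrastive_question(tool_a: str, tool_b: str, user_query: str) -> str:
    """Build a self-asked question that forces a choice between two candidates."""
    return (
        f"For the request '{user_query}', which tool is the better fit: "
        f"'{tool_a}' or '{tool_b}'? Name the tool and the distinguishing reason."
    )

# Hypothetical close candidates for a unit-conversion request.
print(contrastive_question("unit_converter", "calculator", "convert 5 miles to km"))
```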
arXiv Detail & Related papers (2024-02-21T22:41:38Z) - HunFlair2 in a cross-corpus evaluation of biomedical named entity recognition and normalization tools [4.882266258243112]
We report on the results of a cross-corpus benchmark for named entity extraction using biomedical text mining tools.
Our results indicate that users of BTM tools should expect lower performance in the wild than reported in the original publications.
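A cross-corpus benchmark like this ultimately reduces each tool's output to a score; a common choice for NER is strict entity-level F1, sketched below (the paper's exact metric may differ):

```python
def entity_f1(gold: set[tuple], predicted: set[tuple]) -> float:
    """Strict entity-level F1; entities are (start, end, type) tuples."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)

gold = {(0, 4, "Gene"), (10, 18, "Disease")}   # hypothetical annotations
pred = {(0, 4, "Gene"), (20, 25, "Chemical")}
print(entity_f1(gold, pred))  # 0.5
```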
arXiv Detail & Related papers (2024-02-19T18:58:18Z) - AIBugHunter: A Practical Tool for Predicting, Classifying and Repairing Software Vulnerabilities [27.891905729536372]
AIBugHunter is a novel ML-based software vulnerability analysis tool for C/C++ languages that is integrated into Visual Studio Code.
We propose a novel multi-objective optimization (MOO)-based vulnerability classification approach and a transformer-based estimation approach to help AIBugHunter accurately identify vulnerability types and estimate severity.
arXiv Detail & Related papers (2023-05-26T04:21:53Z) - AI Explainability 360: Impact and Design [120.95633114160688]
In 2019, we created AI Explainability 360 (Arya et al. 2020), an open source software toolkit featuring ten diverse and state-of-the-art explainability methods.
This paper examines the impact of the toolkit with several case studies, statistics, and community feedback.
The paper also describes the flexible design of the toolkit, examples of its use, and the significant educational material and documentation available to its users.
arXiv Detail & Related papers (2021-09-24T19:17:09Z) - Uncertainty Baselines: Benchmarks for Uncertainty & Robustness in Deep Learning [66.59455427102152]
We introduce Uncertainty Baselines: high-quality implementations of standard and state-of-the-art deep learning methods on a variety of tasks.
Each baseline is a self-contained experiment pipeline with easily reusable and extendable components.
We provide model checkpoints, experiment outputs as Python notebooks, and leaderboards for comparing results.
arXiv Detail & Related papers (2021-06-07T23:57:32Z)