An empirical study of bloated dependencies in CommonJS packages
- URL: http://arxiv.org/abs/2405.17939v1
- Date: Tue, 28 May 2024 08:04:01 GMT
- Title: An empirical study of bloated dependencies in CommonJS packages
- Authors: Yuxin Liu, Deepika Tiwari, Cristian Bogdan, Benoit Baudry,
- Abstract summary: We conduct an empirical study to investigate the bloated dependencies that are entirely unused within server-side applications.
We propose a trace-based dynamic analysis that monitors file access, to determine which dependencies are not accessed during runtime.
Our findings suggest that native support for dependency debloating in package managers could significantly alleviate the burden of maintaining dependencies.
- Score: 6.115666382910127
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: JavaScript packages are notoriously prone to bloat, a factor that significantly impacts the performance and maintainability of web applications. While web bundlers and tree-shaking can mitigate this issue in client-side applications at the function level, they cannot effectively detect and remove bloat in server-side applications. In this paper, we conduct an empirical study to investigate the bloated dependencies that are entirely unused within server-side applications. Our study focuses on applications built with the widely used and highly dynamic CommonJS module system. We propose a trace-based dynamic analysis that monitors file access, to determine which dependencies are not accessed during runtime. To conduct our study, we curate an original dataset of 92 CommonJS packages with a median test coverage of 96.9% and a total of 50,661 dependencies. Our dynamic analysis identifies and successfully removes 50.7% of these dependencies while maintaining the correct build of all packages. Furthermore, we find that 14.9% of directly used dependencies and 51.3% of indirect dependencies are bloated. A key insight is that focusing on removing only the direct bloated dependencies by cleaning the package.json file, also removes a significant share of unnecessary bloated indirect dependencies. Compared to the state-of-the-art dynamic debloating technique, our analysis based on file accesses has fewer false positives, and demonstrates higher accuracy in detecting bloated dependencies. Our findings suggest that native support for dependency debloating in package managers could significantly alleviate the burden of maintaining dependencies.
Related papers
- A Preliminary Study on Self-Contained Libraries in the NPM Ecosystem [2.221643499902673]
The widespread of libraries within modern software ecosystems creates complex networks of dependencies.
One mitigation strategy involves reducing dependencies; libraries with zero dependencies become to self-contained.
This paper explores the characteristics of self-contained libraries within the NPM ecosystem.
arXiv Detail & Related papers (2024-06-17T09:33:49Z) - How to Understand Whole Software Repository? [64.19431011897515]
An excellent understanding of the whole repository will be the critical path to Automatic Software Engineering (ASE)
We develop a novel method named RepoUnderstander by guiding agents to comprehensively understand the whole repositories.
To better utilize the repository-level knowledge, we guide the agents to summarize, analyze, and plan.
arXiv Detail & Related papers (2024-06-03T15:20:06Z) - See to Believe: Using Visualization To Motivate Updating Third-party Dependencies [1.7914660044009358]
Security vulnerabilities introduced by applications using third-party dependencies are on the increase.
Developers are wary of library updates, even to fix vulnerabilities, citing that being unaware, or that the migration effort to update outweighs the decision.
In this paper, we hypothesize that the dependency graph visualization (DGV) approach will motivate developers to update.
arXiv Detail & Related papers (2024-05-15T03:57:27Z) - Analyzing the Accessibility of GitHub Repositories for PyPI and NPM Libraries [91.97201077607862]
Industrial applications heavily rely on open-source software (OSS) libraries, which provide various benefits.
To monitor the activities of such communities, a comprehensive list of repositories for the libraries of an ecosystem must be accessible.
In this study, we analyze the accessibility of GitHub repositories for PyPI and NPM libraries.
arXiv Detail & Related papers (2024-04-26T13:27:04Z) - Decoupled Subgraph Federated Learning [57.588938805581044]
We address the challenge of federated learning on graph-structured data distributed across multiple clients.
We present a novel framework for this scenario, named FedStruct, that harnesses deep structural dependencies.
We validate the effectiveness of FedStruct through experimental results conducted on six datasets for semi-supervised node classification.
arXiv Detail & Related papers (2024-02-29T13:47:23Z) - Dependency Practices for Vulnerability Mitigation [4.710141711181836]
We analyze more than 450 vulnerabilities in the npm ecosystem to understand why dependent packages remain vulnerable.
We identify over 200,000 npm packages that are infected through their dependencies.
We use 9 features to build a prediction model that identifies packages that quickly adopt the vulnerability fix and prevent further propagation of vulnerabilities.
arXiv Detail & Related papers (2023-10-11T19:48:46Z) - Analyzing the Evolution of Inter-package Dependencies in Operating
Systems: A Case Study of Ubuntu [7.76541950830141]
An Operating System (OS) combines multiple interdependent software packages, which usually have their own independently developed architectures.
For an evolutionary effort, designers/developers of OS can greatly benefit from fully understanding the system-wide dependency focused on individual files.
We propose a framework, DepEx, aimed at discovering the detailed package relations at the level of individual binary files.
arXiv Detail & Related papers (2023-07-10T10:12:21Z) - Analyzing Maintenance Activities of Software Libraries [65.268245109828]
Industrial applications heavily integrate open-source software libraries nowadays.
I want to introduce an automatic monitoring approach for industrial applications to identify open-source dependencies that show negative signs regarding their current or future maintenance activities.
arXiv Detail & Related papers (2023-06-09T16:51:25Z) - On the Security Blind Spots of Software Composition Analysis [46.1389163921338]
We present a novel approach to detect vulnerable clones in the Maven repository.
We retrieve over 53k potential vulnerable clones from Maven Central.
We detect 727 confirmed vulnerable clones and synthesize a testable proof-of-vulnerability project for each of those.
arXiv Detail & Related papers (2023-06-08T20:14:46Z) - Multi-Target XGBoostLSS Regression [91.3755431537592]
We present an extension of XGBoostLSS that models multiple targets and their dependencies in a probabilistic regression setting.
Our approach outperforms existing GBMs with respect to runtime and compares well in terms of accuracy.
arXiv Detail & Related papers (2022-10-13T08:26:14Z) - Demystifying Dependency Bugs in Deep Learning Stack [7.488059560714949]
This paper characterizes symptoms, root causes and fix patterns of dependency bugs (DBs) across the whole Deep Learning stack.
Our findings shed light on practical implications on dependency management.
arXiv Detail & Related papers (2022-07-21T07:56:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.