Related papers: An empirical study of bloated dependencies in CommonJS packages

An empirical study of bloated dependencies in CommonJS packages

URL: http://arxiv.org/abs/2405.17939v1
Date: Tue, 28 May 2024 08:04:01 GMT
Title: An empirical study of bloated dependencies in CommonJS packages
Authors: Yuxin Liu, Deepika Tiwari, Cristian Bogdan, Benoit Baudry,
Abstract summary: We conduct an empirical study to investigate the bloated dependencies that are entirely unused within server-side applications. We propose a trace-based dynamic analysis that monitors file access, to determine which dependencies are not accessed during runtime. Our findings suggest that native support for dependency debloating in package managers could significantly alleviate the burden of maintaining dependencies.
Score: 6.115666382910127
License: http://creativecommons.org/licenses/by/4.0/
Abstract: JavaScript packages are notoriously prone to bloat, a factor that significantly impacts the performance and maintainability of web applications. While web bundlers and tree-shaking can mitigate this issue in client-side applications at the function level, they cannot effectively detect and remove bloat in server-side applications. In this paper, we conduct an empirical study to investigate the bloated dependencies that are entirely unused within server-side applications. Our study focuses on applications built with the widely used and highly dynamic CommonJS module system. We propose a trace-based dynamic analysis that monitors file access, to determine which dependencies are not accessed during runtime. To conduct our study, we curate an original dataset of 92 CommonJS packages with a median test coverage of 96.9% and a total of 50,661 dependencies. Our dynamic analysis identifies and successfully removes 50.7% of these dependencies while maintaining the correct build of all packages. Furthermore, we find that 14.9% of directly used dependencies and 51.3% of indirect dependencies are bloated. A key insight is that focusing on removing only the direct bloated dependencies by cleaning the package.json file, also removes a significant share of unnecessary bloated indirect dependencies. Compared to the state-of-the-art dynamic debloating technique, our analysis based on file accesses has fewer false positives, and demonstrates higher accuracy in detecting bloated dependencies. Our findings suggest that native support for dependency debloating in package managers could significantly alleviate the burden of maintaining dependencies.

Related papers

DI-BENCH: Benchmarking Large Language Models on Dependency Inference with Testable Repositories at Scale [39.92722886613929]
DI-BENCH is a large-scale benchmark and evaluation framework designed to assess Large Language Models' capability on dependency inference. The benchmark features 581 repositories with testing environments across Python, C#, Rust, and JavaScript. Extensive experiments with textual and execution-based metrics reveal that the current best-performing model achieves only a 42.9% execution pass rate.
arXiv Detail & Related papers (2025-01-23T14:27:11Z)
Commit0: Library Generation from Scratch [77.38414688148006]
Commit0 is a benchmark that challenges AI agents to write libraries from scratch. Agents are provided with a specification document outlining the library's API as well as a suite of interactive unit tests. Commit0 also offers an interactive environment where models receive static analysis and execution feedback on the code they generate.
arXiv Detail & Related papers (2024-12-02T18:11:30Z)
Learning Spatial-Semantic Features for Robust Video Object Segmentation [108.045326229865]
We propose a robust video object segmentation framework that learns spatial-semantic features and discriminative object queries. The proposed method achieves state-of-the-art performance on benchmark data sets, including the DAVIS 2017 test (textbf87.8%), YoutubeVOS 2019 (textbf88.1%), MOSE val (textbf74.0%), and LVOS test (textbf73.0%)
arXiv Detail & Related papers (2024-07-10T15:36:00Z)
A Preliminary Study on Self-Contained Libraries in the NPM Ecosystem [2.221643499902673]
The widespread of libraries within modern software ecosystems creates complex networks of dependencies. One mitigation strategy involves reducing dependencies; libraries with zero dependencies become to self-contained. This paper explores the characteristics of self-contained libraries within the NPM ecosystem.
arXiv Detail & Related papers (2024-06-17T09:33:49Z)
How to Understand Whole Software Repository? [64.19431011897515]
An excellent understanding of the whole repository will be the critical path to Automatic Software Engineering (ASE) We develop a novel method named RepoUnderstander by guiding agents to comprehensively understand the whole repositories. To better utilize the repository-level knowledge, we guide the agents to summarize, analyze, and plan.
arXiv Detail & Related papers (2024-06-03T15:20:06Z)
See to Believe: Using Visualization To Motivate Updating Third-party Dependencies [1.7914660044009358]
Security vulnerabilities introduced by applications using third-party dependencies are on the increase. Developers are wary of library updates, even to fix vulnerabilities, citing that being unaware, or that the migration effort to update outweighs the decision. In this paper, we hypothesize that the dependency graph visualization (DGV) approach will motivate developers to update.
arXiv Detail & Related papers (2024-05-15T03:57:27Z)
Analyzing the Accessibility of GitHub Repositories for PyPI and NPM Libraries [91.97201077607862]
Industrial applications heavily rely on open-source software (OSS) libraries, which provide various benefits. To monitor the activities of such communities, a comprehensive list of repositories for the libraries of an ecosystem must be accessible. In this study, we analyze the accessibility of GitHub repositories for PyPI and NPM libraries.
arXiv Detail & Related papers (2024-04-26T13:27:04Z)
Decoupled Subgraph Federated Learning [57.588938805581044]
We address the challenge of federated learning on graph-structured data distributed across multiple clients. We present a novel framework for this scenario, named FedStruct, that harnesses deep structural dependencies. We validate the effectiveness of FedStruct through experimental results conducted on six datasets for semi-supervised node classification.
arXiv Detail & Related papers (2024-02-29T13:47:23Z)
Dependency Practices for Vulnerability Mitigation [4.710141711181836]
We analyze more than 450 vulnerabilities in the npm ecosystem to understand why dependent packages remain vulnerable. We identify over 200,000 npm packages that are infected through their dependencies. We use 9 features to build a prediction model that identifies packages that quickly adopt the vulnerability fix and prevent further propagation of vulnerabilities.
arXiv Detail & Related papers (2023-10-11T19:48:46Z)
Analyzing the Evolution of Inter-package Dependencies in Operating Systems: A Case Study of Ubuntu [7.76541950830141]
An Operating System (OS) combines multiple interdependent software packages, which usually have their own independently developed architectures. For an evolutionary effort, designers/developers of OS can greatly benefit from fully understanding the system-wide dependency focused on individual files. We propose a framework, DepEx, aimed at discovering the detailed package relations at the level of individual binary files.
arXiv Detail & Related papers (2023-07-10T10:12:21Z)
Analyzing Maintenance Activities of Software Libraries [65.268245109828]
Industrial applications heavily integrate open-source software libraries nowadays. I want to introduce an automatic monitoring approach for industrial applications to identify open-source dependencies that show negative signs regarding their current or future maintenance activities.
arXiv Detail & Related papers (2023-06-09T16:51:25Z)
On the Security Blind Spots of Software Composition Analysis [46.1389163921338]
We present a novel approach to detect vulnerable clones in the Maven repository. We retrieve over 53k potential vulnerable clones from Maven Central. We detect 727 confirmed vulnerable clones and synthesize a testable proof-of-vulnerability project for each of those.
arXiv Detail & Related papers (2023-06-08T20:14:46Z)
Reference Twice: A Simple and Unified Baseline for Few-Shot Instance Segmentation [103.90033029330527]
Few-Shot Instance (FSIS) requires detecting and segmenting novel classes with limited support examples. We introduce a unified framework, Reference Twice (RefT), to exploit the relationship between support and query features for FSIS.
arXiv Detail & Related papers (2023-01-03T15:33:48Z)
Multi-Target XGBoostLSS Regression [91.3755431537592]
We present an extension of XGBoostLSS that models multiple targets and their dependencies in a probabilistic regression setting. Our approach outperforms existing GBMs with respect to runtime and compares well in terms of accuracy.
arXiv Detail & Related papers (2022-10-13T08:26:14Z)
Demystifying Dependency Bugs in Deep Learning Stack [7.488059560714949]
This paper characterizes symptoms, root causes and fix patterns of dependency bugs (DBs) across the whole Deep Learning stack. Our findings shed light on practical implications on dependency management.
arXiv Detail & Related papers (2022-07-21T07:56:03Z)
Modeling Multi-Label Action Dependencies for Temporal Action Localization [53.53490517832068]
Real-world videos contain many complex actions with inherent relationships between action classes. We propose an attention-based architecture that models these action relationships for the task of temporal action localization in unoccurrence videos. We show improved performance over state-of-the-art methods on multi-label action localization benchmarks.
arXiv Detail & Related papers (2021-03-04T13:37:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.