Detecting and removing bloated dependencies in CommonJS packages
- URL: http://arxiv.org/abs/2405.17939v2
- Date: Sat, 18 Jan 2025 07:29:36 GMT
- Title: Detecting and removing bloated dependencies in CommonJS packages
- Authors: Yuxin Liu, Deepika Tiwari, Cristian Bogdan, Benoit Baudry
- Abstract summary: We present the first study to investigate bloated dependencies within server-side JavaScript applications. We propose a trace-based dynamic analysis that monitors the OS file system to determine which dependencies are not accessed during runtime.
- Score: 6.115666382910127
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: JavaScript packages are notoriously prone to bloat, a factor that significantly impacts the performance and maintainability of web applications. While web bundlers and tree-shaking can mitigate this issue in client-side applications, state-of-the-art techniques have limitations on the detection and removal of bloat in server-side applications. In this paper, we present the first study to investigate bloated dependencies within server-side JavaScript applications, focusing on those built with the widely used and highly dynamic CommonJS module system. We propose a trace-based dynamic analysis that monitors the OS file system to determine which dependencies are not accessed during runtime. To evaluate our approach, we curate an original dataset of 91 CommonJS packages with a total of 50,488 dependencies. Compared to the state-of-the-art dynamic and static approaches, our trace-based analysis demonstrates higher accuracy in detecting bloated dependencies. Our analysis identifies 50.6% of the 50,488 dependencies as bloated: 13.8% of direct dependencies and 51.3% of indirect dependencies. Furthermore, removing only the direct bloated dependencies by cleaning the dependency configuration file can remove a significant share of unnecessary bloated indirect dependencies while preserving functional correctness.
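The core idea in the abstract, flagging dependencies whose files are never accessed at runtime, can be illustrated with a minimal sketch. This is not the authors' tool: the function names `package_of` and `find_unaccessed`, the example paths, and the assumption that a file-access trace is already available as a list of POSIX paths are all hypothetical. The sketch also identifies packages by name only, so two installed versions of the same package are not distinguished.

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional, Set


def package_of(path: str) -> Optional[str]:
    """Return the npm package owning `path`, or None if it is outside node_modules."""
    parts = PurePosixPath(path).parts
    # Use the last node_modules segment so nested installs map to the innermost package.
    indexes = [i for i, part in enumerate(parts) if part == "node_modules"]
    if not indexes or indexes[-1] + 1 >= len(parts):
        return None
    start = indexes[-1] + 1
    name = parts[start]
    if name.startswith("@"):
        # Scoped packages span two path segments, e.g. @babel/core.
        if start + 1 >= len(parts):
            return None
        name = f"{name}/{parts[start + 1]}"
    return name


def find_unaccessed(installed: Set[str], accessed_paths: Iterable[str]) -> Set[str]:
    """Return installed packages with no accessed file in the trace (bloat candidates)."""
    accessed = set()
    for path in accessed_paths:
        pkg = package_of(path)
        if pkg is not None:
            accessed.add(pkg)
    return installed - accessed


# Hypothetical data: installed packages and file paths seen in a runtime trace.
installed = {"express", "lodash", "@babel/core", "debug"}
trace = [
    "/app/node_modules/express/index.js",
    "/app/node_modules/express/node_modules/debug/src/index.js",
    "/app/src/main.js",
    "/app/node_modules/@babel/core/lib/index.js",
]
print(sorted(find_unaccessed(installed, trace)))  # ['lodash']
```

A package reported here is only a candidate: the result depends on how much of the application the traced workload exercises.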
Related papers
- DI-BENCH: Benchmarking Large Language Models on Dependency Inference with Testable Repositories at Scale [39.92722886613929]
DI-BENCH is a large-scale benchmark and evaluation framework designed to assess Large Language Models' capability on dependency inference.
The benchmark features 581 repositories with testing environments across Python, C#, Rust, and JavaScript.
Extensive experiments with textual and execution-based metrics reveal that the current best-performing model achieves only a 42.9% execution pass rate.
arXiv Detail & Related papers (2025-01-23T14:27:11Z)
- Commit0: Library Generation from Scratch [77.38414688148006]
Commit0 is a benchmark that challenges AI agents to write libraries from scratch.
Agents are provided with a specification document outlining the library's API as well as a suite of interactive unit tests.
Commit0 also offers an interactive environment where models receive static analysis and execution feedback on the code they generate.
arXiv Detail & Related papers (2024-12-02T18:11:30Z)
- Learning Spatial-Semantic Features for Robust Video Object Segmentation [108.045326229865]
We propose a robust video object segmentation framework that learns spatial-semantic features and discriminative object queries.
The proposed method achieves state-of-the-art performance on benchmark data sets, including the DAVIS 2017 test (87.8%), YoutubeVOS 2019 (88.1%), MOSE val (74.0%), and LVOS test (73.0%).
arXiv Detail & Related papers (2024-07-10T15:36:00Z)
- A Preliminary Study on Self-Contained Libraries in the NPM Ecosystem [2.221643499902673]
The widespread use of libraries within modern software ecosystems creates complex networks of dependencies.
One mitigation strategy involves reducing dependencies; libraries with zero dependencies become self-contained.
This paper explores the characteristics of self-contained libraries within the NPM ecosystem.
arXiv Detail & Related papers (2024-06-17T09:33:49Z)
- How to Understand Whole Software Repository? [64.19431011897515]
An excellent understanding of the whole repository will be the critical path to Automatic Software Engineering (ASE).
We develop a novel method named RepoUnderstander by guiding agents to comprehensively understand whole repositories.
To better utilize the repository-level knowledge, we guide the agents to summarize, analyze, and plan.
arXiv Detail & Related papers (2024-06-03T15:20:06Z)
- See to Believe: Using Visualization To Motivate Updating Third-party Dependencies [1.7914660044009358]
Security vulnerabilities introduced by applications using third-party dependencies are on the increase.
Developers are wary of library updates, even to fix vulnerabilities, citing either that they were unaware of the update or that the migration effort outweighs the benefit of updating.
In this paper, we hypothesize that the dependency graph visualization (DGV) approach will motivate developers to update.
arXiv Detail & Related papers (2024-05-15T03:57:27Z)
- Analyzing the Accessibility of GitHub Repositories for PyPI and NPM Libraries [91.97201077607862]
Industrial applications heavily rely on open-source software (OSS) libraries, which provide various benefits.
To monitor the activities of such communities, a comprehensive list of repositories for the libraries of an ecosystem must be accessible.
In this study, we analyze the accessibility of GitHub repositories for PyPI and NPM libraries.
arXiv Detail & Related papers (2024-04-26T13:27:04Z)
- Decoupled Subgraph Federated Learning [57.588938805581044]
We address the challenge of federated learning on graph-structured data distributed across multiple clients.
We present a novel framework for this scenario, named FedStruct, that harnesses deep structural dependencies.
We validate the effectiveness of FedStruct through experimental results conducted on six datasets for semi-supervised node classification.
arXiv Detail & Related papers (2024-02-29T13:47:23Z)
- Dependency Practices for Vulnerability Mitigation [4.710141711181836]
We analyze more than 450 vulnerabilities in the npm ecosystem to understand why dependent packages remain vulnerable.
We identify over 200,000 npm packages that are infected through their dependencies.
We use 9 features to build a prediction model that identifies packages that quickly adopt the vulnerability fix and prevent further propagation of vulnerabilities.
arXiv Detail & Related papers (2023-10-11T19:48:46Z)
- Analyzing the Evolution of Inter-package Dependencies in Operating Systems: A Case Study of Ubuntu [7.76541950830141]
An Operating System (OS) combines multiple interdependent software packages, which usually have their own independently developed architectures.
For an evolutionary effort, designers and developers of an OS can greatly benefit from fully understanding system-wide dependencies at the level of individual files.
We propose a framework, DepEx, aimed at discovering the detailed package relations at the level of individual binary files.
arXiv Detail & Related papers (2023-07-10T10:12:21Z)
- Analyzing Maintenance Activities of Software Libraries [65.268245109828]
Industrial applications heavily integrate open-source software libraries nowadays.
I want to introduce an automatic monitoring approach for industrial applications to identify open-source dependencies that show negative signs regarding their current or future maintenance activities.
arXiv Detail & Related papers (2023-06-09T16:51:25Z)
- On the Security Blind Spots of Software Composition Analysis [46.1389163921338]
We present a novel approach to detect vulnerable clones in the Maven repository.
We retrieve over 53k potential vulnerable clones from Maven Central.
We detect 727 confirmed vulnerable clones and synthesize a testable proof-of-vulnerability project for each of those.
arXiv Detail & Related papers (2023-06-08T20:14:46Z)
- Reference Twice: A Simple and Unified Baseline for Few-Shot Instance Segmentation [103.90033029330527]
Few-Shot Instance Segmentation (FSIS) requires detecting and segmenting novel classes with limited support examples.
We introduce a unified framework, Reference Twice (RefT), to exploit the relationship between support and query features for FSIS.
arXiv Detail & Related papers (2023-01-03T15:33:48Z)
- Multi-Target XGBoostLSS Regression [91.3755431537592]
We present an extension of XGBoostLSS that models multiple targets and their dependencies in a probabilistic regression setting.
Our approach outperforms existing GBMs with respect to runtime and compares well in terms of accuracy.
arXiv Detail & Related papers (2022-10-13T08:26:14Z)
- Demystifying Dependency Bugs in Deep Learning Stack [7.488059560714949]
This paper characterizes symptoms, root causes and fix patterns of dependency bugs (DBs) across the whole Deep Learning stack.
Our findings shed light on practical implications for dependency management.
arXiv Detail & Related papers (2022-07-21T07:56:03Z)
- Modeling Multi-Label Action Dependencies for Temporal Action Localization [53.53490517832068]
Real-world videos contain many complex actions with inherent relationships between action classes.
We propose an attention-based architecture that models these action relationships for the task of temporal action localization in untrimmed videos.
We show improved performance over state-of-the-art methods on multi-label action localization benchmarks.
arXiv Detail & Related papers (2021-03-04T13:37:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.