Automatic Specialization of Third-Party Java Dependencies
- URL: http://arxiv.org/abs/2302.08370v2
- Date: Fri, 13 Oct 2023 08:03:47 GMT
- Title: Automatic Specialization of Third-Party Java Dependencies
- Authors: César Soto-Valero and Deepika Tiwari and Tim Toady and Benoit Baudry
- Abstract summary: Large-scale code reuse significantly reduces both development costs and time.
The massive share of third-party code in software projects poses new challenges, especially in terms of maintenance and security.
We propose a novel technique to specialize dependencies of Java projects, based on their actual usage.
- Score: 3.7973152331947815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale code reuse significantly reduces both development costs and time.
However, the massive share of third-party code in software projects poses new
challenges, especially in terms of maintenance and security. In this paper, we
propose a novel technique to specialize dependencies of Java projects, based on
their actual usage. Given a project and its dependencies, we systematically
identify the subset of each dependency that is necessary to build the project,
and we remove the rest. As a result of this process, we package each
specialized dependency in a JAR file. Then, we generate specialized dependency
trees where the original dependencies are replaced by the specialized versions.
This allows building the project with significantly less third-party code than
the original. As a result, the specialized dependencies become a first-class
concept in the software supply chain, rather than a transient artifact in an
optimizing compiler toolchain. We implement our technique in a tool called
DepTrim, which we evaluate with 30 notable open-source Java projects. DepTrim
specializes a total of 343 (86.6%) dependencies across these projects, and
successfully rebuilds each project with a specialized dependency tree.
Moreover, through this specialization, DepTrim removes a total of 57,444
(42.2%) classes from the dependencies, reducing the ratio of dependency classes
to project classes from 8.7x in the original projects to 5.0x after
specialization. These novel results indicate that dependency specialization
significantly reduces the share of third-party code in Java projects.
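The core step described in the abstract, keeping only the part of each dependency that the project needs, can be illustrated with a small reachability computation. The sketch below is not DepTrim's implementation: the class name `SpecializationSketch`, the methods `reachable` and `specialize`, and the hand-written reference graph are all hypothetical. It assumes that class-to-class references have already been extracted by some other means.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; the names and the reference graph are assumptions.
public class SpecializationSketch {

    /** Returns every class reachable from the roots by following references. */
    static Set<String> reachable(Set<String> roots, Map<String, List<String>> references) {
        Set<String> seen = new TreeSet<>(roots);
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String current = work.pop();
            for (String target : references.getOrDefault(current, List.of())) {
                if (seen.add(target)) {   // add() is true only for classes not visited yet
                    work.push(target);
                }
            }
        }
        return seen;
    }

    /** Keeps only the dependency classes that are reachable from the project classes. */
    static Set<String> specialize(Set<String> dependencyClasses,
                                  Set<String> projectClasses,
                                  Map<String, List<String>> references) {
        Set<String> kept = new TreeSet<>(dependencyClasses);
        kept.retainAll(reachable(projectClasses, references));
        return kept;
    }

    public static void main(String[] args) {
        // Toy graph: the project uses lib.Json, which in turn needs lib.Parser.
        Map<String, List<String>> references = Map.of(
            "app.Main", List.of("lib.Json"),
            "lib.Json", List.of("lib.Parser"),
            "lib.Xml", List.of("lib.Parser"));
        Set<String> dependency = Set.of("lib.Json", "lib.Parser", "lib.Xml", "lib.Yaml");

        Set<String> kept = specialize(dependency, Set.of("app.Main"), references);
        System.out.println("kept " + kept.size() + " of " + dependency.size() + " classes: " + kept);
    }
}
```

In this toy graph, `lib.Xml` and `lib.Yaml` are never reached from the project, so a specialized JAR would omit them. A real tool would additionally have to handle reflection, resources, and other dynamic features that a static reference graph misses.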
Related papers
- Canonicalization for Unreproducible Builds in Java [11.367562045401554]
We introduce a conceptual framework for reproducible builds, analyze a large dataset from Reproducible Central, and develop a novel taxonomy of six root causes of unreproducibility.
We present Chains-Rebuild, a tool that raises success from 9.48% to 26.89% on 12,283 unreproducible artifacts.
arXiv Detail & Related papers (2025-04-30T14:17:54Z) - Beyond Dependencies: The Role of Copy-Based Reuse in Open Source Software Development [5.412781090113212]
In Open Source Software, resources of any project are open for reuse by introducing dependencies or copying the resource itself.
Our aim is to enable future research and tool development to increase efficiency and reduce the risks of copy-based reuse.
arXiv Detail & Related papers (2024-09-07T13:50:40Z) - Local Software Buildability across Java Versions (Registered Report) [0.0]
We will try to automatically build every project in containers with Java versions 6 to 23 installed.
Success or failure will be determined by exit codes, and standard output and error streams will be saved.
arXiv Detail & Related papers (2024-08-21T11:51:00Z) - Compilation of Commit Changes within Java Source Code Repositories [2.556190321164248]
JESS reduces the code, retaining only those parts that the committed change references.
JESS is able to compile, in isolation, 72% of methods and constructors, of which 89% have bytecode equal to the original one.
On the Project KB database of fix-commits, in which only 8% of files modified within the commits can be compiled with the provided build scripts, JESS is able to compile 73% of all files that these commits modify.
arXiv Detail & Related papers (2024-07-25T08:14:33Z) - A Preliminary Study on Self-Contained Libraries in the NPM Ecosystem [2.221643499902673]
The widespread use of libraries within modern software ecosystems creates complex networks of dependencies.
One mitigation strategy involves reducing dependencies; libraries with zero dependencies become self-contained.
This paper explores the characteristics of self-contained libraries within the NPM ecosystem.
arXiv Detail & Related papers (2024-06-17T09:33:49Z) - ReGAL: Refactoring Programs to Discover Generalizable Abstractions [59.05769810380928]
Generalizable Abstraction Learning (ReGAL) is a method for learning a library of reusable functions via code refactorization.
We find that the shared function libraries discovered by ReGAL make programs easier to predict across diverse domains.
For CodeLlama-13B, ReGAL results in absolute accuracy increases of 11.5% on LOGO, 26.1% on date understanding, and 8.1% on TextCraft, outperforming GPT-3.5 in two of three domains.
arXiv Detail & Related papers (2024-01-29T18:45:30Z) - DevEval: Evaluating Code Generation in Practical Software Projects [52.16841274646796]
We propose a new benchmark named DevEval, aligned with Developers' experiences in practical projects.
DevEval is collected through a rigorous pipeline, containing 2,690 samples from 119 practical projects.
We assess five popular LLMs on DevEval and reveal their actual abilities in code generation.
arXiv Detail & Related papers (2024-01-12T06:51:30Z) - On the Security Blind Spots of Software Composition Analysis [46.1389163921338]
We present a novel approach to detect vulnerable clones in the Maven repository.
We retrieve over 53k potential vulnerable clones from Maven Central.
We detect 727 confirmed vulnerable clones and synthesize a testable proof-of-vulnerability project for each of those.
arXiv Detail & Related papers (2023-06-08T20:14:46Z) - Hexatagging: Projective Dependency Parsing as Tagging [63.5392760743851]
We introduce a novel dependency parser, the hexatagger, that constructs dependency trees by tagging the words in a sentence with elements from a finite set of possible tags.
Our approach is fully parallelizable at training time, i.e., the structure-building actions needed to build a dependency parse can be predicted in parallel to each other.
We achieve state-of-the-art performance of 96.4 LAS and 97.4 UAS on the Penn Treebank test set.
arXiv Detail & Related papers (2023-06-08T18:02:07Z) - Entity Set Co-Expansion in StackOverflow [49.64523055423687]
Given a few seed entities of a certain type, entity set expansion aims to discover an extensive set of entities that share the same type as the seeds.
We study the entity set co-expansion task in StackOverflow, which extracts Library, OS, Application, and Language entities from StackOverflow question-answer threads.
During the co-expansion process, we use PLMs to derive embeddings of candidate entities for calculating similarities between entities.
arXiv Detail & Related papers (2022-12-05T13:50:35Z) - Please Mind the Root: Decoding Arborescences for Dependency Parsing [67.71280539312536]
We analyze the output of state-of-the-art parsers on many languages from the Universal Dependency Treebank.
The worst constraint-violation rate we observe is 24%.
arXiv Detail & Related papers (2020-10-06T08:31:14Z)