AndroLibZoo: A Reliable Dataset of Libraries Based on Software Dependency Analysis
- URL: http://arxiv.org/abs/2307.12609v3
- Date: Fri, 9 Feb 2024 02:21:42 GMT
- Title: AndroLibZoo: A Reliable Dataset of Libraries Based on Software Dependency Analysis
- Authors: Jordan Samhi, Tegawendé F. Bissyandé, Jacques Klein
- Abstract summary: We propose an automated approach to produce an accurate and up-to-date set of third-party libraries in the form of a dataset called AndroLibZoo.
Our dataset, which we make available to the community, currently contains 34,813 libraries and is meant to evolve.
- Score: 6.342380566583581
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Android app developers extensively reuse code, integrating many
third-party libraries into their apps. While such integration is practical for
developers, it makes it hard for static analyzers to remain scalable and
precise when libraries account for a large part of the code. As a direct
consequence, it is common practice in the literature to analyze only developer
code during static analysis, under the assumption that the sought issues lie in
developer code rather than in libraries. This, however, requires analysts to
distinguish between library and developer code. Currently, many static analyses
rely on whitelists of libraries, but these whitelists are unreliable,
inaccurate, and far from comprehensive.
In this paper, we propose a new approach to address the lack of comprehensive
and automated solutions for producing accurate and "always up-to-date" sets of
libraries. First, we demonstrate the continued need for a whitelist of
libraries. Second, we propose an automated approach to produce an accurate and
up-to-date set of third-party libraries in the form of a dataset called
AndroLibZoo. Our dataset, which we make available to the community, currently
contains 34,813 libraries and is meant to evolve.
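The abstract leaves implicit how a static analyzer would consume such a whitelist. One common way is to match each class name against a set of library package prefixes and keep only the non-matching (developer) classes for analysis. The following Python sketch illustrates that idea under this assumption; the prefix set, the class names, and the helpers `is_library_class` and `developer_classes` are hypothetical examples and are not taken from AndroLibZoo or its tooling.

```python
from typing import Iterable

# Hypothetical sample whitelist entries (package prefixes). A real whitelist,
# such as one derived from AndroLibZoo, would be far larger.
LIBRARY_PREFIXES = {
    "com.google.gson",
    "okhttp3",
    "com.squareup.picasso",
    "androidx.appcompat",
}


def is_library_class(class_name: str, prefixes: set[str]) -> bool:
    """Return True if class_name lies inside any whitelisted package.

    Matching is done on whole package segments, so "okhttp3" matches
    "okhttp3.OkHttpClient" but not "okhttp3x.Client".
    """
    parts = class_name.split(".")
    # Check every package prefix of the dotted name, excluding the class name itself.
    for i in range(1, len(parts)):
        if ".".join(parts[:i]) in prefixes:
            return True
    return False


def developer_classes(class_names: Iterable[str], prefixes: set[str]) -> list[str]:
    """Keep only the classes that the library whitelist does not cover."""
    return [c for c in class_names if not is_library_class(c, prefixes)]


if __name__ == "__main__":
    app_classes = [
        "com.example.myapp.MainActivity",
        "com.google.gson.Gson",
        "okhttp3.OkHttpClient",
        "okhttp3x.Client",
        "androidx.appcompat.app.AppCompatActivity",
    ]
    print(developer_classes(app_classes, LIBRARY_PREFIXES))
```

Matching on whole package segments rather than raw string prefixes avoids false positives such as treating `okhttp3x.Client` as part of `okhttp3`. Running the script prints `['com.example.myapp.MainActivity', 'okhttp3x.Client']`.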
Related papers
- SocialED: A Python Library for Social Event Detection [53.928241775629566]
SocialED is a comprehensive, open-source Python library designed to support social event detection (SED) tasks.
It provides a unified API with detailed documentation, offering researchers and practitioners a complete solution for event detection in social media.
SocialED supports a wide range of preprocessing techniques, such as graph construction and tokenization, and includes standardized interfaces for training models and making predictions.
Date: 2024-12-18T03:37:47Z
- Commit0: Library Generation from Scratch [77.38414688148006]
Commit0 is a benchmark that challenges AI agents to write libraries from scratch.
Agents are provided with a specification document outlining the library's API as well as a suite of interactive unit tests.
Commit0 also offers an interactive environment where models receive static analysis and execution feedback on the code they generate.
Date: 2024-12-02T18:11:30Z
- LibEvolutionEval: A Benchmark and Study for Version-Specific Code Generation [40.87656746406113]
We introduce LibEvolutionEval, a study requiring an understanding of library evolution to perform in-line code completion accurately.
We evaluate popular public models and find that public library evolution significantly influences model performance.
We explore mitigation methods by studying how retrieved version-specific library documentation and prompting can improve the model's capability in handling fast-evolving packages.
Date: 2024-11-19T21:52:23Z
- Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework.
The accompanying Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
Date: 2024-10-02T09:11:10Z
- CompSuite: A Dataset of Java Library Upgrade Incompatibility Issues [25.189328666070107]
We introduce CompSuite, a dataset that includes 123 real-world Java client-library pairs where upgrading the library causes an incompatibility issue.
Each incompatibility issue in CompSuite is associated with a test case authored by the developers, which can be used to reproduce the issue.
Date: 2023-05-15T14:26:14Z
- SequeL: A Continual Learning Library in PyTorch and JAX [50.33956216274694]
SequeL is a library for Continual Learning that supports both PyTorch and JAX frameworks.
It provides a unified interface for a wide range of Continual Learning algorithms, including regularization-based approaches, replay-based approaches, and hybrid approaches.
We release SequeL as an open-source library, enabling researchers and developers to easily experiment with and extend the library for their own purposes.
Date: 2023-04-21T10:00:22Z
- An Empirical Study of Library Usage and Dependency in Deep Learning Frameworks [12.624032509149869]
PyTorch, Caffe, and Scikit-learn are the most frequent combination in 18% and 14% of the projects.
Developers use two or three DL libraries in the same project and tend to use multiple different DL libraries within the same function and the same file.
Date: 2022-11-28T19:31:56Z
- Code Librarian: A Software Package Recommendation System [65.05559087332347]
We present a recommendation engine called Librarian for open source libraries.
A candidate library package is recommended for a given context if: 1) it has been frequently used with the imported libraries in the program; 2) it has similar functionality to the imported libraries in the program; 3) it has similar functionality to the developer's implementation, and 4) it can be used efficiently in the context of the provided code.
Date: 2022-10-11T12:30:05Z
- Repro: An Open-Source Library for Improving the Reproducibility and Usability of Publicly Available Research Code [74.28810048824519]
Repro is an open-source library that aims to improve the reproducibility and usability of research code.
It provides a lightweight Python API for running software released by researchers within Docker containers.
Date: 2022-04-29T01:54:54Z
- Req2Lib: A Semantic Neural Model for Software Library Recommendation [8.713783358744166]
We propose a novel neural approach called Req2Lib, which recommends libraries given a description of the project's requirements.
We use a Sequence-to-Sequence model to learn the library linked-usage information and semantic information of requirement descriptions in natural language.
Our preliminary evaluation demonstrates that Req2Lib can recommend libraries accurately.
Date: 2020-05-24T14:37:07Z