AndroLibZoo: A Reliable Dataset of Libraries Based on Software Dependency Analysis
- URL: http://arxiv.org/abs/2307.12609v3
- Date: Fri, 9 Feb 2024 02:21:42 GMT
- Title: AndroLibZoo: A Reliable Dataset of Libraries Based on Software Dependency Analysis
- Authors: Jordan Samhi, Tegawendé F. Bissyandé, Jacques Klein
- Abstract summary: We propose an automated approach to produce an accurate and up-to-date set of third-party libraries in the form of a dataset called AndroLibZoo.
Our dataset, which we make available to the community, currently contains 34,813 libraries and is meant to evolve.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Android app developers extensively employ code reuse, integrating many
third-party libraries into their apps. While such integration is practical for
developers, it makes it hard for static analyzers to achieve both scalability
and precision when libraries account for a large part of the code. As a direct
consequence, it is common practice in the literature to analyze only developer
code, on the assumption that the sought issues lie in developer code rather
than in libraries. This practice requires analysts to distinguish library code
from developer code. Currently, many static analyses rely on white lists of
libraries; however, these white lists are unreliable, inaccurate, and far from
comprehensive.
In this paper, we propose a new approach to address the lack of comprehensive
and automated solutions for producing accurate and "always up-to-date" sets of
libraries. First, we demonstrate that a white list of libraries is still needed.
Second, we propose an automated approach to produce an accurate and up-to-date
set of third-party libraries in the form of a dataset called AndroLibZoo. Our
dataset, which we make available to the community, currently contains 34,813
libraries and is meant to evolve.
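To make the white-list practice concrete, here is a minimal Python sketch of how a static analysis might separate library code from developer code by matching fully qualified class names against package prefixes. The prefix list and the names `LIBRARY_PREFIXES`, `is_library_class` and `developer_classes` are hypothetical illustrations, not AndroLibZoo's contents or interface.

```python
from typing import Iterable, List

# Hypothetical white list of library package prefixes (illustrative only;
# these entries are not taken from AndroLibZoo).
LIBRARY_PREFIXES = ("com.google.gson", "okhttp3", "androidx", "kotlin")


def is_library_class(class_name: str, prefixes: Iterable[str] = LIBRARY_PREFIXES) -> bool:
    """Return True if the fully qualified class name lies in a white-listed package."""
    for prefix in prefixes:
        # Match the package or one of its sub-packages, not a bare string prefix:
        # "kotlinx.coroutines.Job" must not match the "kotlin" entry.
        if class_name == prefix or class_name.startswith(prefix + "."):
            return True
    return False


def developer_classes(class_names: Iterable[str]) -> List[str]:
    """Keep only the classes that the white list does not attribute to a library."""
    return [name for name in class_names if not is_library_class(name)]


if __name__ == "__main__":
    app_classes = [
        "com.example.app.MainActivity",
        "com.google.gson.Gson",
        "okhttp3.OkHttpClient",
        "kotlinx.coroutines.Job",
    ]
    # The coroutines class is library code, but this white list misses it.
    print(developer_classes(app_classes))
```

The run keeps `com.example.app.MainActivity` but also `kotlinx.coroutines.Job`, because that package is absent from the list. A library missing from the white list is silently analyzed as developer code, which is the kind of incompleteness the abstract describes.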
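The abstract does not detail how AndroLibZoo is built, but the title points to software dependency analysis. One plausible ingredient is mining the third-party dependencies that Android projects declare in their Gradle build files. The Python sketch below illustrates that idea under simplifying assumptions: it handles only string-notation declarations, and the names `DEP_PATTERN`, `extract_dependencies` and `count_libraries` are hypothetical. It is not the authors' pipeline; in particular, mapping a Maven artifact to the Java package prefixes a white list needs is not shown.

```python
import re
from collections import Counter
from typing import List, Tuple

# Matches common string-notation declarations, e.g.
#   implementation 'com.squareup.okhttp3:okhttp:4.12.0'
#   api("com.google.code.gson:gson:2.10.1")
# Map notation, version catalogs, variables and test-only
# configurations (e.g. testImplementation) are deliberately not matched.
DEP_PATTERN = re.compile(
    r"^\s*(?:implementation|api|compile)\s*\(?\s*"   # configuration, optional "("
    r"[\"']([\w.\-]+):([\w.\-]+):[^\"']+[\"']",     # 'group:artifact:version'
    re.MULTILINE,
)


def extract_dependencies(build_gradle: str) -> List[Tuple[str, str]]:
    """Return the (group, artifact) pairs declared in the text of a build.gradle file."""
    return [(m.group(1), m.group(2)) for m in DEP_PATTERN.finditer(build_gradle)]


def count_libraries(build_files: List[str]) -> Counter:
    """Count, for each (group, artifact), how many build files declare it."""
    counts: Counter = Counter()
    for text in build_files:
        # Use a set so a dependency declared twice in one file is counted once.
        counts.update(set(extract_dependencies(text)))
    return counts


if __name__ == "__main__":
    app_a = """
dependencies {
    implementation 'com.squareup.okhttp3:okhttp:4.12.0'
    implementation "com.google.code.gson:gson:2.10.1"
    testImplementation 'junit:junit:4.13.2'
}
"""
    app_b = """
dependencies {
    api("com.squareup.okhttp3:okhttp:4.9.3")
}
"""
    print(count_libraries([app_a, app_b]).most_common())
```

Counting each artifact at most once per project yields a popularity ranking across apps (here okhttp appears in both projects, gson in one). A realistic pipeline would also have to resolve version catalogs, variables and transitive dependencies.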
Related papers
- Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion?
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework.
Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
Date: 2024-10-02T09:11:10Z
- Lightweight Syntactic API Usage Analysis with UCov
We present a novel conceptual framework designed to assist library maintainers in understanding the interactions allowed by their APIs.
The framework's customizable usage models enable library maintainers to improve their API design ahead of release, reducing friction during evolution.
We implement these models for Java libraries in a new tool UCov and demonstrate its capabilities on three libraries exhibiting diverse styles of interaction.
Date: 2024-02-19T10:33:41Z
- LILO: Learning Interpretable Libraries by Compressing and Documenting Code
We introduce LILO, a neurosymbolic framework that iteratively synthesizes, compresses, and documents code.
LILO combines LLM-guided program synthesis with recent algorithmic advances in automated refactoring from Stitch.
We find that AutoDoc, LILO's auto-documentation procedure, boosts performance by helping LILO's synthesizer to interpret and deploy learned abstractions.
Date: 2023-10-30T17:55:02Z
- Private-Library-Oriented Code Generation with Large Language Models
This paper focuses on utilizing large language models (LLMs) for code generation in private libraries.
We propose a novel framework that emulates the process of programmers writing private code.
We create four private library benchmarks: TorchDataEval, TorchDataComplexEval, MonkeyEval, and BeatNumEval.
Date: 2023-07-28T07:43:13Z
- CompSuite: A Dataset of Java Library Upgrade Incompatibility Issues
We introduce CompSuite, a dataset that includes 123 real-world Java client-library pairs where upgrading the library causes an incompatibility issue.
Each incompatibility issue in CompSuite is associated with a test case authored by the developers, which can be used to reproduce the issue.
Date: 2023-05-15T14:26:14Z
- SequeL: A Continual Learning Library in PyTorch and JAX
SequeL is a library for Continual Learning that supports both PyTorch and JAX frameworks.
It provides a unified interface for a wide range of Continual Learning algorithms, including regularization-based approaches, replay-based approaches, and hybrid approaches.
We release SequeL as an open-source library, enabling researchers and developers to easily experiment with and extend the library for their own purposes.
Date: 2023-04-21T10:00:22Z
- An Empirical Study of Library Usage and Dependency in Deep Learning Frameworks
PyTorch, Caffe, and Scikit-learn are the most frequent combination in 18% and 14% of the projects.
Developers use two or three DL libraries in the same project and tend to use multiple different DL libraries within both the same function and the same file.
Date: 2022-11-28T19:31:56Z
- Code Librarian: A Software Package Recommendation System
We present a recommendation engine called Librarian for open source libraries.
A candidate library package is recommended for a given context if: 1) it has been frequently used with the imported libraries in the program; 2) it has similar functionality to the imported libraries in the program; 3) it has similar functionality to the developer's implementation, and 4) it can be used efficiently in the context of the provided code.
Date: 2022-10-11T12:30:05Z
- Repro: An Open-Source Library for Improving the Reproducibility and Usability of Publicly Available Research Code
Repro is an open-source library which aims at improving the usability of research code.
It provides a lightweight Python API for running software released by researchers within Docker containers.
Date: 2022-04-29T01:54:54Z
- Req2Lib: A Semantic Neural Model for Software Library Recommendation
We propose a novel neural approach called Req2Lib, which recommends libraries given a description of a project's requirements.
We use a Sequence-to-Sequence model to learn the library linked-usage information and semantic information of requirement descriptions in natural language.
Our preliminary evaluation demonstrates that Req2Lib can recommend libraries accurately.
Date: 2020-05-24T14:37:07Z