Related papers: AndroLibZoo: A Reliable Dataset of Libraries Based on Software Dependency Analysis

AndroLibZoo: A Reliable Dataset of Libraries Based on Software Dependency Analysis

URL: http://arxiv.org/abs/2307.12609v3
Date: Fri, 9 Feb 2024 02:21:42 GMT
Title: AndroLibZoo: A Reliable Dataset of Libraries Based on Software Dependency Analysis
Authors: Jordan Samhi, Tegawend\'e F. Bissyand\'e, Jacques Klein
Abstract summary: We propose an automated approach to produce an accurate and up-to-date set of third-party libraries in the form of a dataset called AndroLibZoo. Our dataset, which we make available to the community, contains to date 34 813 libraries and is meant to evolve.
Score: 6.342380566583581
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Android app developers extensively employ code reuse, integrating many third-party libraries into their apps. While such integration is practical for developers, it can be challenging for static analyzers to achieve scalability and precision when libraries account for a large part of the code. As a direct consequence, it is common practice in the literature to consider developer code only during static analysis --with the assumption that the sought issues are in developer code rather than the libraries. However, analysts need to distinguish between library and developer code. Currently, many static analyses rely on white lists of libraries. However, these white lists are unreliable, inaccurate, and largely non-comprehensive. In this paper, we propose a new approach to address the lack of comprehensive and automated solutions for the production of accurate and ``always up to date" sets of libraries. First, we demonstrate the continued need for a white list of libraries. Second, we propose an automated approach to produce an accurate and up-to-date set of third-party libraries in the form of a dataset called AndroLibZoo. Our dataset, which we make available to the community, contains to date 34 813 libraries and is meant to evolve.

Related papers

How Robust are LLM-Generated Library Imports? An Empirical Study using Stack Overflow [3.076436880934678]
We conduct an empirical study of six Large Language Models (LLMs)<n>We analyze the types of libraries they import, the characteristics of those libraries, and the extent to which the recommendations are usable out of the box.<n>Our results show that LLMs predominantly favour third-party libraries over standard ones, and often recommend mature, popular, and permissively licensed dependencies.
arXiv Detail & Related papers (2025-07-14T21:35:29Z)
SocialED: A Python Library for Social Event Detection [53.928241775629566]
SocialED is a comprehensive, open-source Python library designed to support social event detection (SED) tasks. It provides a unified API with detailed documentation, offering researchers and practitioners a complete solution for event detection in social media. SocialED supports a wide range of preprocessing techniques, such as graph construction and tokenization, and includes standardized interfaces for training models and making predictions.
arXiv Detail & Related papers (2024-12-18T03:37:47Z)
Commit0: Library Generation from Scratch [77.38414688148006]
Commit0 is a benchmark that challenges AI agents to write libraries from scratch. Agents are provided with a specification document outlining the library's API as well as a suite of interactive unit tests. Commit0 also offers an interactive environment where models receive static analysis and execution feedback on the code they generate.
arXiv Detail & Related papers (2024-12-02T18:11:30Z)
LibEvolutionEval: A Benchmark and Study for Version-Specific Code Generation [40.87656746406113]
We introduce LibEvolutionEval, a study requiring an understanding of library evolution to perform in-line code completion accurately. We evaluate popular public models and find that public library evolution significantly influences model performance. We explore mitigation methods by studying how retrieved version-specific library documentation and prompting can improve the model's capability in handling fast-evolving packages.
arXiv Detail & Related papers (2024-11-19T21:52:23Z)
Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework. Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
arXiv Detail & Related papers (2024-10-02T09:11:10Z)
Lightweight Syntactic API Usage Analysis with UCov [0.0]
We present a novel conceptual framework designed to assist library maintainers in understanding the interactions allowed by their APIs. These customizable models enable library maintainers to improve their design ahead of release, reducing friction during evolution. We implement these models for Java libraries in a new tool UCov and demonstrate its capabilities on three libraries exhibiting diverse styles of interaction.
arXiv Detail & Related papers (2024-02-19T10:33:41Z)
LILO: Learning Interpretable Libraries by Compressing and Documenting Code [71.55208585024198]
We introduce LILO, a neurosymbolic framework that iteratively synthesizes, compresses, and documents code. LILO combines LLM-guided program synthesis with recent algorithmic advances in automated from Stitch. We find that AutoDoc boosts performance by helping LILO's synthesizer to interpret and deploy learned abstractions.
arXiv Detail & Related papers (2023-10-30T17:55:02Z)
Private-Library-Oriented Code Generation with Large Language Models [52.73999698194344]
This paper focuses on utilizing large language models (LLMs) for code generation in private libraries. We propose a novel framework that emulates the process of programmers writing private code. We create four private library benchmarks, including TorchDataEval, TorchDataComplexEval, MonkeyEval, and BeatNumEval.
arXiv Detail & Related papers (2023-07-28T07:43:13Z)
CompSuite: A Dataset of Java Library Upgrade Incompatibility Issues [25.189328666070107]
We introduce CompSuite, a dataset that includes 123 real-world Java client-library pairs where upgrading the library causes an incompatibility issue. Each incompatibility issue in CompSuite is associated with a test case authored by the developers, which can be used to reproduce the issue.
arXiv Detail & Related papers (2023-05-15T14:26:14Z)
SequeL: A Continual Learning Library in PyTorch and JAX [50.33956216274694]
SequeL is a library for Continual Learning that supports both PyTorch and JAX frameworks. It provides a unified interface for a wide range of Continual Learning algorithms, including regularization-based approaches, replay-based approaches, and hybrid approaches. We release SequeL as an open-source library, enabling researchers and developers to easily experiment and extend the library for their own purposes.
arXiv Detail & Related papers (2023-04-21T10:00:22Z)
An Empirical Study of Library Usage and Dependency in Deep Learning Frameworks [12.624032509149869]
pytorch, Caffe, and Scikit-learn are the most frequent combination in 18% and 14% of the projects. The developer uses two or three dl libraries in the same projects and tends to use different multiple dl libraries in both the same function and the same files.
arXiv Detail & Related papers (2022-11-28T19:31:56Z)
Code Librarian: A Software Package Recommendation System [65.05559087332347]
We present a recommendation engine called Librarian for open source libraries. A candidate library package is recommended for a given context if: 1) it has been frequently used with the imported libraries in the program; 2) it has similar functionality to the imported libraries in the program; 3) it has similar functionality to the developer's implementation, and 4) it can be used efficiently in the context of the provided code.
arXiv Detail & Related papers (2022-10-11T12:30:05Z)
Repro: An Open-Source Library for Improving the Reproducibility and Usability of Publicly Available Research Code [74.28810048824519]
Repro is an open-source library which aims at improving the usability of research code. It provides a lightweight Python API for running software released by researchers within Docker containers.
arXiv Detail & Related papers (2022-04-29T01:54:54Z)
Req2Lib: A Semantic Neural Model for Software Library Recommendation [8.713783358744166]
We propose a novel neural approach called Req2Lib which recommends libraries given descriptions of the project requirement. We use a Sequence-to-Sequence model to learn the library linked-usage information and semantic information of requirement descriptions in natural language. Our preliminary evaluation demonstrates that Req2Lib can recommend libraries accurately.
arXiv Detail & Related papers (2020-05-24T14:37:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.