Local Software Buildability across Java Versions (Registered Report)
- URL: http://arxiv.org/abs/2408.11544v1
- Date: Wed, 21 Aug 2024 11:51:00 GMT
- Authors: Matúš Sulír, Jaroslav Porubän, Sergej Chodarev
- Abstract summary: We will try to automatically build every project in containers with Java versions 6 to 23 installed.
Success or failure will be determined by exit codes, and standard output and error streams will be saved.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:
  Context: Downloading the source code of open-source Java projects and building them on a local computer using Maven, Gradle, or Ant is a common activity performed by researchers and practitioners. Multiple studies have found that about 40-60% of such attempts fail. Our experience from recent years suggests that the proportion of failed builds keeps rising.
  Objective: First, we would like to empirically confirm our hypothesis that with increasing Java versions, the percentage of build-failing projects tends to grow. Next, nine supplementary research questions are proposed, related mainly to the proportions of failing projects, universal version compatibility, failures under specific JDK versions, success rates of build tools, wrappers, and failure reasons.
  Method: We will sample 2,500 random pure-Java projects from GitHub that have a build configuration file and fulfill basic quality criteria. We will try to automatically build every project in containers with Java versions 6 to 23 installed. Success or failure will be determined by exit codes, and standard output and error streams will be saved. A majority of the analysis will be performed automatically using reproducible scripts.
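Because the whole protocol is scripted, the gist of a single build attempt can be sketched in a few lines. The following Python sketch mirrors the described method: run the project's build tool inside a container with a chosen JDK, take the container's exit code as the success signal, and save both output streams. The image tags, the build-file-to-command mapping, and the log paths are illustrative assumptions, not the authors' actual scripts.

```python
import pathlib
import subprocess

# Assumed image tags; the study covers JDK 6-23, and older versions
# would need different images than the ones shown here.
JDK_IMAGES = {8: "eclipse-temurin:8", 17: "eclipse-temurin:17", 21: "eclipse-temurin:21"}

# Hypothetical mapping from build configuration file to build command;
# the images are assumed to have the respective build tool preinstalled.
BUILD_COMMANDS = {
    "pom.xml": "mvn -B package",     # Maven
    "build.gradle": "gradle build",  # Gradle
    "build.xml": "ant",              # Ant
}

def build_in_container(project_dir: pathlib.Path, jdk: int) -> bool:
    """Attempt one build; success or failure is the container's exit code."""
    build_file = next(f for f in BUILD_COMMANDS if (project_dir / f).exists())
    result = subprocess.run(
        ["docker", "run", "--rm",
         "-v", f"{project_dir.resolve()}:/project", "-w", "/project",
         JDK_IMAGES[jdk], "sh", "-c", BUILD_COMMANDS[build_file]],
        capture_output=True, text=True,
    )
    logs = project_dir / "logs"
    logs.mkdir(exist_ok=True)
    (logs / f"jdk{jdk}.out").write_text(result.stdout)  # save standard output
    (logs / f"jdk{jdk}.err").write_text(result.stderr)  # save standard error
    return result.returncode == 0
```

Iterating such a function over all sampled projects and all JDK versions would yield the per-version success/failure matrix on which the research questions are based.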
Related papers
- Canonicalization for Unreproducible Builds in Java (arXiv 2025-04-30)
We introduce a conceptual framework for reproducible builds, analyze a large dataset from Reproducible Central, and develop a novel taxonomy of six root causes of unreproducibility.
We present Chains-Rebuild, a tool that raises the rebuild success rate from 9.48% to 26.89% on 12,283 unreproducible artifacts.
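To make the notion of an unreproducible artifact concrete, the check underlying this line of work boils down to comparing digests of two independent rebuilds of the same source. A minimal sketch of that comparison, not Chains-Rebuild itself:

```python
import hashlib
import pathlib

def sha256(path: pathlib.Path) -> str:
    """Digest of a build artifact, e.g. a JAR file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def is_reproducible(rebuild_a: pathlib.Path, rebuild_b: pathlib.Path) -> bool:
    """True if two independent rebuilds produced bit-identical artifacts."""
    return sha256(rebuild_a) == sha256(rebuild_b)
```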
- EnvBench: A Benchmark for Automated Environment Setup (arXiv 2025-03-18)
Large Language Models have enabled researchers to focus on practical repository-level tasks in the software engineering domain.
Existing studies on environment setup introduce innovative agentic strategies, but their evaluation is often based on small datasets.
To address this gap, we introduce EnvBench, a comprehensive environment setup benchmark.
- CLOVER: A Test Case Generation Benchmark with Coverage, Long-Context, and Verification (arXiv 2025-02-12)
This paper presents a benchmark, CLOVER, to evaluate models' capabilities in generating and completing test cases.
The benchmark is containerized for code execution across tasks, and we will release the code, data, and construction methodologies.
- ProjectTest: A Project-level LLM Unit Test Generation Benchmark and Impact of Error Fixing Mechanisms (arXiv 2025-02-10)
Unit test generation has become a promising and important use case of LLMs.
ProjectTest is a project-level benchmark for unit test generation covering Python, Java, and JavaScript.
- Long Code Arena: a Set of Benchmarks for Long-Context Code Models (arXiv 2024-06-17)
Long Code Arena is a suite of six benchmarks for code processing tasks that require project-wide context.
These tasks cover different aspects of code processing: library-based code generation, CI builds repair, project-level code completion, commit message generation, bug localization, and module summarization.
For each task, we provide a manually verified dataset for testing, an evaluation suite, and open-source baseline solutions.
- Detecting Build Dependency Errors in Incremental Builds (arXiv 2024-04-20)
We propose EChecker to detect build dependency errors in the context of incremental builds.
EChecker automatically updates the actual build dependencies by inferring them from C/C++ pre-processor directives and Makefile changes in new commits.
EChecker improves the efficiency of build dependency error detection by an average factor of 85.14.
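As a rough illustration of one signal such tools rely on (not EChecker's actual implementation), the direct file dependencies of a changed C/C++ translation unit can be approximated by scanning its #include directives:

```python
import pathlib
import re

# Matches both #include <header.h> and #include "header.h".
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)

def included_headers(source_file: pathlib.Path) -> set[str]:
    """Headers a translation unit directly depends on."""
    return set(INCLUDE_RE.findall(source_file.read_text(errors="ignore")))
```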
- JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models (arXiv 2024-03-28)
Jailbreak attacks cause large language models (LLMs) to generate harmful, unethical, or otherwise objectionable content.
Evaluating these attacks presents a number of challenges that the current collection of benchmarks and evaluation techniques does not adequately address.
JailbreakBench is an open-sourced benchmark designed to address these challenges.
- Java JIT Testing with Template Extraction (arXiv 2024-03-17)
LeJit is a template-based framework for testing Java just-in-time (JIT) compilers.
We have successfully used LeJit to test a range of popular Java JIT compilers.
- Observation-based unit test generation at Meta (arXiv 2024-02-09)
TestGen automatically generates unit tests carved from serialized observations of complex objects captured during app execution.
TestGen has landed 518 tests into production, which have been executed 9,617,349 times in continuous integration, finding 5,702 faults.
Our evaluation reveals that, when carving its observations from 4,361 reliable end-to-end tests, TestGen was able to generate tests for at least 86% of the classes covered by end-to-end tests.
- DevEval: Evaluating Code Generation in Practical Software Projects (arXiv 2024-01-12)
We propose a new benchmark named DevEval, aligned with developers' experiences in practical projects.
DevEval is collected through a rigorous pipeline, containing 2,690 samples from 119 practical projects.
We assess five popular LLMs on DevEval and reveal their actual abilities in code generation.
- On the Security Blind Spots of Software Composition Analysis (arXiv 2023-06-08)
We present a novel approach to detect vulnerable clones in the Maven repository.
We retrieve over 53k potentially vulnerable clones from Maven Central.
We detect 727 confirmed vulnerable clones and synthesize a testable proof-of-vulnerability project for each of those.
- A Language Model of Java Methods with Train/Test Deduplication (arXiv 2023-05-15)
This tool demonstration presents a research toolkit for a language model of Java source code.
The target audience includes researchers studying problems at the granularity level of subroutines, statements, or variables in Java.
- Automatic Specialization of Third-Party Java Dependencies (arXiv 2023-02-16)
Large-scale code reuse significantly reduces both development costs and time.
The massive share of third-party code in software projects poses new challenges, especially in terms of maintenance and security.
We propose a novel technique to specialize dependencies of Java projects, based on their actual usage.
- D2A: A Dataset Built for AI-Based Vulnerability Detection Methods Using Differential Analysis (arXiv 2021-02-16)
We propose D2A, a differential analysis based approach to label issues reported by static analysis tools.
We use D2A to generate a large labeled dataset to train models for vulnerability identification.