Diagnosing and Resolving Android Applications Building Issues: An Empirical Study
- URL: http://arxiv.org/abs/2511.06186v1
- Date: Sun, 09 Nov 2025 02:01:14 GMT
- Authors: Lakshmi Priya Bodepudi, Yutong Zhao, Ming Quan Fu, Yuanyuan Wu, Sen He, Yu Zhao
- Abstract summary: This study conducts an empirical analysis of 200 open-source Android projects written in Java and Kotlin to diagnose and resolve build failures. We identified four primary types of build errors: environment issues, dependency and Gradle task errors, configuration problems, and syntax/API incompatibilities.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Building Android applications reliably remains a persistent challenge due to complex dependencies, diverse configurations, and the rapid evolution of the Android ecosystem. This study conducts an empirical analysis of 200 open-source Android projects written in Java and Kotlin to diagnose and resolve build failures. Through a five-phase process encompassing data collection, build execution, failure classification, repair strategy design, and LLM-assisted evaluation, we identified four primary types of build errors: environment issues, dependency and Gradle task errors, configuration problems, and syntax/API incompatibilities. Among the 135 projects that initially failed to build, our diagnostic and repair strategy enabled developers to resolve 102 cases (75.56%), significantly reducing troubleshooting effort. We further examined the potential of Large Language Models, such as GPT-5, to assist in error diagnosis, achieving a 53.3% success rate in suggesting viable fixes. An analysis of project attributes revealed that build success is influenced by programming language, project age, and app size. These findings provide practical insights into improving Android build reliability and advancing AI-assisted software maintenance.
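As a rough illustration of the four failure categories named in the abstract, the following Python sketch sorts build-log messages by keyword and recomputes the reported resolution rate. The patterns, function names, and category labels are assumptions made for illustration and are not the study's actual classification procedure; the only figures taken from the abstract are the 102 resolved cases out of 135 failed builds.

```python
import re

# Category names follow the abstract; the keyword patterns are illustrative
# assumptions, not the rules used in the study.
CATEGORY_PATTERNS = {
    "environment": [
        r"SDK location not found",
        r"Unsupported class file major version",
        r"JAVA_HOME is not set",
    ],
    "dependency_or_gradle_task": [
        r"Could not resolve",
        r"Failed to resolve",
        r"Task '.+' not found",
    ],
    "configuration": [
        r"Manifest merger failed",
        r"Namespace not specified",
    ],
    "syntax_or_api": [
        r"Unresolved reference",
        r"cannot find symbol",
    ],
}


def classify(message: str) -> str:
    """Return the first category whose pattern matches the log message."""
    for category, patterns in CATEGORY_PATTERNS.items():
        if any(re.search(p, message, re.IGNORECASE) for p in patterns):
            return category
    return "unclassified"


def resolution_rate(resolved: int, failed: int) -> float:
    """Percentage of initially failing builds that were repaired."""
    return round(100 * resolved / failed, 2)


if __name__ == "__main__":
    print(classify("SDK location not found. Define a valid SDK location."))
    print(classify("e: Unresolved reference: viewBinding"))
    print(resolution_rate(102, 135))
```

A first-match keyword table like this is easy to audit but will mislabel logs that contain several error kinds at once; a real pipeline would likely need manual review of ambiguous cases.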
Related papers
- PhantomRun: Auto Repair of Compilation Errors in Embedded Open Source Software [2.64399132991614]
We study four major open-source embedded system projects, spanning over 4000 build failures from the project's CI runs. We find that hardware dependencies account for the majority of compilation failures, followed by syntax errors and build-script issues. We present PhantomRun, an automated framework that leverages large language models (LLMs) to generate and validate fixes for CI compilation failures.
arXiv Detail & Related papers (2026-02-23T19:13:22Z)
- Automating Android Build Repair: Bridging the Reasoning-Execution Gap in LLM Agents with Domain-Specific Tools [11.19523991999335]
We introduce AndroidBuildBench, a benchmark of 1,019 build failures curated from the commit histories of 43 open-source Android projects. Each problem is paired with a verified solution from a subsequent commit, ensuring that fixes are feasible. We demonstrate the effectiveness of a strategy we term Tool Bridging, which replaces general-purpose shell commands with domain-aware abstractions.
arXiv Detail & Related papers (2025-10-09T01:33:25Z)
- Lessons from a Big-Bang Integration: Challenges in Edge Computing and Machine Learning [52.86213078016168]
The project faced critical setbacks due to a big-bang integration approach. The study identifies technical and organisational barriers, including poor communication. It also considers psychological factors such as a bias toward fully developed components over mockups.
arXiv Detail & Related papers (2025-07-23T07:16:45Z)
- Hierarchical Knowledge Injection for Improving LLM-based Program Repair [5.81561797043823]
In real-world projects, developers often rely on broader repository and project-level context beyond the local code to resolve such bugs. We propose a layered knowledge injection framework that incrementally augments LLMs with structured context. We evaluate this framework on a dataset of 314 bugs from BugsInPy, and analyze fix rates across six bug types.
arXiv Detail & Related papers (2025-06-30T16:19:38Z)
- CXXCrafter: An LLM-Based Agent for Automated C/C++ Open Source Software Building [14.687126587793028]
Building C/C++ projects often proves to be difficult in practice, hindering the progress of downstream applications. We develop an automated build system called CXXCrafter to address the challenges, such as dependency resolution. Our evaluation on open-source software demonstrates that CXXCrafter achieves a success rate of 78% in project building.
arXiv Detail & Related papers (2025-05-27T11:54:56Z)
- CLOVER: A Test Case Generation Benchmark with Coverage, Long-Context, and Verification [71.34070740261072]
This paper presents a benchmark, CLOVER, to evaluate models' capabilities in generating and completing test cases. The benchmark is containerized for code execution across tasks, and we will release the code, data, and construction methodologies.
arXiv Detail & Related papers (2025-02-12T21:42:56Z)
- ProjectTest: A Project-level LLM Unit Test Generation Benchmark and Impact of Error Fixing Mechanisms [48.43237545197775]
Unit test generation has become a promising and important use case of LLMs. ProjectTest is a project-level benchmark for unit test generation covering Python, Java, and JavaScript.
arXiv Detail & Related papers (2025-02-10T15:24:30Z)
- SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories [55.161075901665946]
SUPER aims to capture the realistic challenges faced by researchers working with Machine Learning (ML) and Natural Language Processing (NLP) research repositories.
Our benchmark comprises three distinct problem sets: 45 end-to-end problems with annotated expert solutions, 152 sub problems derived from the expert set that focus on specific challenges, and 602 automatically generated problems for larger-scale development.
We show that state-of-the-art approaches struggle to solve these problems with the best model (GPT-4o) solving only 16.3% of the end-to-end set, and 46.1% of the scenarios.
arXiv Detail & Related papers (2024-09-11T17:37:48Z)
- Leveraging Large Language Models for Efficient Failure Analysis in Game Development [47.618236610219554]
This paper proposes a new approach to automatically identify which change in the code caused a test to fail.
The method leverages Large Language Models (LLMs) to associate error messages with the corresponding code changes causing the failure.
Our approach reaches an accuracy of 71% in our newly created dataset, which comprises issues reported by developers at EA over a period of one year.
arXiv Detail & Related papers (2024-06-11T09:21:50Z)
- LLM-CompDroid: Repairing Configuration Compatibility Bugs in Android Apps with Pre-trained Large Language Models [34.23051590289707]
We introduce the LLM-CompDroid framework, which combines the strengths of LLMs and traditional tools for bug resolution.
Our experimental results demonstrate a significant enhancement in bug resolution performance by LLM-CompDroid.
This innovative approach holds promise for advancing the reliability and robustness of Android applications.
arXiv Detail & Related papers (2024-02-23T03:51:16Z)
- Towards a Fair Comparison and Realistic Design and Evaluation Framework of Android Malware Detectors [63.75363908696257]
We analyze 10 influential research works on Android malware detection using a common evaluation framework.
We identify five factors that, if not taken into account when creating datasets and designing detectors, significantly affect the trained ML models.
We conclude that the studied ML-based detectors have been evaluated optimistically, which justifies the good published results.
arXiv Detail & Related papers (2022-05-25T08:28:08Z)
- Competition-Level Code Generation with AlphaCode [74.87216298566942]
We introduce AlphaCode, a system for code generation that can create novel solutions to problems that require deeper reasoning.
In simulated evaluations on recent programming competitions on the Codeforces platform, AlphaCode achieved on average a ranking of top 54.3%.
arXiv Detail & Related papers (2022-02-08T23:16:31Z)