PCREQ: Automated Inference of Compatible Requirements for Python Third-party Library Upgrades
- URL: http://arxiv.org/abs/2508.02023v1
- Date: Mon, 04 Aug 2025 03:34:30 GMT
- Title: PCREQ: Automated Inference of Compatible Requirements for Python Third-party Library Upgrades
- Authors: Huashan Lei, Guanping Xiao, Yepang Liu, Zheng Zheng
- Abstract summary: Python third-party libraries (TPLs) are essential in modern software development, but upgrades often cause compatibility issues, leading to system failures. Existing tools mainly detect dependency conflicts but overlook code-level incompatibilities. We propose PCREQ, the first approach to automatically infer compatible requirements by combining version and code compatibility analysis.
- Score: 5.857193811761703
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Python third-party libraries (TPLs) are essential in modern software development, but upgrades often cause compatibility issues, leading to system failures. These issues fall into two categories: version compatibility issues (VCIs) and code compatibility issues (CCIs). Existing tools mainly detect dependency conflicts but overlook code-level incompatibilities, with no solution fully automating the inference of compatible versions for both VCIs and CCIs. To fill this gap, we propose PCREQ, the first approach to automatically infer compatible requirements by combining version and code compatibility analysis. PCREQ integrates six modules: knowledge acquisition, version compatibility assessment, invoked APIs and modules extraction, code compatibility assessment, version change, and missing TPL completion. PCREQ collects candidate versions, checks for conflicts, identifies API usage, evaluates code compatibility, and iteratively adjusts versions to generate a compatible requirements.txt with a detailed repair report. To evaluate PCREQ, we construct REQBench, a large-scale benchmark with 2,095 upgrade test cases (including 406 unsolvable by pip). Results show PCREQ achieves a 94.03% inference success rate, outperforming PyEGo (37.02%), ReadPyE (37.16%), and LLM-based approaches (GPT-4o, DeepSeek V3/R1) by 18-20%. PCREQ processes each case from REQBench in 60.79s on average, demonstrating practical efficiency. PCREQ significantly reduces manual effort in troubleshooting upgrades, advancing Python dependency maintenance automation.
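For intuition, here is a minimal sketch of the version-compatibility step the abstract describes, using the `packaging` library. The function name `find_compatible` and the candidate/constraint data are illustrative assumptions of this sketch, not PCREQ's actual API:

```python
# Minimal sketch (not PCREQ's implementation) of the version-compatibility
# step: collect candidate versions, drop those that violate the gathered
# constraints, and prefer the newest survivor for requirements.txt.
from packaging.specifiers import SpecifierSet
from packaging.version import Version


def find_compatible(candidates, constraints):
    """Return the newest candidate satisfying every version constraint."""
    spec = SpecifierSet(",".join(constraints))
    ok = [v for v in candidates if Version(v) in spec]
    return max(ok, key=Version, default=None)


# Hypothetical upgrade scenario: numpy candidates vs. constraints
# collected from the project's dependency tree.
best = find_compatible(["1.21.6", "1.24.4", "1.26.4", "2.0.1"],
                       [">=1.22", "<2.0"])
print(f"numpy=={best}" if best else "no compatible version found")
# -> numpy==1.26.4
```

PCREQ additionally iterates this kind of check together with code-level API analysis; the sketch covers only the version-constraint filtering.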
Related papers
- SwingArena: Competitive Programming Arena for Long-context GitHub Issue Solving [90.32201622392137]
We present SwingArena, a competitive evaluation framework for Large Language Models (LLMs). Unlike traditional static benchmarks, SwingArena models the collaborative process of software development by pairing LLMs as submitters, who generate patches, and reviewers, who create test cases and verify the patches through continuous integration (CI) pipelines.
arXiv Detail & Related papers (2025-05-29T18:28:02Z) - SOPBench: Evaluating Language Agents at Following Standard Operating Procedures and Constraints [59.645885492637845]
SOPBench is an evaluation pipeline that transforms each service-specific SOP code program into a directed graph of executable functions and requires agents to call these functions based on natural language SOP descriptions. We evaluate 18 leading models, and results show the task is challenging even for top-tier models.
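To make the "directed graph of executable functions" idea concrete, here is a toy sketch (the step names and graph are this sketch's assumptions, not SOPBench's code), where an edge means a step is unlocked only after its prerequisite runs:

```python
# Toy illustration (not SOPBench's code) of an SOP as a directed graph
# of executable functions: an edge u -> v means step v is unlocked only
# after step u has executed.
def verify_identity():   print("identity verified")
def check_eligibility(): print("eligibility checked")
def issue_refund():      print("refund issued")

# Adjacency list: each step points to the steps it unlocks.
edges = {
    verify_identity: [check_eligibility],
    check_eligibility: [issue_refund],
}

def run_sop(start):
    """Walk the graph depth-first, executing each step exactly once."""
    done = set()
    def visit(step):
        if step in done:
            return
        step()            # execute the SOP step
        done.add(step)
        for nxt in edges.get(step, []):
            visit(nxt)
    visit(start)

run_sop(verify_identity)  # identity -> eligibility -> refund
```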
arXiv Detail & Related papers (2025-03-11T17:53:02Z) - CLOVER: A Test Case Generation Benchmark with Coverage, Long-Context, and Verification [71.34070740261072]
This paper presents a benchmark, CLOVER, to evaluate models' capabilities in generating and completing test cases. The benchmark is containerized for code execution across tasks, and we will release the code, data, and construction methodologies.
arXiv Detail & Related papers (2025-02-12T21:42:56Z) - Raiders of the Lost Dependency: Fixing Dependency Conflicts in Python using LLMs [10.559292676550319]
Python developers must manually identify and resolve environment dependencies and version constraints of third-party modules and Python interpreters. Traditional approaches face limitations due to the variety of dependency error types, large sets of possible module versions, and conflicts among dependencies. This study explores the potential of using large language models (LLMs) to automatically fix dependency issues in Python programs.
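As a rough sketch of the repair loop such a study implies (the `ask_llm` helper is a placeholder for any chat-completion API, and the prompt and loop structure are assumptions of this sketch, not the paper's code):

```python
# Toy sketch of an LLM-driven dependency-repair loop in the spirit of
# this study. `ask_llm` is a placeholder, not a real API.
import subprocess
import sys


def ask_llm(prompt: str) -> str:
    """Placeholder: wire this to a real chat-completion endpoint."""
    return "pip install requests==2.31.0"  # canned suggestion for the sketch


def repair_environment(script: str, max_rounds: int = 3) -> bool:
    """Run `script`; on failure, apply one suggested pip command and retry."""
    for _ in range(max_rounds):
        run = subprocess.run([sys.executable, script],
                             capture_output=True, text=True)
        if run.returncode == 0:
            return True  # script ran cleanly: environment repaired
        fix = ask_llm("Suggest one pip command to fix this error:\n"
                      + run.stderr)
        subprocess.run(fix.split(), check=False)  # apply the suggestion
    return False
```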
arXiv Detail & Related papers (2025-01-27T16:45:34Z) - PCART: Automated Repair of Python API Parameter Compatibility Issues [11.36053416670063]
Python third-party libraries play a critical role, especially in fields like deep learning and scientific computing. API parameters in these libraries often change during evolution, leading to compatibility issues for client applications reliant on specific versions. Currently, no tool can automatically detect and repair Python API parameter compatibility issues. PCART is the first solution to fully automate the process of API extraction, code instrumentation, API mapping establishment, compatibility assessment, repair, and validation.
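A minimal sketch of the kind of parameter-compatibility check PCART automates (an illustration built on `inspect`, not PCART's actual implementation):

```python
# Sketch (not PCART's code) of the core check it automates: compare a
# call site's keyword arguments against the installed API's signature
# to flag parameters that were renamed or removed.
import inspect
import math


def incompatible_kwargs(func, used_kwargs):
    """Return keyword arguments the current version of `func` rejects."""
    params = inspect.signature(func).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []  # **kwargs accepts anything, nothing to flag
    return [k for k in used_kwargs if k not in params]


# A call site written as math.isclose(a, b, tolerance=1e-6) is flagged:
# the accepted parameter is actually `rel_tol`.
print(incompatible_kwargs(math.isclose, ["rel_tol", "tolerance"]))
# -> ['tolerance']
```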
arXiv Detail & Related papers (2024-06-06T08:15:12Z) - Less is More? An Empirical Study on Configuration Issues in Python PyPI Ecosystem [38.44692482370243]
Python is widely used in the open-source community, largely owing to the extensive support from diverse third-party libraries.
Third-party libraries can potentially lead to conflicts in dependencies, prompting researchers to develop dependency conflict detectors.
Endeavors have also been made to automatically infer dependencies.
arXiv Detail & Related papers (2023-10-19T09:07:51Z) - SWE-bench: Can Language Models Resolve Real-World GitHub Issues? [80.52201658231895]
SWE-bench is an evaluation framework consisting of $2,294$ software engineering problems drawn from real GitHub issues and corresponding pull requests across $12$ popular Python repositories.
We show that both state-of-the-art proprietary models and our fine-tuned model SWE-Llama can resolve only the simplest issues.
arXiv Detail & Related papers (2023-10-10T16:47:29Z) - Knowledge-Based Version Incompatibility Detection for Deep Learning [32.116361254082086]
We propose to leverage the abundant discussions of DL version issues from Stack Overflow to facilitate version incompatibility detection.
We reformulate the problem of knowledge extraction as a Question-Answering (QA) problem and use a pre-trained QA model to extract version compatibility knowledge.
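A small sketch of that QA-style extraction using Hugging Face's `pipeline`; the model choice and the example post are assumptions of this sketch, not the paper's setup:

```python
# Sketch of QA-style compatibility-knowledge extraction; the model and
# the example post are this sketch's assumptions, not the paper's.
from transformers import pipeline

qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

post = ("I got errors with TensorFlow 2.4. Downgrading to CUDA 11.0 "
        "and cuDNN 8.0 fixed the incompatibility for me.")

for q in ("Which CUDA version is compatible with TensorFlow 2.4?",
          "Which cuDNN version works with TensorFlow 2.4?"):
    ans = qa(question=q, context=post)
    print(q, "->", ans["answer"], f"(score={ans['score']:.2f})")
```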
arXiv Detail & Related papers (2023-08-25T09:53:26Z) - UQpy v4.1: Uncertainty Quantification with Python [4.6405927770229]
This paper presents the latest improvements introduced in Version 4 of the UQpy, Uncertainty Quantification with Python, library.
In the latest version, the code was restructured to conform with the latest Python coding conventions.
To improve the robustness of UQpy, software engineering best practices were adopted.
arXiv Detail & Related papers (2023-05-16T16:11:04Z) - Latte: Cross-framework Python Package for Evaluation of Latent-Based Generative Models [65.51757376525798]
Latte is a Python library for evaluation of latent-based generative models.
Latte is compatible with both PyTorch and TensorFlow/Keras, and provides both functional and modular APIs.
arXiv Detail & Related papers (2021-12-20T16:00:28Z) - Compatibility-aware Heterogeneous Visual Search [93.90831195353333]
Existing systems use the same embedding model to compute representations (embeddings) for the query and gallery images.
We address two forms of compatibility: one enforced by modifying the parameters of each model that computes the embeddings, and the other by modifying the architectures that compute the embeddings.
Compared to ordinary (homogeneous) visual search using the largest embedding model (paragon), CMP-NAS achieves 80-fold and 23-fold cost reduction.
arXiv Detail & Related papers (2021-05-13T02:30:50Z)