LLMs as Packagers of HPC Software
- URL: http://arxiv.org/abs/2511.05626v1
- Date: Fri, 07 Nov 2025 00:06:51 GMT
- Title: LLMs as Packagers of HPC Software
- Authors: Caetano Melone, Daniel Nichols, Konstantinos Parasyris, Todd Gamblin, Harshitha Menon,
- Abstract summary: Tools such as Spack automate dependency resolution and environment management, but their effectiveness relies on manually written build recipes. We introduce SpackIt, an end-to-end framework that combines repository analysis, retrieval of relevant examples, and iterative refinement through diagnostic feedback. Our results show that SpackIt increases installation success from 20% in a zero-shot setting to over 80% in its best configuration.
- Score: 2.195636219953539
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: High performance computing (HPC) software ecosystems are inherently heterogeneous, comprising scientific applications that depend on hundreds of external packages, each with distinct build systems, options, and dependency constraints. Tools such as Spack automate dependency resolution and environment management, but their effectiveness relies on manually written build recipes. As these ecosystems grow, maintaining existing specifications and creating new ones becomes increasingly labor-intensive. While large language models (LLMs) have shown promise in code generation, automatically producing correct and maintainable Spack recipes remains a significant challenge. We present a systematic analysis of how LLMs and context-augmentation methods can assist in the generation of Spack recipes. To this end, we introduce SpackIt, an end-to-end framework that combines repository analysis, retrieval of relevant examples, and iterative refinement through diagnostic feedback. We apply SpackIt to a representative subset of 308 open-source HPC packages to assess its effectiveness and limitations. Our results show that SpackIt increases installation success from 20% in a zero-shot setting to over 80% in its best configuration, demonstrating the value of retrieval and structured feedback for reliable package synthesis.
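The pipeline the abstract describes (generate a recipe from repository context and retrieved examples, attempt the installation, and feed build diagnostics back to the model) can be sketched as a simple loop. This is an illustrative sketch only, not SpackIt's actual code: the names `generate_recipe`, `try_install`, and `refine_recipe` are hypothetical stand-ins for the LLM call, the build step (e.g. invoking `spack install`), and the feedback-driven rewrite.

```python
def iterative_refinement(package, context, generate_recipe, try_install,
                         refine_recipe, max_attempts=5):
    """Generate a build recipe, attempt installation, and feed build
    diagnostics back for refinement until success or budget exhaustion.

    Returns (recipe, attempts_used) on success, or (None, max_attempts)
    if no attempt succeeded.
    """
    # Initial draft: LLM conditioned on repository analysis plus
    # retrieved example recipes.
    recipe = generate_recipe(package, context)
    for attempt in range(max_attempts):
        ok, build_log = try_install(recipe)  # e.g. run the build, capture logs
        if ok:
            return recipe, attempt + 1
        # Diagnostic feedback: the model sees the failing build log
        # and produces a revised recipe.
        recipe = refine_recipe(recipe, build_log)
    return None, max_attempts
```

With toy stand-ins for the three callables, the loop terminates as soon as an attempt succeeds and reports how many build attempts were consumed.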
Related papers
- KAPSO: A Knowledge-grounded framework for Autonomous Program Synthesis and Optimization [3.0268242725574215]
KAPSO is a modular framework for autonomous program synthesis and optimization. It iteratively performs ideation, code synthesis and editing, execution, evaluation, and learning to improve a runnable artifact.
arXiv Detail & Related papers (2026-01-29T10:40:54Z) - Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem [90.17610617854247]
We introduce the Agentic Learning Ecosystem (ALE), a foundational infrastructure that optimizes the production pipeline for agentic models. ALE consists of three components: ROLL, a post-training framework for weight optimization; ROCK, a sandbox environment manager for trajectory generation; and iFlow CLI, an agent framework for efficient context engineering. We release ROME, an open-source agent grounded by ALE and trained on over one million trajectories.
arXiv Detail & Related papers (2025-12-31T14:03:39Z) - LeMat-Synth: a multi-modal toolbox to curate broad synthesis procedure databases from scientific literature [60.879220305044726]
We propose a multi-modal toolbox that employs large language models (LLMs) and vision language models (VLMs) to automatically extract and organize synthesis procedures and performance data. We curated 81k open-access papers, yielding LeMat-Synth (v1.0): a dataset containing synthesis procedures spanning 35 synthesis methods and 16 material classes. We release a modular, open-source library designed to support community-driven extension to new corpora and synthesis domains.
arXiv Detail & Related papers (2025-10-28T17:58:18Z) - SWE-PolyBench: A multi-language benchmark for repository level evaluation of coding agents [49.73885480071402]
We introduce SWE-PolyBench, a new benchmark for repository-level, execution-based evaluation of coding agents. SWE-PolyBench contains 2110 instances from 21 repositories and includes tasks in Java (165), JavaScript (1017), TypeScript (729) and Python (199), covering bug fixes, feature additions, and code. Our experiments show that current agents exhibit uneven performance across languages and struggle with complex problems while showing higher performance on simpler tasks.
arXiv Detail & Related papers (2025-04-11T17:08:02Z) - SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution [56.9361004704428]
Large Language Models (LLMs) have demonstrated remarkable proficiency across a variety of complex tasks. SWE-Fixer is a novel open-source framework designed to effectively and efficiently resolve GitHub issues. We assess our approach on the SWE-Bench Lite and Verified benchmarks, achieving competitive performance among open-source models.
arXiv Detail & Related papers (2025-01-09T07:54:24Z) - EpiCoder: Encompassing Diversity and Complexity in Code Generation [66.43738008739555]
Existing methods for code generation use code snippets as seed data. We introduce a novel feature tree-based synthesis framework, which revolves around hierarchical code features. Our framework provides precise control over the complexity of the generated code, enabling functionalities that range from function-level operations to multi-file scenarios.
arXiv Detail & Related papers (2025-01-08T18:58:15Z) - OmniEvalKit: A Modular, Lightweight Toolbox for Evaluating Large Language Model and its Omni-Extensions [58.46747176834132]
We present OmniEvalKit, a novel benchmarking toolbox designed to evaluate Large Language Models (LLMs). Unlike existing benchmarks that often focus on a single aspect, OmniEvalKit provides a modular, lightweight, and automated evaluation system. It is structured with a modular architecture comprising a Static Builder and Dynamic Data Flow, promoting the seamless integration of new models and datasets.
arXiv Detail & Related papers (2024-12-09T17:39:43Z) - A Machine Learning-Based Approach For Detecting Malicious PyPI Packages [4.311626046942916]
In modern software development, the use of external libraries and packages is increasingly prevalent. This reliance on reusing code introduces serious risks for deployed software in the form of malicious packages. We propose a data-driven approach that uses machine learning and static analysis to examine a package's metadata, code, files, and textual characteristics.
arXiv Detail & Related papers (2024-12-06T18:49:06Z) - PVAC: Package Version Activity Categorizer, Leveraging Semantic Versioning in a Heterogeneous System [0.0]
This research aims to introduce a systematic method and a prototype tool for assessing version activity within heterogeneous package manager ecosystems. We developed a Package Version Activity Categorizer (PVAC) that consists of three components. PVAC parses semantic versioning details from diverse package version strings, enabling consistent categorization and quantitative scoring of version changes.
arXiv Detail & Related papers (2024-09-06T19:58:20Z) - We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs [3.515912713354746]
Package hallucinations arise from fact-conflicting errors when generating code using Large Language Models. This paper conducts a rigorous and comprehensive evaluation of package hallucinations across different programming languages. We show that the average percentage of hallucinated packages is at least 5.2% for commercial models and 21.7% for open-source models.
arXiv Detail & Related papers (2024-06-12T03:29:06Z) - Alibaba LingmaAgent: Improving Automated Issue Resolution via Comprehensive Repository Exploration [64.19431011897515]
This paper presents Alibaba LingmaAgent, a novel Automated Software Engineering method designed to comprehensively understand and utilize whole software repositories for issue resolution. Our approach introduces a top-down method to condense critical repository information into a knowledge graph, reducing complexity, and employs a Monte Carlo tree search-based strategy. In production deployment and evaluation at Alibaba Cloud, LingmaAgent automatically resolved 16.9% of in-house issues faced by development engineers, and solved 43.3% of problems after manual intervention.
arXiv Detail & Related papers (2024-06-03T15:20:06Z) - Analyzing the Evolution of Inter-package Dependencies in Operating Systems: A Case Study of Ubuntu [7.76541950830141]
An Operating System (OS) combines multiple interdependent software packages, which usually have their own independently developed architectures.
For evolutionary efforts, designers and developers of an OS can greatly benefit from fully understanding system-wide dependencies at the level of individual files.
We propose a framework, DepEx, aimed at discovering the detailed package relations at the level of individual binary files.
arXiv Detail & Related papers (2023-07-10T10:12:21Z) - Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes [54.13559879916708]
EVAPORATE is a prototype system powered by large language models (LLMs). Code synthesis is cheap, but far less accurate than directly processing each document with the LLM. We propose an extended code implementation, EVAPORATE-CODE+, which achieves better quality than direct extraction.
arXiv Detail & Related papers (2023-04-19T06:00:26Z) - An Empirical Analysis of the R Package Ecosystem [0.0]
We analyze more than 25,000 packages, 150,000 releases, and 15 million files across two decades.
We find that the historical growth of the ecosystem has been robust under all measures.
arXiv Detail & Related papers (2021-02-19T12:55:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.