Scalable Delivery of Scalable Libraries and Tools: How ECP Delivered a
Software Ecosystem for Exascale and Beyond
- URL: http://arxiv.org/abs/2311.06995v1
- Date: Mon, 13 Nov 2023 00:30:43 GMT
- Title: Scalable Delivery of Scalable Libraries and Tools: How ECP Delivered a
Software Ecosystem for Exascale and Beyond
- Authors: Michael A. Heroux
- Abstract summary: The Exascale Computing Project (ECP) was one of the largest open-source scientific software development projects ever.
It supported approximately 1,000 staff from US Department of Energy laboratories, and university and industry partners.
About 250 staff contributed to 70 scientific libraries and tools to support applications on multiple exascale computing systems.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Exascale Computing Project (ECP) was one of the largest open-source
scientific software development projects ever. It supported approximately 1,000
staff from US Department of Energy laboratories, and university and industry
partners. About 250 staff contributed to 70 scientific libraries and tools to
support applications on multiple exascale computing systems that were also
under development.
Funded as a construction project, ECP adopted an earned-value management
system, based on milestones, and a key performance parameter system based, in
part, on integrations. With accelerated delivery schedules and significant
project risk, we also emphasized software quality using community policies,
automated testing, and continuous integration. Software Development Kit teams
provided cross-team collaboration. Products were delivered via E4S, a curated
portfolio of libraries and tools.
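The abstract names two management mechanisms (milestone-based earned value and an integration-based key performance parameter) without giving their formulas. The sketch below shows how such bookkeeping might work. It assumes the common 0/100 milestone rule for earned value, under which a milestone's budget counts only once the milestone is complete, and the standard schedule performance index SPI = EV / PV. The `Milestone` record, the function names, and the KPP thresholds (`threshold=2`, `target_fraction=0.5`) are hypothetical illustrations, not ECP's actual definitions.

```python
from dataclasses import dataclass


@dataclass
class Milestone:
    # Hypothetical record; ECP's real milestone schema is not described in the abstract.
    name: str
    budget: float       # budgeted cost assigned to the milestone (arbitrary units)
    planned_done: bool  # scheduled to be complete by the status date?
    done: bool          # actually complete by the status date?


def earned_value(milestones: list[Milestone]) -> float:
    """0/100 rule: a milestone earns its full budget only once it is complete."""
    return sum(m.budget for m in milestones if m.done)


def planned_value(milestones: list[Milestone]) -> float:
    """Budget of all milestones scheduled to be complete by the status date."""
    return sum(m.budget for m in milestones if m.planned_done)


def schedule_performance_index(milestones: list[Milestone]) -> float:
    """SPI = EV / PV; values below 1.0 indicate work behind schedule."""
    pv = planned_value(milestones)
    return earned_value(milestones) / pv if pv else float("nan")


def kpp_met(integrations: dict[str, int], threshold: int = 2,
            target_fraction: float = 0.5) -> bool:
    """Hypothetical integration-based KPP: a product passes when it has at least
    `threshold` integrations, and the KPP is met when at least `target_fraction`
    of products pass. The thresholds are illustrative assumptions."""
    if not integrations:
        return False
    passing = sum(1 for n in integrations.values() if n >= threshold)
    return passing / len(integrations) >= target_fraction


# Illustrative data only; not drawn from ECP.
example_milestones = [
    Milestone("design document", 10.0, planned_done=True, done=True),
    Milestone("GPU port", 30.0, planned_done=True, done=False),
    Milestone("release", 20.0, planned_done=False, done=False),
]

print(earned_value(example_milestones))                # 10.0
print(planned_value(example_milestones))               # 40.0
print(schedule_performance_index(example_milestones))  # 0.25
print(kpp_met({"libA": 3, "libB": 1, "toolC": 2}))     # True (2 of 3 products pass)
```

Under the 0/100 rule, a partially finished milestone earns nothing, so the index only moves when deliverables are actually completed; this is one reason milestone-based schemes tend to favor small, well-defined milestones.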
In this paper, we discuss the organizational and management elements that
enabled the efficient and effective delivery of ECP libraries and tools,
lessons learned and next steps.