Final Report for CHESS: Cloud, High-Performance Computing, and Edge for Science and Security
- URL: http://arxiv.org/abs/2410.16093v1
- Date: Mon, 21 Oct 2024 15:16:00 GMT
- Title: Final Report for CHESS: Cloud, High-Performance Computing, and Edge for Science and Security
- Authors: Nathan Tallent, Jan Strube, Luanzheng Guo, Hyungro Lee, Jesun Firoz, Sayan Ghosh, Bo Fang, Oceane Bel, Steven Spurgeon, Sarah Akers, Christina Doty, Erol Cromwell,
- Abstract summary: Methods for constructing continuum platforms, orchestrating workflow tasks, and curating datasets fail to achieve scientific requirements for performance, energy, security, and reliability.
This report describes the results and successes of CHESS from the perspective of open science.
- Score: 5.781151161558928
- Abstract: Automating the theory-experiment cycle requires effective distributed workflows that utilize a computing continuum spanning lab instruments, edge sensors, computing resources at multiple facilities, data sets distributed across multiple information sources, and potentially cloud. Unfortunately, the obvious methods for constructing continuum platforms, orchestrating workflow tasks, and curating datasets over time fail to achieve scientific requirements for performance, energy, security, and reliability. Furthermore, achieving the best use of continuum resources depends upon the efficient composition and execution of workflow tasks, i.e., combinations of numerical solvers, data analytics, and machine learning. Pacific Northwest National Laboratory's LDRD "Cloud, High-Performance Computing (HPC), and Edge for Science and Security" (CHESS) has developed a set of interrelated capabilities for enabling distributed scientific workflows and curating datasets. This report describes the results and successes of CHESS from the perspective of open science.
Related papers
- Towards an Integrated Performance Framework for Fire Science and Management Workflows [0.0]
This paper presents an artificial intelligence and machine learning (AI/ML) approach to performance assessment and optimization.
An associated early AI/ML framework spanning performance data collection, prediction and optimization is applied to wildfire science applications.
arXiv Detail & Related papers (2024-07-30T22:37:25Z)
- Reinforcement Learning-driven Data-intensive Workflow Scheduling for Volunteer Edge-Cloud [2.417545540754702]
Volunteer Edge-Cloud (VEC) has gained traction as a cost-effective, community computing paradigm to support data-intensive scientific research.
However, due to the highly distributed and heterogeneous nature of VEC resources, centralized workflow task scheduling remains a challenge.
We propose a Reinforcement Learning (RL)-driven data-intensive scientific workflow scheduling approach.
arXiv Detail & Related papers (2024-07-01T16:21:13Z)
- MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows [58.56005277371235]
We introduce MASSW, a comprehensive text dataset on Multi-Aspect Summarization of Scientific Workflows.
MASSW includes more than 152,000 peer-reviewed publications from 17 leading computer science conferences spanning the past 50 years.
We demonstrate the utility of MASSW through multiple novel machine-learning tasks that can be benchmarked using this new dataset.
arXiv Detail & Related papers (2024-06-10T15:19:09Z)
- Everywhere & Nowhere: Envisioning a Computing Continuum for Science [21.111766975909752]
Emerging data-driven scientific workflows are seeking to leverage distributed data sources to understand end-to-end phenomena, drive experimentation, and facilitate important decision-making.
This paper explores a computing continuum that is everywhere and nowhere: one spanning resources at the edges, in the core, and in between, and providing abstractions that can be harnessed to support science.
It also introduces recent research in programming abstractions that can express what data should be processed and when and where it should be processed, and autonomic services that automate the discovery of resources and the orchestration of computations across these resources.
arXiv Detail & Related papers (2024-06-06T20:07:31Z)
- CUDC: A Curiosity-Driven Unsupervised Data Collection Method with Adaptive Temporal Distances for Offline Reinforcement Learning [62.58375643251612]
We propose a Curiosity-driven Unsupervised Data Collection (CUDC) method to expand feature space using adaptive temporal distances for task-agnostic data collection.
With this adaptive reachability mechanism in place, the feature representation can be diversified, and the agent can navigate itself to collect higher-quality data with curiosity.
Empirically, CUDC surpasses existing unsupervised methods in efficiency and learning performance in various downstream offline RL tasks of the DeepMind control suite.
arXiv Detail & Related papers (2023-12-19T14:26:23Z)
- Towards Lightweight Data Integration using Multi-workflow Provenance and Data Observability [0.2517763905487249]
Integrated data analysis plays a crucial role in scientific discovery, especially in the current AI era.
We propose MIDA: an approach for lightweight runtime Multi-workflow Integrated Data Analysis.
We show near-zero overhead running up to 100,000 tasks on 1,680 CPU cores on the Summit supercomputer.
arXiv Detail & Related papers (2023-08-17T14:20:29Z)
- Multi-Fidelity Active Learning with GFlowNets [65.91555804996203]
We propose a multi-fidelity active learning algorithm with GFlowNets as a sampler, to efficiently discover diverse, high-scoring candidates.
Our evaluation on molecular discovery tasks shows that multi-fidelity active learning with GFlowNets can discover high-scoring candidates at a fraction of the budget of its single-fidelity counterpart.
arXiv Detail & Related papers (2023-06-20T17:43:42Z)
- Distributed intelligence on the Edge-to-Cloud Continuum: A systematic literature review [62.997667081978825]
This review aims at providing a comprehensive vision of the main state-of-the-art libraries and frameworks for machine learning and data analytics available today.
The main simulation, emulation, deployment systems, and testbeds for experimental research on the Edge-to-Cloud Continuum available today are also surveyed.
arXiv Detail & Related papers (2022-04-29T08:06:05Z)
- The MIT Supercloud Workload Classification Challenge [10.458111248130944]
In this paper, we present a workload classification challenge based on the MIT Supercloud dataset.
The goal of this challenge is to foster algorithmic innovations in the analysis of compute workloads.
arXiv Detail & Related papers (2022-04-12T14:28:04Z)
- From Distributed Machine Learning to Federated Learning: A Survey [49.7569746460225]
Federated learning emerges as an efficient approach to exploit distributed data and computing resources.
We propose a functional architecture of federated learning systems and a taxonomy of related techniques.
We present the distributed training, data communication, and security of FL systems.
arXiv Detail & Related papers (2021-04-29T14:15:11Z)
- Dif-MAML: Decentralized Multi-Agent Meta-Learning [54.39661018886268]
We propose a cooperative multi-agent meta-learning algorithm, referred to as Diffusion-based MAML or Dif-MAML.
We show that the proposed strategy allows a collection of agents to attain agreement at a linear rate and to converge to a stationary point of the aggregate MAML objective.
Simulation results illustrate the theoretical findings and the superior performance relative to the traditional non-cooperative setting.
arXiv Detail & Related papers (2020-10-06T16:51:09Z)