Integrating Deep Learning in Domain Sciences at Exascale

URL: http://arxiv.org/abs/2011.11188v1
Date: Mon, 23 Nov 2020 03:09:58 GMT
Authors: Rick Archibald, Edmond Chow, Eduardo D'Azevedo, Jack Dongarra, Markus Eisenbach, Rocco Febbo, Florent Lopez, Daniel Nichols, Stanimire Tomov, Kwai Wong, and Junqi Yin
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper presents some of the current challenges in designing deep learning artificial intelligence (AI) and integrating it with traditional high-performance computing (HPC) simulations. We evaluate existing packages for their ability to run deep learning models and applications on large-scale HPC systems efficiently, identify challenges, and propose new asynchronous parallelization and optimization techniques for current large-scale heterogeneous systems and upcoming exascale systems. These developments, along with existing HPC AI software capabilities, have been integrated into MagmaDNN, an open-source HPC deep learning framework. Many deep learning frameworks are targeted at data scientists and fall short in providing quality integration into existing HPC workflows. This paper discusses the necessities of an HPC deep learning framework and how those needs can be provided (e.g., as in MagmaDNN) through a deep integration with existing HPC libraries, such as MAGMA and its modular memory management, MPI, cuBLAS, cuDNN, MKL, and HIP. Advancements are also illustrated through the use of algorithmic enhancements in reduced- and mixed-precision, as well as asynchronous optimization methods. Finally, we present illustrations and potential solutions for enhancing traditional compute- and data-intensive applications at ORNL and UTK with AI. The approaches and future challenges are illustrated in materials science, imaging, and climate applications.

Related papers

A Survey on Inference Optimization Techniques for Mixture of Experts Models
Large-scale Mixture of Experts (MoE) models offer enhanced model capacity and computational efficiency through conditional computation. Deploying and running inference on these models presents significant challenges in computational resources, latency, and energy efficiency. This survey analyzes optimization techniques for MoE models across the entire system stack.
arXiv (2024-12-18T14:11:15Z)
Transforming the Hybrid Cloud for Emerging AI Workloads
This white paper envisions transforming hybrid cloud systems to meet the growing complexity of AI workloads. The proposed framework addresses critical challenges in energy efficiency, performance, and cost-effectiveness. This joint initiative aims to establish hybrid clouds as secure, efficient, and sustainable platforms.
arXiv (2024-11-20T11:57:43Z)
Integrating Quantum Computing Resources into Scientific HPC Ecosystems
Quantum computing (QC) offers significant potential to enhance scientific discovery in fields such as quantum chemistry, optimization, and artificial intelligence. QC faces challenges due to the external noise inherent in the noisy intermediate-scale quantum (NISQ) era. This paper outlines plans to unlock new computational possibilities, driving forward scientific inquiry and innovation in a wide array of research domains.
arXiv (2024-08-28T22:44:54Z)
Machine Learning Insides OptVerse AI Solver: Design Principles and Applications
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver. We showcase our methods for generating complex SAT and MILP instances utilizing generative models that mirror the multifaceted structures of real-world problems. We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv (2024-01-11T15:02:15Z)
Age-Based Scheduling for Mobile Edge Computing: A Deep Reinforcement Learning Approach
We propose a new definition of Age of Information (AoI) and, based on the redefined AoI, we formulate an online AoI problem for MEC systems. We introduce Post-Decision States (PDSs) to exploit the partial knowledge of the system's dynamics. We also combine PDSs with deep RL to further improve the algorithm's applicability, scalability, and robustness.
arXiv (2023-12-01T01:30:49Z)
CSM-H-R: A Context Modeling Framework in Supporting Reasoning Automation for Interoperable Intelligent Systems and Privacy Protection
We propose a novel framework for automating High Level Context (HLC) reasoning across intelligent systems at scale. The framework is designed to support the sharing and interoperability of context among intelligent systems, with components for handling CSMs and for managing their hierarchy, relationships, and transitions. The implementation of the framework experiments with casting HLC reasoning as vector and matrix computations and shows the potential to reach the next level of automation.
arXiv (2023-08-21T22:21:15Z)
Deep learning applied to computational mechanics: A comprehensive review, state of the art, and the classics
Recent developments in artificial neural networks, particularly deep learning (DL), are reviewed in detail. Both hybrid and pure machine learning (ML) methods are discussed. History and limitations of AI are recounted and discussed, with particular attention to pointing out misstatements or misconceptions of the classics.
arXiv (2022-12-18T02:03:00Z)
SAIH: A Scalable Evaluation Methodology for Understanding AI Performance Trend on HPC Systems
We propose a scalable evaluation methodology (SAIH) for analyzing the AI performance trend of HPC systems. As the data and model constantly scale, we can investigate the trend and range of AI performance on HPC systems.
arXiv (2022-12-07T02:42:29Z)
Towards a Dynamic Composability Approach for using Heterogeneous Systems in Remote Sensing
We present a novel approach for using composable systems in the intersection between scientific computing, artificial intelligence (AI), and remote sensing domain. We describe the architecture of a first working example of a composable infrastructure that federates Expanse, an NSF-funded supercomputer, with Nautilus, a geo-distributed cluster.
arXiv (2022-11-13T14:48:00Z)
AI-coupled HPC Workflows
Introducing AI/ML models into traditional HPC has enabled highly accurate modeling. There are various modes of integrating AI/ML models with HPC computations, resulting in diverse types of AI-coupled HPC workflows.
arXiv (2022-08-24T19:16:43Z)
Reconfigurable Intelligent Surface Assisted Mobile Edge Computing with Heterogeneous Learning Tasks
Mobile edge computing (MEC) provides a natural platform for AI applications. We present an infrastructure to perform machine learning tasks at an MEC with the assistance of a reconfigurable intelligent surface (RIS). Specifically, we minimize the learning error of all participating users by jointly optimizing transmit power of mobile users, beamforming vectors of the base station, and the phase-shift matrix of the RIS.
arXiv (2020-12-25T07:08:50Z)
Deep Multi-Task Learning for Cooperative NOMA: System Design and Principles
We develop a novel deep cooperative NOMA scheme, drawing upon the recent advances in deep learning (DL). We develop a novel hybrid-cascaded deep neural network (DNN) architecture such that the entire system can be optimized in a holistic manner.
arXiv (2020-07-27T12:38:37Z)