LLM as HPC Expert: Extending RAG Architecture for HPC Data
- URL: http://arxiv.org/abs/2501.14733v1
- Date: Mon, 09 Dec 2024 02:55:30 GMT
- Title: LLM as HPC Expert: Extending RAG Architecture for HPC Data
- Authors: Yusuke Miyashita, Patrick Kin Man Tung, Johan Barthélemy,
- Abstract summary: This paper introduces Hypothetical Command Embeddings (HyCE), a novel method that extends Retrieval-Augmented Generation (RAG)<n>HyCE enriches large language models (LLM) with real-time, user-specific HPC information, addressing the limitations of fine-tuned models on such data.<n>We tackle essential security concerns, including data privacy and command execution risks, associated with deploying LLMs in HPC environments.
- Score: 0.058520770038704165
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: High-Performance Computing (HPC) is crucial for performing advanced computational tasks, yet their complexity often challenges users, particularly those unfamiliar with HPC-specific commands and workflows. This paper introduces Hypothetical Command Embeddings (HyCE), a novel method that extends Retrieval-Augmented Generation (RAG) by integrating real-time, user-specific HPC data, enhancing accessibility to these systems. HyCE enriches large language models (LLM) with real-time, user-specific HPC information, addressing the limitations of fine-tuned models on such data. We evaluate HyCE using an automated RAG evaluation framework, where the LLM itself creates synthetic questions from the HPC data and serves as a judge, assessing the efficacy of the extended RAG with the evaluation metrics relevant for HPC tasks. Additionally, we tackle essential security concerns, including data privacy and command execution risks, associated with deploying LLMs in HPC environments. This solution provides a scalable and adaptable approach for HPC clusters to leverage LLMs as HPC expert, bridging the gap between users and the complex systems of HPC.
Related papers
- Do Large Language Models Understand Performance Optimization? [0.9320657506524149]
Large Language Models (LLMs) have emerged as powerful tools for software development tasks such as code completion, translation, and optimization.
This paper presents a benchmark suite encompassing multiple critical HPC computational motifs to evaluate the performance of code optimized by state-of-the-art LLMs.
arXiv Detail & Related papers (2025-03-17T23:30:23Z) - Scalable Runtime Architecture for Data-driven, Hybrid HPC and ML Workflow Applications [2.0999841017238063]
Hybrid combining traditional HPC and novel ML methodologies are transforming scientific computing.
This paper presents the architecture and implementation of a scalable runtime system that extends RADICAL-Pilot with service-based execution to support AI-out- HPC.
Preliminary experimental results show that our approach manages concurrent execution of ML models across local and remote HPC/cloud resources with minimal architectural overheads.
arXiv Detail & Related papers (2025-03-17T16:21:48Z) - ULTHO: Ultra-Lightweight yet Efficient Hyperparameter Optimization in Deep Reinforcement Learning [50.53705050673944]
We propose ULTHO, an ultra-lightweight yet powerful framework for fast HPO in deep RL within single runs.<n>Specifically, we formulate the HPO process as a multi-armed bandit with clustered arms (MABC) and link it directly to long-term return optimization.<n>We test ULTHO on benchmarks including ALE, Procgen, MiniGrid, and PyBullet.
arXiv Detail & Related papers (2025-03-08T07:03:43Z) - Exploring Code Language Models for Automated HLS-based Hardware Generation: Benchmark, Infrastructure and Analysis [14.458529723566379]
Large language models (LLMs) can be employed for programming languages such as Python and C++.<n>This paper explores leveraging LLMs to generate High-Level Synthesis (HLS)-based hardware design.
arXiv Detail & Related papers (2025-02-19T17:53:59Z) - Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning [49.997801914237094]
We introduce the Fire-Flyer AI- HPC architecture, a synergistic hardware-software co-design framework and its best practices.
For Deep Learning (DL) training, we deployed the Fire-Flyer 2 with 10,000 PCIe A100 GPUs, achieved performance approximating the DGX-A100 while reducing costs by half and energy consumption by 40%.
Through our software stack, including HaiScale, 3FS, and HAI-Platform, we achieved substantial scalability by overlapping computation and communication.
arXiv Detail & Related papers (2024-08-26T10:11:56Z) - Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity [59.57065228857247]
Retrieval-augmented Large Language Models (LLMs) have emerged as a promising approach to enhancing response accuracy in several tasks, such as Question-Answering (QA)
We propose a novel adaptive QA framework, that can dynamically select the most suitable strategy for (retrieval-augmented) LLMs based on the query complexity.
We validate our model on a set of open-domain QA datasets, covering multiple query complexities, and show that ours enhances the overall efficiency and accuracy of QA systems.
arXiv Detail & Related papers (2024-03-21T13:52:30Z) - Age-Based Scheduling for Mobile Edge Computing: A Deep Reinforcement
Learning Approach [58.911515417156174]
We propose a new definition of Age of Information (AoI) and, based on the redefined AoI, we formulate an online AoI problem for MEC systems.
We introduce Post-Decision States (PDSs) to exploit the partial knowledge of the system's dynamics.
We also combine PDSs with deep RL to further improve the algorithm's applicability, scalability, and robustness.
arXiv Detail & Related papers (2023-12-01T01:30:49Z) - HPC-GPT: Integrating Large Language Model for High-Performance Computing [3.8078849170829407]
We propose HPC-GPT, a novel LLaMA-based model that has been supervised fine-tuning using generated QA (Question-Answer) instances for the HPC domain.
To evaluate its effectiveness, we concentrate on two HPC tasks: managing AI models and datasets for HPC, and data race detection.
Our experiments on open-source benchmarks yield extensive results, underscoring HPC-GPT's potential to bridge the performance gap between LLMs and HPC-specific tasks.
arXiv Detail & Related papers (2023-10-03T01:34:55Z) - A New Linear Scaling Rule for Private Adaptive Hyperparameter Optimization [57.450449884166346]
We propose an adaptive HPO method to account for the privacy cost of HPO.
We obtain state-of-the-art performance on 22 benchmark tasks, across computer vision and natural language processing, across pretraining and finetuning.
arXiv Detail & Related papers (2022-12-08T18:56:37Z) - DRAS-CQSim: A Reinforcement Learning based Framework for HPC Cluster
Scheduling [0.9529163786034884]
We present a reinforcement learning based HPC scheduling framework named DRAS-CQSim to automatically learn optimal scheduling policy.
DRAS-CQSim encapsulates simulation environments, agents, hyper parameter tuning options, and different reinforcement learning algorithms, which allows the system administrators to quickly obtain customized scheduling policies.
arXiv Detail & Related papers (2021-05-16T21:56:31Z) - Secure Platform for Processing Sensitive Data on Shared HPC Systems [0.0]
High performance computing clusters pose challenges for processing sensitive data.
In this work we present a novel method for creating secure computing environments on traditional multi-tenant high-performance computing clusters.
arXiv Detail & Related papers (2021-03-26T18:30:33Z) - Integrating Deep Learning in Domain Sciences at Exascale [2.241545093375334]
We evaluate existing packages for their ability to run deep learning models and applications on large-scale HPC systems efficiently.
We propose new asynchronous parallelization and optimization techniques for current large-scale heterogeneous systems.
We present illustrations and potential solutions for enhancing traditional compute- and data-intensive applications with AI.
arXiv Detail & Related papers (2020-11-23T03:09:58Z) - Conservative Q-Learning for Offline Reinforcement Learning [106.05582605650932]
We show that CQL substantially outperforms existing offline RL methods, often learning policies that attain 2-5 times higher final return.
We theoretically show that CQL produces a lower bound on the value of the current policy and that it can be incorporated into a policy learning procedure with theoretical improvement guarantees.
arXiv Detail & Related papers (2020-06-08T17:53:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.