Fugu-MT paper translation (abstract): Enabling Multi-threading in Heterogeneous Quantum-Classical Programming Models

Paper summary: Enabling Multi-threading in Heterogeneous Quantum-Classical Programming Models

arxiv url: http://arxiv.org/abs/2301.11559v1
Date: Fri, 27 Jan 2023 06:48:37 GMT
Status: translation complete
Last updated in system: 2023-01-30 16:12:15.350260
Title: Enabling Multi-threading in Heterogeneous Quantum-Classical Programming Models
Authors: Akihiro Hayashi, Austin Adams, Jeffrey Young, Alexander McCaskey, Eugene Dumitrescu, Vivek Sarkar, Thomas M. Conte
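Abstract summary: The authors introduce C++-based parallel constructs that enable parallel execution of quantum kernels. Preliminary performance results show that running two Bell kernels in parallel with 12 threads per kernel outperforms running the kernels one after the other.
Reference score (independently computed attention score): 53.937052213390736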
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this paper, we address some of the key limitations to realizing a generic heterogeneous parallel programming model for quantum-classical heterogeneous platforms. We discuss our experience in enabling user-level multi-threading in QCOR as well as challenges that need to be addressed for programming future quantum-classical systems. Specifically, we discuss our design and implementation of C++-based parallel constructs that enable 1) parallel execution of a quantum kernel with std::thread and 2) asynchronous execution with std::async. To do so, we provide a detailed overview of the current implementation of the QCOR programming model and runtime, and discuss how we 1) add thread-safety to some of its user-facing API routines and 2) increase parallelism in QCOR by removing data races that inhibit multi-threading, so as to better utilize available computing resources. We also present preliminary performance results with the Quantum++ back end on a single-node Ryzen9 3900X machine that has 12 physical cores (24 hardware threads) and 128GB of RAM. The results show that running two Bell kernels in parallel with 12 threads per kernel outperforms running the kernels one after the other with 24 threads each (1.63x improvement). In addition, we observe the same trend when running two Shor's algorithm kernels in parallel (1.22x faster than executing the kernels one after the other). It is worth noting that the trends remain the same even when we only use physical cores instead of threads. We believe that our design, implementation, and results will open up an opportunity not only for 1) enabling quicker prototyping of parallel/asynchrony-aware quantum-classical algorithms on quantum circuit simulators in the short term, but also for 2) realizing a generic heterogeneous parallel programming model for quantum-classical heterogeneous platforms in the long term.
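Example (illustrative only): The two constructs named in the abstract, std::thread for parallel execution and std::async for asynchronous execution, can be sketched in standard C++ as follows. This is a minimal sketch and not the QCOR implementation: run_kernel_stub is a hypothetical CPU-bound stand-in for a quantum kernel launch, and ResultLog is a hypothetical mutex-guarded buffer showing one common way to avoid a data race on shared state. No QCOR API is used or implied.

```cpp
#include <cstddef>
#include <cstdint>
#include <future>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a quantum kernel launch (NOT a QCOR call):
// a deterministic CPU-bound loop that counts every other "shot".
std::uint64_t run_kernel_stub(std::uint64_t shots) {
    std::uint64_t count = 0;
    for (std::uint64_t i = 0; i < shots; ++i) {
        count += i % 2;
    }
    return count;
}

// Hypothetical shared result buffer. The mutex serializes access so that
// concurrent writers cannot race on the underlying vector.
class ResultLog {
public:
    void record(std::uint64_t value) {
        std::lock_guard<std::mutex> lock(mutex_);
        values_.push_back(value);
    }

    std::vector<std::uint64_t> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return values_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::uint64_t> values_;
};

// 1) Parallel execution: one std::thread per kernel launch.
std::vector<std::uint64_t> run_with_threads(std::size_t kernels,
                                            std::uint64_t shots) {
    ResultLog log;
    std::vector<std::thread> workers;
    workers.reserve(kernels);
    for (std::size_t k = 0; k < kernels; ++k) {
        workers.emplace_back(
            [&log, shots] { log.record(run_kernel_stub(shots)); });
    }
    for (auto& worker : workers) {
        worker.join();  // wait for every launch before reading the results
    }
    return log.snapshot();
}

// 2) Asynchronous execution: std::async returns a future, so the caller
// can continue with classical work and collect the result later.
std::uint64_t run_with_async(std::uint64_t shots) {
    std::future<std::uint64_t> pending =
        std::async(std::launch::async, run_kernel_stub, shots);
    // ... independent classical work could run here ...
    return pending.get();  // blocks until the stand-in kernel finishes
}
```

Calling run_with_threads(2, shots) from a main function mirrors the "two kernels in parallel" setup described in the abstract, while run_with_async(shots) shows the launch-now, collect-later pattern. On Linux toolchains the program may need to be built with -pthread.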
