Fugu-MT Paper Translation (Abstract): Efficient Chromosome Parallelization for Precision Medicine Genomic Workflows

Paper overview: Efficient Chromosome Parallelization for Precision Medicine Genomic Workflows

arxiv url: http://arxiv.org/abs/2511.15977v1
Date: Thu, 20 Nov 2025 02:14:56 GMT
Status: Translation complete
Last updated in system: 2025-11-21 17:08:52.427493
Title: Efficient Chromosome Parallelization for Precision Medicine Genomic Workflows
Authors: Daniel Mas Montserrat, Ray Verma, Míriam Barrabés, Francisco M. de la Vega, Carlos D. Bustamante, Alexander G. Ioannidis
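Abstract summary: Large-scale genomic workflows used in precision medicine can process datasets spanning tens to hundreds of gigabytes per sample. Simple static resource allocation methods struggle to handle the variability in per-chromosome RAM demands. The authors propose multiple mechanisms for adaptive, RAM-efficient parallelization of chromosome-level bioinformatics workflows.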
Reference score (independently computed attention score): 39.445312819357206
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large-scale genomic workflows used in precision medicine can process datasets spanning tens to hundreds of gigabytes per sample, leading to high memory spikes, intensive disk I/O, and task failures due to out-of-memory errors. Simple static resource allocation methods struggle to handle the variability in per-chromosome RAM demands, resulting in poor resource utilization and long runtimes. In this work, we propose multiple mechanisms for adaptive, RAM-efficient parallelization of chromosome-level bioinformatics workflows. First, we develop a symbolic regression model that estimates per-chromosome memory consumption for a given task and introduces an interpolating bias to conservatively minimize over-allocation. Second, we present a dynamic scheduler that adaptively predicts RAM usage with a polynomial regression model, treating task packing as a Knapsack problem to optimally batch jobs based on predicted memory requirements. Additionally, we present a static scheduler that optimizes chromosome processing order to minimize peak memory while preserving throughput. Our proposed methods, evaluated on simulations and real-world genomic pipelines, provide new mechanisms to reduce memory overruns and balance load across threads. We thereby achieve faster end-to-end execution, showcasing the potential to optimize large-scale genomic workflows.
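The abstract describes batching per-chromosome jobs as a Knapsack problem driven by predicted memory requirements. The following is a minimal sketch of that idea only, not the authors' implementation: it assumes predicted RAM is already available as whole gigabytes per task, and it fills each batch as close to the RAM budget as possible with a 0/1 knapsack in which a task's value equals its predicted memory. The function names `pack_batch` and `schedule`, and the numbers in `EXAMPLE_PREDICTED_GB`, are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical predicted peak RAM per chromosome task, in whole GB (made-up values).
EXAMPLE_PREDICTED_GB: Dict[str, int] = {
    "chr1": 8, "chr2": 7, "chr3": 5, "chr4": 4, "chr21": 2, "chr22": 1,
}


def pack_batch(predicted_gb: Dict[str, int], budget_gb: int) -> List[str]:
    """Pick the subset of pending tasks whose predicted RAM best fills the budget.

    Classic 0/1 knapsack where value == weight == predicted RAM, so the
    result is the subset with the largest total that does not exceed budget_gb.
    """
    # best[c] = (total RAM used, chosen tasks) achievable with capacity c
    best: List[Tuple[int, Tuple[str, ...]]] = [(0, ())] * (budget_gb + 1)
    for task, need in predicted_gb.items():
        if need <= 0:
            raise ValueError(f"predicted RAM for {task} must be positive")
        # Iterate capacity downwards so each task is selected at most once.
        for cap in range(budget_gb, need - 1, -1):
            used, chosen = best[cap - need]
            if used + need > best[cap][0]:
                best[cap] = (used + need, chosen + (task,))
    return list(best[budget_gb][1])


def schedule(predicted_gb: Dict[str, int], budget_gb: int) -> List[List[str]]:
    """Repeatedly pack batches until every task has been assigned to one."""
    pending = dict(predicted_gb)
    batches: List[List[str]] = []
    while pending:
        batch = pack_batch(pending, budget_gb)
        if not batch:
            # Nothing fits: every remaining task alone exceeds the budget.
            raise ValueError("a remaining task exceeds the RAM budget")
        batches.append(batch)
        for task in batch:
            del pending[task]
    return batches


if __name__ == "__main__":
    for i, batch in enumerate(schedule(EXAMPLE_PREDICTED_GB, budget_gb=12), 1):
        total = sum(EXAMPLE_PREDICTED_GB[t] for t in batch)
        print(f"batch {i}: {batch} ({total} GB predicted)")
```

In this sketch all tasks in a batch would run concurrently and the next batch would start once the current one finishes. A scheduler like the one the abstract describes would presumably also refresh its memory predictions as tasks complete, which is not modeled here.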

Related papers