Fugu-MT paper translation (abstract): When does learning pay off? A study on DRL-based dynamic algorithm configuration for carbon-aware scheduling

Paper overview: When does learning pay off? A study on DRL-based dynamic algorithm configuration for carbon-aware scheduling

arxiv url: http://arxiv.org/abs/2604.01886v1
Date: Thu, 02 Apr 2026 10:47:01 GMT
Status: translation complete
Last updated in system: 2026-04-03 14:21:10.689809
Title: When does learning pay off? A study on DRL-based dynamic algorithm configuration for carbon-aware scheduling
Title (reference translation): When is learning rewarded? A study of DRL-based dynamic algorithm configuration for carbon-aware scheduling
Authors: Andrea Mencaroni, Robbert Reijnen, Yingqian Zhang, Dieter Claeys
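Abstract summary: We develop a DRL-based DAC framework and train it only on small, simple instances. We compare its performance against a statically tuned baseline. The results confirm that DRL can acquire robust, generalizable control policies that remain effective beyond the training instance distribution.
Reference score (independently computed attention score): 1.0799600071196367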
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Deep reinforcement learning (DRL) has recently emerged as a promising tool for Dynamic Algorithm Configuration (DAC), enabling evolutionary algorithms to adapt their parameters online rather than relying on static tuned configurations. While DRL can learn effective control policies, training is computationally expensive. This cost may be justified if learned policies generalize, allowing the training effort to transfer across instance types and problem scales. Yet, for real-world optimization problems, it remains unclear whether this promise holds in practice and under which conditions the investment in learning pays off. In this work, we investigate this question in the context of the carbon-aware permutation flow-shop scheduling problem. We develop a DRL-based DAC framework and train it exclusively on small, simple instances. We then deploy the learned policy on both similar and more complex unseen instances and compare its performance against a static tuned baseline, which provides a fair point of comparison. Our findings show that the proposed method provides a strong dynamic algorithm control policy that can be effectively transferred to different unseen problem instances. Notably, on simple and cheap to compute instances, similar to those observed during training and tuning, DRL performs comparably with the statically tuned baseline. However, as instance characteristics diverge and computational complexities increase, the DRL-learned policy continuously outperforms static tuning. These results confirm that DRL can acquire robust and generalizable control policies which are effective beyond the training instance distributions. This ability to generalize across instance types makes the initial computational investment worthwhile, particularly in settings where static tuning struggles to adapt to changing problem scenarios.
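Abstract (reference translation): Deep reinforcement learning (DRL) has recently emerged as a promising tool for dynamic algorithm configuration (DAC), allowing evolutionary algorithms to adapt their parameters online instead of relying on statically tuned settings. DRL can learn effective control policies, but training is computationally expensive. That cost is justified if the learned policies generalize, so that the training effort transfers across instance types and problem scales. For real-world optimization problems, however, it is unclear whether this promise holds in practice and under which conditions the investment in learning pays off. This study examines the question for the carbon-aware permutation flow-shop scheduling problem. We develop a DRL-based DAC framework and train it only on small, simple instances. We then deploy the learned policy on similar and on more complex unseen instances and compare its performance against a statically tuned baseline. The proposed method yields a strong dynamic algorithm control policy that transfers effectively to different unseen problem instances. In particular, on simple, cheap-to-compute instances similar to those seen during training and tuning, DRL performs comparably to the statically tuned baseline. As instance characteristics diverge and computational complexity grows, however, the DRL-learned policy continuously outperforms static tuning. These results confirm that DRL can acquire robust, generalizable control policies that remain effective beyond the training instance distribution. This ability to generalize across instance types makes the initial computational investment worthwhile, especially in settings where static tuning struggles to adapt to changing problem scenarios.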
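To make the contrast between static tuning and dynamic algorithm configuration concrete, the following is a minimal sketch and not the authors' framework. It assumes a toy (1+1) evolutionary algorithm on a plain permutation flow-shop makespan objective; the carbon-aware part is omitted because the abstract gives no formulation for it. The names `makespan`, `run_ea`, `static_policy` and `dynamic_policy` are hypothetical, and `dynamic_policy` is a hand-written rule standing in for a policy that the paper would learn with DRL.

```python
import random
from typing import Callable, List, Tuple

# State handed to the controller: (fraction of budget used, steps since last improvement).
State = Tuple[float, int]
Policy = Callable[[State], int]  # returns how many swap mutations to apply this step


def makespan(perm: List[int], proc: List[List[int]]) -> int:
    """Completion time of the last job on the last machine.

    proc[j][m] is the processing time of job j on machine m.
    """
    n_machines = len(proc[0])
    finish = [0] * n_machines  # finish time of the previous job on each machine
    for job in perm:
        prev = 0  # finish time of this job on the previous machine
        for m in range(n_machines):
            prev = max(prev, finish[m]) + proc[job][m]
            finish[m] = prev
    return finish[-1]


def run_ea(proc: List[List[int]], policy: Policy, budget: int = 200,
           seed: int = 0) -> Tuple[List[int], int]:
    """(1+1) EA whose mutation strength is chosen by `policy` at every step."""
    rng = random.Random(seed)
    perm = list(range(len(proc)))
    rng.shuffle(perm)
    best = makespan(perm, proc)
    stagnation = 0
    for step in range(budget):
        # The controller sees the current state and picks the parameter online.
        k = max(1, policy((step / budget, stagnation)))
        child = perm[:]
        for _ in range(k):
            i, j = rng.sample(range(len(child)), 2)
            child[i], child[j] = child[j], child[i]
        cost = makespan(child, proc)
        if cost < best:
            perm, best, stagnation = child, cost, 0
        else:
            stagnation += 1
    return perm, best


def static_policy(state: State) -> int:
    """Static configuration: one fixed value for the whole run."""
    return 1


def dynamic_policy(state: State) -> int:
    """Hand-written stand-in for a learned policy: mutate harder when stuck."""
    _, stagnation = state
    return 1 + min(3, stagnation // 20)


if __name__ == "__main__":
    rng = random.Random(42)
    instance = [[rng.randint(1, 9) for _ in range(3)] for _ in range(8)]
    print("static :", run_ea(instance, static_policy)[1])
    print("dynamic:", run_ea(instance, dynamic_policy)[1])
```

The only difference between the two runs is the callable passed as `policy`, which is the point of the sketch: a static configuration ignores the state, while a dynamic one reacts to it. Whether the dynamic rule wins on any given instance is not claimed here.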

Related papers