Fugu-MT paper translation (abstract): PIRF: Physics-Informed Reward Fine-Tuning for Diffusion Models

Paper overview: PIRF: Physics-Informed Reward Fine-Tuning for Diffusion Models

arxiv url: http://arxiv.org/abs/2509.20570v1
Date: Wed, 24 Sep 2025 21:23:03 GMT
Status: Translation complete
Last updated in system: 2025-09-26 20:58:12.59339
Title: PIRF: Physics-Informed Reward Fine-Tuning for Diffusion Models
Title (reference translation): PIRF: Physics-Informed Reward Fine-Tuning for Diffusion Models
Authors: Mingze Yuan, Pengfei Jin, Na Li, Quanzheng Li
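Abstract summary: We frame physics-informed generation as a sparse reward optimization problem, treating adherence to physical constraints as a reward signal. We introduce Physics-Informed Reward Fine-tuning (PIRF), which bypasses value approximation by computing trajectory-level rewards and backpropagating their gradients directly. PIRF consistently achieves superior physical enforcement under efficient sampling regimes, highlighting the potential of reward fine-tuning for advancing scientific generative modeling.
Reference score (independently computed attention score): 11.791955441600825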
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Diffusion models have demonstrated strong generative capabilities across scientific domains, but often produce outputs that violate physical laws. We propose a new perspective by framing physics-informed generation as a sparse reward optimization problem, where adherence to physical constraints is treated as a reward signal. This formulation unifies prior approaches under a reward-based paradigm and reveals a shared bottleneck: reliance on diffusion posterior sampling (DPS)-style value function approximations, which introduce non-negligible errors and lead to training instability and inference inefficiency. To overcome this, we introduce Physics-Informed Reward Fine-tuning (PIRF), a method that bypasses value approximation by computing trajectory-level rewards and backpropagating their gradients directly. However, a naive implementation suffers from low sample efficiency and compromised data fidelity. PIRF mitigates these issues through two key strategies: (1) a layer-wise truncated backpropagation method that leverages the spatiotemporally localized nature of physics-based rewards, and (2) a weight-based regularization scheme that improves efficiency over traditional distillation-based methods. Across five PDE benchmarks, PIRF consistently achieves superior physical enforcement under efficient sampling regimes, highlighting the potential of reward fine-tuning for advancing scientific generative modeling.
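Abstract (reference translation): Diffusion models have shown strong generative capabilities across scientific domains, but they often produce outputs that violate physical laws. We propose a new perspective that frames physics-informed generation as a sparse reward optimization problem, treating adherence to physical constraints as a reward signal. This formulation unifies prior approaches under a reward-based paradigm and reveals a shared bottleneck: reliance on diffusion posterior sampling (DPS)-style value function approximations. To address this, we introduce Physics-Informed Reward Fine-tuning (PIRF), which bypasses value approximation by computing trajectory-level rewards and backpropagating their gradients directly. A naive implementation, however, suffers from low sample efficiency and compromised data fidelity. PIRF mitigates these issues through two key strategies: (1) a layer-wise truncated backpropagation method that exploits the spatiotemporal locality of physics-based rewards, and (2) a weight-based regularization scheme that is more efficient than traditional distillation-based methods. Across five PDE benchmarks, PIRF consistently achieves superior physical enforcement under efficient sampling regimes, highlighting the potential of reward fine-tuning for advancing scientific generative modeling.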
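
The idea of treating adherence to a physical constraint as a reward signal can be illustrated with a small sketch. It assumes a 2D Laplace equation on a uniform grid and uses the negative mean squared finite-difference residual as the reward. The choice of PDE, the function names `laplace_residual` and `physics_reward`, and the reward form are illustrative assumptions, not details taken from the paper.

```python
def laplace_residual(u, h=1.0):
    """Five-point finite-difference Laplacian at each interior grid point.

    `u` is a 2D field given as a list of rows; `h` is the grid spacing.
    (Hypothetical helper for illustration only.)
    """
    rows, cols = len(u), len(u[0])
    return [
        [
            (u[i + 1][j] + u[i - 1][j] + u[i][j + 1] + u[i][j - 1] - 4.0 * u[i][j])
            / h**2
            for j in range(1, cols - 1)
        ]
        for i in range(1, rows - 1)
    ]


def physics_reward(u, h=1.0):
    """Negative mean squared PDE residual: zero is best, more negative is worse.

    Assumed reward form; the paper's actual reward is not given in the abstract.
    """
    residual = [r for row in laplace_residual(u, h) for r in row]
    return -sum(r * r for r in residual) / len(residual)


# u(x, y) = x + y is harmonic, so its discrete residual vanishes.
harmonic = [[float(i + j) for j in range(5)] for i in range(5)]
# u(x, y) = x^2 + y^2 has Laplacian 4 everywhere, so it violates the constraint.
violating = [[float(i * i + j * j) for j in range(5)] for i in range(5)]

print(physics_reward(harmonic))
print(physics_reward(violating))
```

In this sketch the harmonic field receives a higher reward than the violating one, which is the ordering a reward-based fine-tuning signal would need. In a differentiable framework the same residual could be backpropagated through the sampler, but that step is not shown here.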


Related papers