Fugu-MT Paper Translation (Summary): One4Many-StablePacker: An Efficient Deep Reinforcement Learning Framework for the 3D Bin Packing Problem

Paper summary: One4Many-StablePacker: An Efficient Deep Reinforcement Learning Framework for the 3D Bin Packing Problem

arxiv url: http://arxiv.org/abs/2510.10057v1
Date: Sat, 11 Oct 2025 06:47:49 GMT
Status: Translation complete
Last updated in system: 2025-10-14 18:06:29.756463
Title: One4Many-StablePacker: An Efficient Deep Reinforcement Learning Framework for the 3D Bin Packing Problem
Authors: Lei Gao, Shihong Huang, Shengjie Wang, Hong Ma, Feng Zhang, Hengda Bao, Qichang Chen, Weihua Zhou
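Abstract summary: The three-dimensional bin packing problem (3D-BPP) is widely applied in logistics and warehousing. We propose a novel deep reinforcement learning framework, One4Many-StablePacker (O4M-SP). O4M-SP handles various bin dimensions in a single training process while incorporating the support and weight constraints common in practice.
Reference score (independently computed attention metric): 12.516955835907089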
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The three-dimensional bin packing problem (3D-BPP) is widely applied in logistics and warehousing. Existing learning-based approaches often neglect practical stability-related constraints and exhibit limitations in generalizing across diverse bin dimensions. To address these limitations, we propose a novel deep reinforcement learning framework, One4Many-StablePacker (O4M-SP). The primary advantage of O4M-SP is its ability to handle various bin dimensions in a single training process while incorporating support and weight constraints common in practice. Our training method introduces two innovative mechanisms. First, it employs a weighted reward function that integrates loading rate and a new height difference metric for packing layouts, promoting improved bin utilization through flatter packing configurations. Second, it combines clipped policy gradient optimization with a tailored policy drifting method to mitigate policy entropy collapse, encouraging exploration at critical decision nodes during packing to avoid suboptimal solutions. Extensive experiments demonstrate that O4M-SP generalizes successfully across diverse bin dimensions and significantly outperforms baseline methods. Furthermore, O4M-SP exhibits strong practical applicability by effectively addressing packing scenarios with stability constraints.
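The abstract names two training ingredients without giving formulas: a weighted reward that blends loading rate with a height difference metric, and clipped policy gradient optimization. The following is a minimal sketch of how such pieces could look, not the authors' implementation. The weight `alpha`, the max-minus-min definition of height difference, the `height_map` grid, and all function names are assumptions introduced for illustration; the clipped term is the standard PPO-style surrogate, and the paper's tailored policy drifting method is not reproduced here.

```python
def loading_rate(items, bin_dims):
    """Fraction of the bin volume occupied by packed items.

    items: list of (length, width, height) tuples already placed.
    bin_dims: (length, width, height) of the bin.
    """
    bin_l, bin_w, bin_h = bin_dims
    packed_volume = sum(l * w * h for l, w, h in items)
    return packed_volume / (bin_l * bin_w * bin_h)


def height_difference(height_map):
    """Assumed flatness metric: tallest minus shortest stack height.

    height_map: 2D list with the current top-surface height of each floor cell.
    """
    cells = [h for row in height_map for h in row]
    return max(cells) - min(cells)


def weighted_reward(items, height_map, bin_dims, alpha=0.8):
    """Hypothetical blend: reward utilization, penalize an uneven top surface.

    alpha is an illustrative weight, not a value taken from the paper.
    """
    unevenness = height_difference(height_map) / bin_dims[2]  # scale to [0, 1]
    return alpha * loading_rate(items, bin_dims) - (1.0 - alpha) * unevenness


def clipped_surrogate(ratio, advantage, eps=0.2):
    """Standard PPO-style clipped objective for a single sample.

    ratio: new-policy probability divided by old-policy probability.
    """
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    bin_dims = (4, 2, 2)
    items = [(2, 2, 1), (2, 2, 1)]           # two boxes covering the floor
    flat = [[1, 1, 1, 1], [1, 1, 1, 1]]      # perfectly level layer
    print(weighted_reward(items, flat, bin_dims))
    print(clipped_surrogate(1.5, 1.0))
```

Under these assumptions, a layout with the same loading rate but a more uneven surface receives a lower reward, which matches the abstract's stated goal of promoting flatter packing configurations.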

Related papers