Online 3D Bin Packing with Constrained Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2006.14978v5
- Date: Thu, 13 Jan 2022 13:18:26 GMT
- Title: Online 3D Bin Packing with Constrained Deep Reinforcement Learning
- Authors: Hang Zhao, Qijin She, Chenyang Zhu, Yin Yang, Kai Xu
- Abstract summary: We solve a challenging yet practically useful variant of the 3D Bin Packing Problem (3D-BPP).
In our problem, the agent has limited information about the items to be packed into the bin, and an item must be packed immediately after its arrival without buffering or readjusting.
We propose an effective and easy-to-implement constrained deep reinforcement learning (DRL) method under the actor-critic framework.
- Score: 27.656959508214193
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We solve a challenging yet practically useful variant of 3D Bin Packing
Problem (3D-BPP). In our problem, the agent has limited information about the
items to be packed into the bin, and an item must be packed immediately after
its arrival without buffering or readjusting. The item's placement is also
subject to the constraints of collision avoidance and physical stability. We
formulate this online 3D-BPP as a constrained Markov decision process. To solve
the problem, we propose an effective and easy-to-implement constrained deep
reinforcement learning (DRL) method under the actor-critic framework. In
particular, we introduce a feasibility predictor to predict the feasibility
mask for the placement actions and use it to modulate the action probabilities
output by the actor during training. Such supervision and transformation help
the agent learn feasible policies efficiently. Our method also generalizes,
e.g., to handle lookahead or items with different orientations. We have
conducted an extensive evaluation showing that the learned policy
significantly outperforms state-of-the-art methods. A user study suggests
that our method attains human-level performance.
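The feasibility-mask idea is easy to sketch. Below is a minimal PyTorch illustration of how a predicted mask can modulate the actor's placement distribution; the network shapes, module names, and the renormalization step are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MaskedActor(nn.Module):
    """Actor whose output distribution is modulated by a feasibility mask.

    Illustrative only: `n_cells` is the number of candidate placement
    cells in the bin's top-down grid; the state encoding is left abstract.
    """

    def __init__(self, state_dim: int, n_cells: int):
        super().__init__()
        self.policy_head = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(), nn.Linear(128, n_cells))
        # Feasibility predictor: per-cell probability that placing the
        # current item there is collision-free and physically stable.
        self.mask_head = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(), nn.Linear(128, n_cells))

    def forward(self, state: torch.Tensor) -> torch.distributions.Categorical:
        logits = self.policy_head(state)
        mask = torch.sigmoid(self.mask_head(state))       # (B, n_cells)
        probs = torch.softmax(logits, dim=-1) * mask      # suppress infeasible cells
        probs = probs / probs.sum(dim=-1, keepdim=True).clamp_min(1e-8)
        return torch.distributions.Categorical(probs=probs)
```

Per the abstract, the feasibility predictor is trained with supervision during training; masking the actor's probabilities is what lets it converge to feasible policies efficiently.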
Related papers
- Robust Deep Reinforcement Learning with Adaptive Adversarial Perturbations in Action Space [3.639580365066386]
We propose an adaptive adversarial coefficient framework to adjust the effect of the adversarial perturbation during training.
The appealing feature of our method is that it is simple to deploy in real-world applications and does not require accessing the simulator in advance.
The experiments in MuJoCo show that our method can improve the training stability and learn a robust policy when migrated to different test environments.
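As a rough illustration of perturbing in action space with an adaptive coefficient (the function names and the adaptation rule below are hypothetical, not the paper's exact scheme):

```python
import numpy as np

def perturb_action(action: np.ndarray, grad: np.ndarray, eps: float) -> np.ndarray:
    """FGSM-style adversarial perturbation in action space.

    `grad` is the gradient of the critic's value w.r.t. the action;
    stepping against it pushes the action toward lower estimated value.
    """
    return action - eps * np.sign(grad)

def adapt_eps(eps: float, perf_drop: float, target_drop: float,
              lr: float = 0.01) -> float:
    """Grow eps while the policy tolerates the attack, shrink it otherwise.
    The proportional rule and `target_drop` are illustrative assumptions."""
    return max(0.0, eps + lr * (target_drop - perf_drop))
```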
arXiv Detail & Related papers (2024-05-20T12:31:11Z) - Action-Quantized Offline Reinforcement Learning for Robotic Skill
Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
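A simple way to picture action quantization is a codebook over the dataset's actions; the k-means sketch below is a generic, non-adaptive stand-in for the paper's adaptive scheme (all names are illustrative):

```python
import numpy as np

def build_codebook(actions: np.ndarray, k: int, iters: int = 20) -> np.ndarray:
    """K-means codebook over dataset actions of shape (N, action_dim)."""
    rng = np.random.default_rng(0)
    codebook = actions[rng.choice(len(actions), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each action to its nearest code, then recompute centroids.
        dists = np.linalg.norm(actions[:, None] - codebook[None], axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = actions[assign == j]
            if len(members):
                codebook[j] = members.mean(axis=0)
    return codebook

def quantize(action: np.ndarray, codebook: np.ndarray) -> int:
    """Map a continuous action to the index of its nearest code."""
    return int(np.linalg.norm(codebook - action, axis=-1).argmin())
```

The discrete indices can then be fed to any discrete-action offline RL method (e.g., IQL or CQL, as the blurb above mentions).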
arXiv Detail & Related papers (2023-10-18T06:07:10Z) - Adjustable Robust Reinforcement Learning for Online 3D Bin Packing [11.157035538606968]
Current deep reinforcement learning methods for online 3D-BPP fail in real-world settings where some worst-case scenarios can materialize.
We propose an adjustable robust reinforcement learning (AR2L) framework that allows efficient adjustment of robustness weights.
Experiments demonstrate that AR2L is versatile in the sense that it improves policy robustness while maintaining an acceptable level of performance for the nominal case.
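The adjustable-robustness idea can be caricatured as a weighted objective; the convex mix below is an assumption for intuition only, not AR2L's actual formulation:

```python
def ar_objective(nominal_return: float, worst_case_return: float,
                 alpha: float) -> float:
    """Weighted mix of nominal and worst-case performance.

    alpha=0 recovers the nominal objective; alpha=1 optimizes purely
    for the adversarial item sequence. Hypothetical simplification.
    """
    return (1.0 - alpha) * nominal_return + alpha * worst_case_return
```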
arXiv Detail & Related papers (2023-10-06T15:34:21Z) - Learning Physically Realizable Skills for Online Packing of General 3D
Shapes [41.27652080050046]
We study the problem of learning online packing skills for irregular 3D shapes.
The goal is to consecutively move a sequence of 3D objects with arbitrary shapes into a designated container.
We take physical realizability into account, modeling both physics dynamics and the constraints of a placement.
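To make "physical realizability" concrete, here is a crude static-stability proxy for an axis-aligned box placement on a heightmap. This paper handles irregular shapes with full physics, so treat the settling rule and threshold as illustrative assumptions:

```python
import numpy as np

def is_supported(heightmap: np.ndarray, x: int, y: int, w: int, d: int,
                 min_support: float = 0.6) -> bool:
    """Crude static-stability test for an axis-aligned box placement.

    The footprint covers cells [x, x+w) x [y, y+d) of the bin's top-down
    heightmap; we require at least `min_support` of the footprint to rest
    on the surface the box settles onto.
    """
    footprint = heightmap[x:x + w, y:y + d]
    if footprint.shape != (w, d):
        return False                              # exceeds the bin boundary
    rest_height = footprint.max()                 # box settles on the peak
    support = (footprint == rest_height).mean()   # fraction of touching cells
    return support >= min_support
```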
arXiv Detail & Related papers (2022-12-05T08:23:39Z) - End-to-End Affordance Learning for Robotic Manipulation [4.405918052597016]
Learning to manipulate 3D objects in an interactive environment has been a challenging problem in Reinforcement Learning.
Visual affordance has shown great prospects in providing object-centric information priors with effective actionable semantics.
In this study, we take advantage of visual affordance by using the contact information generated during the RL training process to predict contact maps of interest.
arXiv Detail & Related papers (2022-09-26T18:24:28Z) - Learning Practically Feasible Policies for Online 3D Bin Packing [36.33774915391967]
We tackle the Online 3D Bin Packing Problem, a challenging yet practically useful variant of the classical Bin Packing Problem.
Online 3D-BPP can be naturally formulated as a Markov Decision Process (MDP).
We adopt deep reinforcement learning, in particular, the on-policy actor-critic framework, to solve this MDP with constrained action space.
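A minimal sketch of that MDP view: the state is the bin heightmap plus the incoming item's dimensions, an action is a placement cell, and the reward is the volume utilization gained. Everything below (class name, reward shaping, termination rule) is illustrative, not the paper's environment:

```python
import numpy as np

class OnlineBPPEnv:
    """Toy online 3D-BPP environment over a discretized bin."""

    def __init__(self, L: int = 10, W: int = 10, H: int = 10):
        self.L, self.W, self.H = L, W, H
        self.reset()

    def reset(self):
        self.heightmap = np.zeros((self.L, self.W), dtype=int)
        self.item = np.random.randint(1, 5, size=3)   # next item (l, w, h)
        return self.heightmap.copy(), self.item.copy()

    def step(self, action):
        x, y = action                                  # item's corner cell
        l, w, h = self.item
        region = self.heightmap[x:x + l, y:y + w]
        z = region.max() if region.size else self.H    # settle on the surface
        if region.shape != (l, w) or z + h > self.H:
            return (self.heightmap.copy(), self.item.copy()), 0.0, True, {}
        self.heightmap[x:x + l, y:y + w] = z + h       # commit the placement
        reward = l * w * h / (self.L * self.W * self.H)  # utilization gain
        self.item = np.random.randint(1, 5, size=3)    # reveal the next item
        return (self.heightmap.copy(), self.item.copy()), float(reward), False, {}
```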
arXiv Detail & Related papers (2021-08-31T08:37:58Z) - Boosting Weakly Supervised Object Detection via Learning Bounding Box
Adjusters [76.36104006511684]
Weakly-supervised object detection (WSOD) has emerged as an inspiring recent topic to avoid expensive instance-level object annotations.
We address the problem setting of improving localization performance by leveraging bounding box regression knowledge from a well-annotated auxiliary dataset.
Our method performs favorably against state-of-the-art WSOD methods and a knowledge transfer model with a similar problem setting.
arXiv Detail & Related papers (2021-08-03T13:38:20Z) - Simplifying Deep Reinforcement Learning via Self-Supervision [51.2400839966489]
Self-Supervised Reinforcement Learning (SSRL) is a simple algorithm that optimizes policies with purely supervised losses.
We show that SSRL is surprisingly competitive with contemporary algorithms, with more stable performance and less running time.
arXiv Detail & Related papers (2021-06-10T06:29:59Z) - PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via
Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
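The "relabeling experience" component can be sketched in a few lines: as the learned reward model improves from human feedback, stored transitions get their rewards recomputed. The buffer layout and names below are assumptions, not PEBBLE's code:

```python
def relabel_replay(buffer, reward_model):
    """Recompute stored rewards with the current learned reward model,
    keeping past experience consistent as feedback refines the model.
    `buffer` holds (state, action, reward, next_state, done) tuples."""
    return [(s, a, reward_model(s, a), s2, d) for (s, a, _, s2, d) in buffer]
```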
arXiv Detail & Related papers (2021-06-09T14:10:50Z) - POMP: Pomcp-based Online Motion Planning for active visual search in
indoor environments [89.43830036483901]
We focus on the problem of learning an optimal policy for Active Visual Search (AVS) of objects in known indoor environments with an online setup.
Our POMP method uses as input the current pose of an agent and an RGB-D frame.
We validate our method on the publicly available AVD benchmark, achieving an average success rate of 0.76 with an average path length of 17.1.
arXiv Detail & Related papers (2020-09-17T08:23:50Z) - DDPG++: Striving for Simplicity in Continuous-control Off-Policy
Reinforcement Learning [95.60782037764928]
First, we show that the simple Deterministic Policy Gradient works remarkably well as long as the overestimation bias is controlled.
Second, we pinpoint training instabilities, typical of off-policy algorithms, to the greedy policy update step.
Third, we show that ideas in the propensity estimation literature can be used to importance-sample transitions from replay buffer and update policy to prevent deterioration of performance.
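The standard way to control overestimation bias in this family of methods is the clipped double-Q target popularized by TD3; the sketch below shows that generic idea, not DDPG++'s exact recipe:

```python
import torch

def td_target(reward, next_state, done, q1_target, q2_target, policy_target,
              gamma: float = 0.99):
    """Clipped double-Q target: taking the minimum of two target critics
    keeps the bootstrapped value estimate from drifting upward."""
    with torch.no_grad():
        next_action = policy_target(next_state)
        q_min = torch.min(q1_target(next_state, next_action),
                          q2_target(next_state, next_action))
        return reward + gamma * (1.0 - done) * q_min
```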
arXiv Detail & Related papers (2020-06-26T20:21:12Z)