ContainerGym: A Real-World Reinforcement Learning Benchmark for Resource Allocation
- URL: http://arxiv.org/abs/2307.02991v1
- Date: Thu, 6 Jul 2023 13:44:29 GMT
- Title: ContainerGym: A Real-World Reinforcement Learning Benchmark for Resource Allocation
- Authors: Abhijeet Pendyala, Justin Dettmer, Tobias Glasmachers, Asma Atamna
- Abstract summary: ContainerGym is a benchmark for reinforcement learning inspired by a real-world industrial resource allocation task.
The proposed benchmark encodes challenges commonly encountered in real-world sequential decision making problems.
It can be configured to instantiate problems of varying degrees of difficulty.
- Score: 1.6058099298620425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present ContainerGym, a benchmark for reinforcement learning inspired by a
real-world industrial resource allocation task. The proposed benchmark encodes
a range of challenges commonly encountered in real-world sequential decision
making problems, such as uncertainty. It can be configured to instantiate
problems of varying degrees of difficulty, e.g., in terms of variable
dimensionality. Our benchmark differs from other reinforcement learning
benchmarks, including the ones aiming to encode real-world difficulties, in
that it is directly derived from a real-world industrial problem, which
underwent minimal simplification and streamlining. It is sufficiently versatile
to evaluate reinforcement learning algorithms on any real-world problem that
fits our resource allocation framework. We provide results of standard baseline
methods. Going beyond the usual training reward curves, our results and the
statistical tools used to interpret them allow us to highlight interesting
limitations of well-known deep reinforcement learning algorithms, namely PPO,
TRPO and DQN.
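As a rough illustration of how a Gymnasium-style benchmark of this kind is typically exercised, the sketch below trains one of the evaluated baselines (PPO, via stable-baselines3) on a configurable environment. The environment ID ContainerGym-v0 and the n_containers difficulty knob are assumptions for illustration, not the package's documented interface.

```python
# Minimal sketch: a PPO baseline on a configurable Gym-style benchmark.
# The environment ID and its config keyword are hypothetical; consult the
# ContainerGym repository for the actual interface.
import gymnasium as gym
from stable_baselines3 import PPO

# Hypothetical knob: difficulty scaling with the number of containers
# (the "variable dimensionality" mentioned in the abstract).
env = gym.make("ContainerGym-v0", n_containers=5)

model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)

# Roll out the trained policy for one episode.
obs, info = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
```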
Related papers
- Can Learned Optimization Make Reinforcement Learning Less Difficult? [70.5036361852812]
We consider whether learned optimization can help overcome reinforcement learning difficulties.
Our method, Learned Optimization for Plasticity, Exploration and Non-stationarity (OPEN), meta-learns an update rule whose input features and output structure are informed by solutions previously proposed for these difficulties.
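To make the notion of a meta-learned update rule concrete, here is a toy sketch in which a small parameterized map turns per-parameter features (gradient, momentum) into an update; the feature choice and linear form are illustrative assumptions, not OPEN's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "learned optimizer": a linear map from per-parameter features to an
# update. In a meta-learned setting these weights would be trained across
# tasks; here they are fixed purely for illustration.
W = rng.normal(scale=0.1, size=(2,))

def learned_update(grad, momentum):
    features = np.stack([grad, momentum], axis=-1)  # shape (n_params, 2)
    return features @ W                             # shape (n_params,)

# One update step on a toy objective f(x) = ||x||^2.
x = rng.normal(size=5)
grad = 2 * x       # gradient of ||x||^2
momentum = grad    # momentum buffer after the first gradient step
x = x - 0.01 * learned_update(grad, momentum)
```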
arXiv Detail & Related papers (2024-07-09T17:55:23Z)
- Efficient Imitation Learning with Conservative World Models [54.52140201148341]
We tackle the problem of policy learning from expert demonstrations without a reward function.
We re-frame imitation learning as a fine-tuning problem, rather than a pure reinforcement learning one.
arXiv Detail & Related papers (2024-05-21T20:53:18Z)
- Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF [82.73541793388]
We introduce the first principled algorithmic framework for solving bilevel RL problems through the lens of penalty formulation.
We provide theoretical studies of the problem landscape and its penalty-based gradient (policy) algorithms.
We demonstrate the effectiveness of our algorithms via simulations in the Stackelberg Markov game, RL from human feedback and incentive design.
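The penalty idea admits a compact generic statement: replace the follower's optimality constraint with a value-function gap penalty. This is the standard form of such reformulations; the paper's exact objective may differ.

```latex
% Bilevel problem: the leader minimizes f subject to follower optimality.
\min_{x}\; f\bigl(x, y^{*}(x)\bigr)
\quad \text{s.t.} \quad
y^{*}(x) \in \operatorname*{arg\,min}_{y}\; g(x, y)

% Penalty reformulation with coefficient \lambda > 0: the value-function
% gap g(x,y) - \min_{y'} g(x,y') is zero exactly at follower optima.
\min_{x,\, y}\; f(x, y) \;+\; \lambda \Bigl( g(x, y) \;-\; \min_{y'} g(x, y') \Bigr)
```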
arXiv Detail & Related papers (2024-02-10T04:54:15Z)
- A General Framework for Learning from Weak Supervision [93.89870459388185]
This paper introduces a general framework for learning from weak supervision (GLWS) with a novel algorithm.
Central to GLWS is an Expectation-Maximization (EM) formulation, adeptly accommodating various weak supervision sources.
We also present an advanced algorithm that significantly simplifies the EM computational demands.
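One weak-supervision setting an EM formulation covers is partial (candidate-set) labels. The sketch below runs a generic EM loop that estimates class priors under candidate-label masks; it illustrates the EM pattern, not the GLWS algorithm itself.

```python
import numpy as np

def em_partial_labels(probs, candidate_masks, n_iters=50):
    """Generic EM under partial labels: each example carries a Boolean
    mask of candidate classes. probs (n, k) are fixed per-example class
    likelihoods from some model; we estimate the class prior."""
    n, k = probs.shape
    prior = np.full(k, 1.0 / k)
    for _ in range(n_iters):
        # E-step: posterior over labels, restricted to each candidate set.
        post = probs * prior * candidate_masks
        post /= post.sum(axis=1, keepdims=True)
        # M-step: re-estimate the class prior from expected counts.
        prior = post.mean(axis=0)
    return prior, post

# Toy usage: 3 examples, 3 classes, candidate label sets as masks.
probs = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]])
masks = np.array([[1, 1, 0], [0, 1, 1], [1, 0, 1]], dtype=float)
prior, post = em_partial_labels(probs, masks)
```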
arXiv Detail & Related papers (2024-02-02T21:48:50Z)
- Benchmarking Constraint Inference in Inverse Reinforcement Learning [19.314352936252444]
In many real-world problems, the constraints followed by expert agents are often hard to specify mathematically and unknown to the RL agents.
In this paper, we construct a CIRL benchmark in the context of two major application domains: robot control and autonomous driving.
The benchmark, including the information for reproducing the performance of CIRL algorithms, is publicly available at https://github.com/Guiliang/CIRL-benchmarks-public.
arXiv Detail & Related papers (2022-06-20T09:22:20Z)
- Hierarchical Reinforcement Learning with Timed Subgoals [11.758625350317274]
We introduce Hierarchical Reinforcement Learning with Timed Subgoals (HiTS).
HiTS enables the agent to adapt its timing to a dynamic environment by specifying what goal state is to be reached and also when.
Experiments show that our method is capable of sample-efficient learning where an existing state-of-the-art subgoal-based HRL method fails to learn stable solutions.
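The timed-subgoal interface can be pictured as the higher level emitting (goal, time budget) pairs that the lower level pursues until the budget runs out. The sketch below is a hypothetical rendering of that idea, not the authors' implementation.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class TimedSubgoal:
    goal: np.ndarray  # desired (sub)state to reach
    budget: int       # number of low-level steps allotted

def run_timed_subgoal(env, low_level_policy, subgoal, obs):
    """Execute low-level actions until the subgoal's budget expires.
    Illustrative sketch of the timed-subgoal idea, not HiTS itself."""
    for t in range(subgoal.budget):
        # The remaining time is part of the low-level policy's input, so
        # the agent can adapt *when* it reaches the goal, not just where.
        action = low_level_policy(obs, subgoal.goal, subgoal.budget - t)
        obs, reward, terminated, truncated, info = env.step(action)
        if terminated or truncated:
            break
    return obs
```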
arXiv Detail & Related papers (2021-12-06T15:11:19Z)
- CARL: A Benchmark for Contextual and Adaptive Reinforcement Learning [45.52724876199729]
We present CARL, a collection of well-known RL environments extended to contextual RL problems.
We provide first evidence that disentangling state representation learning from context-conditioned policy learning facilitates better generalization.
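In this contextual setting, a known context (e.g., physical parameters) parameterizes the environment. Below is a generic wrapper sketch of that idea; the attribute-overwriting mechanism is an assumption for illustration, not CARL's actual API.

```python
import gymnasium as gym

class ContextualEnvWrapper(gym.Wrapper):
    """Generic sketch of a contextual environment: a context dict sets
    simulator parameters at reset. Not CARL's actual interface."""

    def __init__(self, env, context):
        super().__init__(env)
        self.context = context

    def reset(self, **kwargs):
        # Hypothetical mechanism: overwrite simulator attributes with the
        # context values before each episode.
        for name, value in self.context.items():
            setattr(self.env.unwrapped, name, value)
        return self.env.reset(**kwargs)

# Toy usage: vary gravity across training instances.
env = ContextualEnvWrapper(gym.make("Pendulum-v1"), context={"g": 12.0})
```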
arXiv Detail & Related papers (2021-10-05T15:04:01Z)
- No-Regret Reinforcement Learning with Heavy-Tailed Rewards [11.715649997214125]
We show that the difficulty of learning heavy-tailed rewards dominates the difficulty of learning transition probabilities.
Our algorithms naturally generalize to deep reinforcement learning applications.
All of our algorithms outperform baselines on both synthetic MDPs and standard RL benchmarks.
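A standard tool behind such results is a robust mean estimator, e.g. median-of-means in place of the empirical mean of observed rewards. The sketch shows the generic technique; the paper's estimators may differ in detail.

```python
import numpy as np

def median_of_means(samples, n_blocks=8):
    """Robust mean estimate for heavy-tailed data: split the samples into
    blocks, average each block, and take the median of the block means."""
    blocks = np.array_split(np.asarray(samples), n_blocks)
    return np.median([block.mean() for block in blocks])

# Toy usage: reward samples from a heavy-tailed (Pareto) distribution.
rng = np.random.default_rng(0)
rewards = rng.pareto(a=2.1, size=1000)
print(median_of_means(rewards), rewards.mean())
```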
arXiv Detail & Related papers (2021-02-25T10:25:57Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
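With the Q-ensemble stored as a (members x actions) array for a given state, both ingredients fit in a few lines: UCB action selection uses mean + lambda * std, and the backup weight shrinks with target-Q disagreement. The scale hyperparameters below are illustrative.

```python
import numpy as np

def ucb_action(q_ensemble, lam=1.0):
    """Choose the action maximizing mean + lam * std over an ensemble of
    Q-estimates for one state; q_ensemble has shape (members, actions)."""
    mean = q_ensemble.mean(axis=0)
    std = q_ensemble.std(axis=0)
    return int(np.argmax(mean + lam * std))

def bellman_weight(target_q_std, temperature=10.0):
    """Weight in (0.5, 1.0]: sigmoid(-std * T) + 0.5, so targets the
    ensemble disagrees on contribute less to the Bellman backup."""
    return 1.0 / (1.0 + np.exp(target_q_std * temperature)) + 0.5

# Toy usage: 5 ensemble members, 3 actions at one state.
q = np.random.default_rng(0).normal(size=(5, 3))
a = ucb_action(q)
w = bellman_weight(q.std(axis=0)[a])
```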
arXiv Detail & Related papers (2020-07-09T17:08:44Z)