ContainerGym: A Real-World Reinforcement Learning Benchmark for Resource Allocation
- URL: http://arxiv.org/abs/2307.02991v1
- Date: Thu, 6 Jul 2023 13:44:29 GMT
- Title: ContainerGym: A Real-World Reinforcement Learning Benchmark for Resource Allocation
- Authors: Abhijeet Pendyala, Justin Dettmer, Tobias Glasmachers, Asma Atamna
- Abstract summary: ContainerGym is a benchmark for reinforcement learning inspired by a real-world industrial resource allocation task.
The proposed benchmark encodes challenges commonly encountered in real-world sequential decision making problems.
It can be configured to instantiate problems of varying degrees of difficulty.
- Score: 1.6058099298620425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present ContainerGym, a benchmark for reinforcement learning inspired by a
real-world industrial resource allocation task. The proposed benchmark encodes
a range of challenges commonly encountered in real-world sequential decision
making problems, such as uncertainty. It can be configured to instantiate
problems of varying degrees of difficulty, e.g., in terms of variable
dimensionality. Our benchmark differs from other reinforcement learning
benchmarks, including the ones aiming to encode real-world difficulties, in
that it is directly derived from a real-world industrial problem, which
underwent minimal simplification and streamlining. It is sufficiently versatile
to evaluate reinforcement learning algorithms on any real-world problem that
fits our resource allocation framework. We provide results of standard baseline
methods. Going beyond the usual training reward curves, our results and the
statistical tools used to interpret them allow us to highlight interesting
limitations of well-known deep reinforcement learning algorithms, namely PPO,
TRPO and DQN.
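As a rough illustration of how a Gymnasium-style benchmark of this kind is typically exercised, the sketch below trains one of the evaluated baselines (PPO, via stable-baselines3) on a configurable environment. The environment ID ContainerGym-v0 and the n_containers difficulty knob are assumptions for illustration, not the package's documented interface.

```python
# Minimal sketch: a PPO baseline on a configurable Gym-style benchmark.
# The environment ID and its config keyword are hypothetical; consult the
# ContainerGym repository for the actual interface.
import gymnasium as gym
from stable_baselines3 import PPO

# Hypothetical knob: difficulty scaling with the number of containers
# (the "variable dimensionality" mentioned in the abstract).
env = gym.make("ContainerGym-v0", n_containers=5)

model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)

# Roll out the trained policy for one episode.
obs, info = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
```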
Related papers
- Can Learned Optimization Make Reinforcement Learning Less Difficult? [70.5036361852812]
We consider whether learned optimization can help overcome reinforcement learning difficulties.
Our method, Learned Optimization for Plasticity, Exploration and Non-stationarity (OPEN), meta-learns an update rule whose input features and output structure are informed by solutions previously proposed for these difficulties.
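To make the notion of a meta-learned update rule concrete, here is a toy sketch in which a small parameterized map turns per-parameter features (gradient, momentum) into an update; the feature choice and linear form are illustrative assumptions, not OPEN's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "learned optimizer": a linear map from per-parameter features to an
# update. In a meta-learned setting these weights would be trained across
# tasks; here they are fixed purely for illustration.
W = rng.normal(scale=0.1, size=(2,))

def learned_update(grad, momentum):
    features = np.stack([grad, momentum], axis=-1)  # shape (n_params, 2)
    return features @ W                             # shape (n_params,)

# One update step on a toy objective f(x) = ||x||^2.
x = rng.normal(size=5)
grad = 2 * x       # gradient of ||x||^2
momentum = grad    # momentum buffer after the first gradient step
x = x - 0.01 * learned_update(grad, momentum)
```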
arXiv Detail & Related papers (2024-07-09T17:55:23Z)
- Efficient Imitation Learning with Conservative World Models [54.52140201148341]
We tackle the problem of policy learning from expert demonstrations without a reward function.
We re-frame imitation learning as a fine-tuning problem, rather than a pure reinforcement learning one.
arXiv Detail & Related papers (2024-05-21T20:53:18Z)
- Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF [82.73541793388]
We introduce the first principled algorithmic framework for solving bilevel RL problems through the lens of penalty formulation.
We provide theoretical studies of the problem landscape and its penalty-based gradient (policy) algorithms.
We demonstrate the effectiveness of our algorithms via simulations in the Stackelberg Markov game, RL from human feedback and incentive design.
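The penalty idea admits a compact generic statement: replace the follower's optimality constraint with a value-function gap penalty. This is the standard form of such reformulations; the paper's exact objective may differ.

```latex
% Bilevel problem: the leader minimizes f subject to follower optimality.
\min_{x}\; f\bigl(x, y^{*}(x)\bigr)
\quad \text{s.t.} \quad
y^{*}(x) \in \operatorname*{arg\,min}_{y}\; g(x, y)

% Penalty reformulation with coefficient \lambda > 0: the value-function
% gap g(x,y) - \min_{y'} g(x,y') is zero exactly at follower optima.
\min_{x,\, y}\; f(x, y) \;+\; \lambda \Bigl( g(x, y) \;-\; \min_{y'} g(x, y') \Bigr)
```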
arXiv Detail & Related papers (2024-02-10T04:54:15Z)
- A General Framework for Learning from Weak Supervision [93.89870459388185]
This paper introduces a general framework for learning from weak supervision (GLWS) with a novel algorithm.
Central to GLWS is an Expectation-Maximization (EM) formulation, adeptly accommodating various weak supervision sources.
We also present an advanced algorithm that significantly simplifies the EM computational demands.
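One weak-supervision setting an EM formulation covers is partial (candidate-set) labels. The sketch below runs a generic EM loop that estimates class priors under candidate-label masks; it illustrates the EM pattern, not the GLWS algorithm itself.

```python
import numpy as np

def em_partial_labels(probs, candidate_masks, n_iters=50):
    """Generic EM under partial labels: each example carries a Boolean
    mask of candidate classes. probs (n, k) are fixed per-example class
    likelihoods from some model; we estimate the class prior."""
    n, k = probs.shape
    prior = np.full(k, 1.0 / k)
    for _ in range(n_iters):
        # E-step: posterior over labels, restricted to each candidate set.
        post = probs * prior * candidate_masks
        post /= post.sum(axis=1, keepdims=True)
        # M-step: re-estimate the class prior from expected counts.
        prior = post.mean(axis=0)
    return prior, post

# Toy usage: 3 examples, 3 classes, candidate label sets as masks.
probs = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]])
masks = np.array([[1, 1, 0], [0, 1, 1], [1, 0, 1]], dtype=float)
prior, post = em_partial_labels(probs, masks)
```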
arXiv Detail & Related papers (2024-02-02T21:48:50Z)
- Benchmarking Constraint Inference in Inverse Reinforcement Learning [19.314352936252444]
In many real-world problems, the constraints followed by expert agents are often hard to specify mathematically and unknown to the RL agents.
In this paper, we construct a CIRL benchmark in the context of two major application domains: robot control and autonomous driving.
The benchmark, including the information for reproducing the performance of CIRL algorithms, is publicly available at https://github.com/Guiliang/CIRL-benchmarks-public.
arXiv Detail & Related papers (2022-06-20T09:22:20Z)
- Hierarchical Reinforcement Learning with Timed Subgoals [11.758625350317274]
We introduce Hierarchical Reinforcement Learning with Timed Subgoals (HiTS).
HiTS enables the agent to adapt its timing to a dynamic environment by specifying what goal state is to be reached and also when.
Experiments show that our method is capable of sample-efficient learning where an existing state-of-the-art subgoal-based HRL method fails to learn stable solutions.
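The timed-subgoal interface can be pictured as the higher level emitting (goal, time budget) pairs that the lower level pursues until the budget runs out. The sketch below is a hypothetical rendering of that idea, not the authors' implementation.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class TimedSubgoal:
    goal: np.ndarray  # desired (sub)state to reach
    budget: int       # number of low-level steps allotted

def run_timed_subgoal(env, low_level_policy, subgoal, obs):
    """Execute low-level actions until the subgoal's budget expires.
    Illustrative sketch of the timed-subgoal idea, not HiTS itself."""
    for t in range(subgoal.budget):
        # The remaining time is part of the low-level policy's input, so
        # the agent can adapt *when* it reaches the goal, not just where.
        action = low_level_policy(obs, subgoal.goal, subgoal.budget - t)
        obs, reward, terminated, truncated, info = env.step(action)
        if terminated or truncated:
            break
    return obs
```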
arXiv Detail & Related papers (2021-12-06T15:11:19Z)
- CARL: A Benchmark for Contextual and Adaptive Reinforcement Learning [45.52724876199729]
We present CARL, a collection of well-known RL environments extended to contextual RL problems.
We provide first evidence that disentangling state representation learning from context-conditioned policy learning facilitates better generalization.
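In this contextual setting, a known context (e.g., physical parameters) parameterizes the environment. Below is a generic wrapper sketch of that idea; the attribute-overwriting mechanism is an assumption for illustration, not CARL's actual API.

```python
import gymnasium as gym

class ContextualEnvWrapper(gym.Wrapper):
    """Generic sketch of a contextual environment: a context dict sets
    simulator parameters at reset. Not CARL's actual interface."""

    def __init__(self, env, context):
        super().__init__(env)
        self.context = context

    def reset(self, **kwargs):
        # Hypothetical mechanism: overwrite simulator attributes with the
        # context values before each episode.
        for name, value in self.context.items():
            setattr(self.env.unwrapped, name, value)
        return self.env.reset(**kwargs)

# Toy usage: vary gravity across training instances.
env = ContextualEnvWrapper(gym.make("Pendulum-v1"), context={"g": 12.0})
```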
arXiv Detail & Related papers (2021-10-05T15:04:01Z)
- No-Regret Reinforcement Learning with Heavy-Tailed Rewards [11.715649997214125]
We show that the difficulty of learning heavy-tailed rewards dominates the difficulty of learning transition probabilities.
Our algorithms naturally generalize to deep reinforcement learning applications.
All of our algorithms outperform baselines on both synthetic MDPs and standard RL benchmarks.
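A standard tool behind such results is a robust mean estimator, e.g. median-of-means in place of the empirical mean of observed rewards. The sketch shows the generic technique; the paper's estimators may differ in detail.

```python
import numpy as np

def median_of_means(samples, n_blocks=8):
    """Robust mean estimate for heavy-tailed data: split the samples into
    blocks, average each block, and take the median of the block means."""
    blocks = np.array_split(np.asarray(samples), n_blocks)
    return np.median([block.mean() for block in blocks])

# Toy usage: reward samples from a heavy-tailed (Pareto) distribution.
rng = np.random.default_rng(0)
rewards = rng.pareto(a=2.1, size=1000)
print(median_of_means(rewards), rewards.mean())
```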
arXiv Detail & Related papers (2021-02-25T10:25:57Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
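With the Q-ensemble stored as a (members x actions) array for a given state, both ingredients fit in a few lines: UCB action selection uses mean + lambda * std, and the backup weight shrinks with target-Q disagreement. The scale hyperparameters below are illustrative.

```python
import numpy as np

def ucb_action(q_ensemble, lam=1.0):
    """Choose the action maximizing mean + lam * std over an ensemble of
    Q-estimates for one state; q_ensemble has shape (members, actions)."""
    mean = q_ensemble.mean(axis=0)
    std = q_ensemble.std(axis=0)
    return int(np.argmax(mean + lam * std))

def bellman_weight(target_q_std, temperature=10.0):
    """Weight in (0.5, 1.0]: sigmoid(-std * T) + 0.5, so targets the
    ensemble disagrees on contribute less to the Bellman backup."""
    return 1.0 / (1.0 + np.exp(target_q_std * temperature)) + 0.5

# Toy usage: 5 ensemble members, 3 actions at one state.
q = np.random.default_rng(0).normal(size=(5, 3))
a = ucb_action(q)
w = bellman_weight(q.std(axis=0)[a])
```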
arXiv Detail & Related papers (2020-07-09T17:08:44Z)