Effective Reinforcement Learning through Evolutionary Surrogate-Assisted Prescription
- URL: http://arxiv.org/abs/2002.05368v2
- Date: Wed, 22 Apr 2020 03:27:46 GMT
- Title: Effective Reinforcement Learning through Evolutionary Surrogate-Assisted Prescription
- Authors: Olivier Francon, Santiago Gonzalez, Babak Hodjat, Elliot Meyerson,
Risto Miikkulainen, Xin Qiu, and Hormoz Shahrzad
- Abstract summary: This paper introduces a general such approach, called Evolutionary Surrogate-Assisted Prescription, or ESP.
ESP forms a promising foundation for decision optimization in real-world problems.
- Score: 18.547387505708485
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is now significant historical data available on decision making in
organizations, consisting of the decision problem, what decisions were made,
and how desirable the outcomes were. Using this data, it is possible to learn a
surrogate model, and with that model, evolve a decision strategy that optimizes
the outcomes. This paper introduces a general such approach, called
Evolutionary Surrogate-Assisted Prescription, or ESP. The surrogate is, for
example, a random forest or a neural network trained with gradient descent, and
the strategy is a neural network that is evolved to maximize the predictions of
the surrogate model. ESP is further extended in this paper to sequential
decision-making tasks, which makes it possible to evaluate the framework in
reinforcement learning (RL) benchmarks. Because the majority of evaluations are
done on the surrogate, ESP is more sample efficient, has lower variance, and
lower regret than standard RL approaches. Surprisingly, its solutions are also
better because both the surrogate and the strategy network regularize the
decision-making behavior. ESP thus forms a promising foundation for decision
optimization in real-world problems.
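
The abstract describes a two-part loop: fit a surrogate model on historical (context, action, outcome) data, then evolve a strategy network whose fitness is the surrogate's predicted outcome for the actions it proposes. Below is a minimal Python sketch of that loop under simplifying assumptions; the least-squares surrogate, the linear strategy representation, the toy data, and all function names are illustrative stand-ins rather than the paper's implementation.

import numpy as np

rng = np.random.default_rng(0)

def train_surrogate(contexts, actions, outcomes):
    # Least-squares stand-in for the surrogate model: (context, action) -> predicted outcome.
    # (The abstract mentions, e.g., a random forest or a gradient-trained neural network.)
    def features(c, a):
        return np.hstack([c, a, c * a, np.ones((len(c), 1))])
    w, *_ = np.linalg.lstsq(features(contexts, actions), outcomes, rcond=None)
    return lambda c, a: features(c, a) @ w

def strategy(weights, contexts):
    # Linear stand-in for the strategy network: maps contexts to actions.
    return np.tanh(contexts @ weights)

def evolve_strategy(surrogate, contexts, dim_ctx, dim_act,
                    pop_size=50, generations=200, sigma=0.1):
    # Evolutionary search in which candidate strategies are scored on the surrogate,
    # so almost no evaluations on the real decision problem are needed.
    population = [rng.normal(0.0, 1.0, (dim_ctx, dim_act)) for _ in range(pop_size)]
    for _ in range(generations):
        fitness = [surrogate(contexts, strategy(w, contexts)).mean() for w in population]
        # Keep the top 20% and refill the population with mutated copies of them.
        elite = [population[i] for i in np.argsort(fitness)[-pop_size // 5:]]
        children = [elite[int(rng.integers(len(elite)))] + rng.normal(0.0, sigma, (dim_ctx, dim_act))
                    for _ in range(pop_size - len(elite))]
        population = elite + children
    fitness = [surrogate(contexts, strategy(w, contexts)).mean() for w in population]
    return population[int(np.argmax(fitness))]

# Toy historical data: the (unknown) outcome rewards actions aligned with the first context feature.
contexts = rng.normal(size=(500, 2))
actions = rng.uniform(-1.0, 1.0, size=(500, 1))
outcomes = actions[:, 0] * contexts[:, 0]

surrogate = train_surrogate(contexts, actions, outcomes)
best_strategy = evolve_strategy(surrogate, contexts, dim_ctx=2, dim_act=1)

Because candidate strategies are scored on the surrogate rather than on the real decision problem, most of the evaluation cost is offline, which is the source of the sample-efficiency, variance, and regret advantages claimed in the abstract; in the sequential-decision extension, the surrogate similarly replaces most rollouts in the RL environment.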
Related papers
- Memory-Enhanced Neural Solvers for Efficient Adaptation in Combinatorial Optimization [6.713974813995327]
We present MEMENTO, an approach that leverages memory to improve the adaptation of neural solvers at inference time.
We successfully train all RL auto-regressive solvers on large instances, and show that MEMENTO can scale and is data-efficient.
Overall, MEMENTO makes it possible to push the state of the art on 11 out of 12 evaluated tasks.
arXiv Detail & Related papers (2024-06-24T08:18:19Z)
- MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making.
We present a process-based benchmark MR-Ben that demands a meta-reasoning skill.
Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z)
- On the Robustness of Decision-Focused Learning [0.0]
Decision-Focused Learning (DFL) is an emerging learning paradigm that tackles the task of training a machine learning (ML) model to predict the missing parameters of an incomplete optimization problem.
DFL trains an ML model in an end-to-end system, by integrating the prediction and optimization tasks, providing better alignment of the training and testing objectives.
arXiv Detail & Related papers (2023-11-28T04:34:04Z)
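
As a rough illustration of the decision-focused setup described in the entry above (and not the robustness analysis that paper performs), the sketch below trains a predictor through a softmax-relaxed decision layer, so the training loss is the quality of the induced decision rather than prediction error. The toy problem, the decision layer, and the learning rate are assumptions made for the example.

import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def dfl_step(W, x, true_values, lr=0.1, tau=0.1):
    # One gradient step on the decision loss -E[value of the chosen item].
    # The "optimization problem" is 'pick the best of n items', relaxed to a softmax
    # so the decision is differentiable with respect to the predicted parameters.
    pred = W @ x                          # predicted item values (the missing parameters)
    p = softmax(pred / tau)               # relaxed decision
    achieved = p @ true_values            # decision quality under the true values
    g_pred = -(p * (true_values - achieved)) / tau   # gradient of -achieved w.r.t. pred
    return W - lr * np.outer(g_pred, x), achieved

# Toy setup: 5 items whose true values depend linearly on a 3-dimensional feature vector.
true_W = rng.normal(size=(5, 3))
W = np.zeros((5, 3))
for _ in range(2000):
    x = rng.normal(size=3)
    W, achieved = dfl_step(W, x, true_values=true_W @ x)

A conventional two-stage approach would instead fit W by minimizing prediction error and only then optimize; the end-to-end loss is what aligns training with the downstream decision, which is the setting whose robustness the cited paper studies.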
- Rethinking Decision Transformer via Hierarchical Reinforcement Learning [54.3596066989024]
Decision Transformer (DT) is an innovative algorithm that leverages recent advances in the transformer architecture for reinforcement learning (RL).
We introduce a general sequence modeling framework for studying sequential decision making through the lens of Hierarchical RL.
We show DT emerges as a special case of this framework with certain choices of high-level and low-level policies, and discuss the potential failure of these choices.
arXiv Detail & Related papers (2023-11-01T03:32:13Z)
- Optimal Sequential Decision-Making in Geosteering: A Reinforcement Learning Approach [0.0]
Trajectory adjustment decisions throughout the drilling process, called geosteering, affect subsequent choices and information gathering.
We use Deep Q-Network (DQN), a model-free reinforcement learning (RL) method that learns directly from the decision environment.
For two previously published synthetic geosteering scenarios, our results show that RL achieves high-quality outcomes comparable to the quasi-optimal ADP.
arXiv Detail & Related papers (2023-10-07T10:49:30Z)
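
Since the entry above only names the method, here is a generic DQN-style update loop for orientation; it is not the paper's implementation. The geosteering environment is left as an assumed env object with reset()/step() methods, and a linear Q-function stands in for the deep network to keep the sketch short.

import random
from collections import deque
import numpy as np

rng = np.random.default_rng(2)
n_actions, state_dim = 3, 4                # e.g. steer up / hold / steer down on a few log features (assumed)
gamma, lr, eps = 0.99, 0.05, 0.1

W = np.zeros((n_actions, state_dim))       # online Q-function (linear stand-in for the deep net)
W_target = W.copy()                        # periodically-updated target network
replay = deque(maxlen=10_000)              # experience replay buffer

def act(state):
    # Epsilon-greedy action selection on the online Q-function.
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(W @ state))

def train_step(batch_size=32):
    for s, a, r, s_next, done in random.sample(replay, min(batch_size, len(replay))):
        target = r if done else r + gamma * (W_target @ s_next).max()
        td_error = target - (W @ s)[a]
        W[a] += lr * td_error * s          # semi-gradient TD update toward the target

# Interaction loop against an assumed environment API (not provided here):
# s = env.reset()
# for t in range(100_000):
#     a = act(s)
#     s_next, r, done = env.step(a)
#     replay.append((s, a, r, s_next, done))
#     train_step()
#     if t % 500 == 0:
#         W_target = W.copy()
#     s = env.reset() if done else s_next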
- Reinforcement Learning from Diverse Human Preferences [68.4294547285359]
This paper develops a method for crowd-sourcing preference labels and learning from diverse human preferences.
The proposed method is tested on a variety of tasks in DMcontrol and Meta-world.
It has shown consistent and significant improvements over existing preference-based RL algorithms when learning from diverse feedback.
arXiv Detail & Related papers (2023-01-27T15:18:54Z)
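
To make "learning from preference labels" in the entry above concrete, below is the standard Bradley-Terry reward-learning step that preference-based RL methods build on; the cited paper's crowd-sourcing protocol and its handling of diverse or conflicting annotators are not reproduced, and the linear reward model and toy segments are assumptions.

import numpy as np

rng = np.random.default_rng(3)
feat_dim = 6
w = np.zeros(feat_dim)                     # linear reward model: r(s) = w . phi(s)

def segment_return(w, segment):
    # Predicted return of a trajectory segment, given as a (T, feat_dim) array of state features.
    return (segment @ w).sum()

def preference_update(w, seg_a, seg_b, label, lr=0.05):
    # Bradley-Terry model: P(seg_b preferred over seg_a) = sigmoid(R_b - R_a).
    # `label` is 1 if the annotator preferred seg_b, 0 if they preferred seg_a.
    p_b = 1.0 / (1.0 + np.exp(segment_return(w, seg_a) - segment_return(w, seg_b)))
    grad = (p_b - label) * (seg_b.sum(axis=0) - seg_a.sum(axis=0))   # cross-entropy gradient
    return w - lr * grad

# Toy usage: two segments of 10 steps each, with a (simulated) human label.
seg_a = rng.normal(size=(10, feat_dim))
seg_b = rng.normal(size=(10, feat_dim))
w = preference_update(w, seg_a, seg_b, label=1)
# The learned reward would then serve as the reward signal for a standard RL algorithm.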
- Data-Driven Offline Decision-Making via Invariant Representation Learning [97.49309949598505]
Offline data-driven decision-making involves synthesizing optimized decisions with no active interaction.
A key challenge is distributional shift: when we optimize with respect to the input into a model trained from offline data, it is easy to produce an out-of-distribution (OOD) input that appears erroneously good.
In this paper, we formulate offline data-driven decision-making as domain adaptation, where the goal is to make accurate predictions for the value of optimized decisions.
arXiv Detail & Related papers (2022-11-21T11:01:37Z)
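
The distributional-shift failure described in the entry above is easy to reproduce in a toy setting; the sketch below fits a model to offline data on a narrow input range and then optimizes the input against it, typically selecting an out-of-distribution point with an inflated predicted value. The polynomial model, data range, and candidate grid are illustrative assumptions, and the cited paper's invariant-representation remedy is not shown.

import numpy as np

rng = np.random.default_rng(4)
true_value = lambda x: -(x - 1.0) ** 2      # unknown ground-truth objective (best at x = 1)

# Offline dataset only covers x in [0, 2].
x_data = rng.uniform(0.0, 2.0, size=40)
y_data = true_value(x_data) + rng.normal(0.0, 0.05, size=40)

# "Model trained from offline data": a high-degree polynomial fit.
model = np.poly1d(np.polyfit(x_data, y_data, deg=7))

# Naive data-driven decision-making: choose the input that maximizes the model,
# searching far beyond the support of the offline data.
candidates = np.linspace(-5.0, 5.0, 2001)
x_star = candidates[np.argmax(model(candidates))]

print(f"chosen x: {x_star:.2f}")
print(f"model's predicted value: {model(x_star):.2f}")
print(f"true value: {true_value(x_star):.2f}")
# The degree-7 fit diverges outside [0, 2], so x_star tends to land out of distribution
# with a wildly optimistic prediction, which is exactly the OOD failure the paper's
# domain-adaptation formulation is meant to counter.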
- Limitations of a proposed correction for slow drifts in decision criterion [0.0]
We propose a model-based approach for disambiguating systematic updates from random drifts.
We show that this approach accurately recovers the latent trajectory of drifts in decision criterion.
Our results highlight the advantages of incorporating assumptions about the generative process directly into models of decision-making.
arXiv Detail & Related papers (2022-05-22T19:33:19Z)
- Few-shot Quality-Diversity Optimization [50.337225556491774]
Quality-Diversity (QD) optimization has been shown to be an effective tool for dealing with deceptive minima and sparse rewards in Reinforcement Learning.
We show that, given examples from a task distribution, information about the paths taken by optimization in parameter space can be leveraged to build a prior population, which when used to initialize QD methods in unseen environments, allows for few-shot adaptation.
Experiments carried out in both sparse and dense reward settings using robotic manipulation and navigation benchmarks show that it considerably reduces the number of generations required for QD optimization in these environments.
arXiv Detail & Related papers (2021-09-14T17:12:20Z)
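
The mechanism in the entry above, seeding a Quality-Diversity archive with a prior population instead of random genomes, can be sketched with a minimal MAP-Elites-style loop. The toy fitness, the behaviour descriptor, and the way the prior is produced here are assumptions for illustration; the cited paper's method for building the prior from a task distribution is not reproduced.

import numpy as np

rng = np.random.default_rng(5)

def evaluate(x):
    # Toy task: fitness plus a 1-D behaviour descriptor in [0, 1].
    fitness = -np.sum(x ** 2)
    descriptor = (np.tanh(x[0]) + 1.0) / 2.0
    return fitness, descriptor

def cell_index(descriptor, n_cells=20):
    return min(int(descriptor * n_cells), n_cells - 1)

def insert_into_archive(archive, x):
    fitness, descriptor = evaluate(x)
    c = cell_index(descriptor)
    if c not in archive or archive[c][0] < fitness:    # keep the best solution per cell
        archive[c] = (fitness, x)

def map_elites(prior_population, iterations=2000, sigma=0.1):
    archive = {}
    # Few-shot initialization: seed the archive with prior solutions (e.g. solutions found
    # on related tasks) rather than random genomes, so fewer generations are needed.
    for x in prior_population:
        insert_into_archive(archive, x)
    for _ in range(iterations):
        parent = archive[rng.choice(list(archive))][1]  # uniform selection from the archive
        insert_into_archive(archive, parent + rng.normal(0.0, sigma, size=parent.shape))
    return archive

prior = [rng.normal(0.0, 0.5, size=4) for _ in range(10)]   # stand-in for the learned prior population
archive = map_elites(prior)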
- Post-hoc loss-calibration for Bayesian neural networks [25.05373000435213]
We develop methods for correcting approximate posterior predictive distributions, encouraging them to prefer high-utility decisions.
In contrast to previous work, our approach is agnostic to the choice of the approximate inference algorithm.
arXiv Detail & Related papers (2021-06-13T13:53:27Z)
- Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning [52.74071439183113]
We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) solved via reinforcement learning.
Two significant computational challenges arise in applying decision-focused learning to MDPs.
arXiv Detail & Related papers (2021-06-06T23:53:31Z)