Machine Learning Simulates Agent-Based Model Towards Policy
- URL: http://arxiv.org/abs/2203.02576v1
- Date: Fri, 4 Mar 2022 21:19:11 GMT
- Title: Machine Learning Simulates Agent-Based Model Towards Policy
- Authors: Bernardo Alves Furtado and Gustavo Onofre Andreão
- Abstract summary: We use a random forest machine learning algorithm to emulate an agent-based model (ABM) and evaluate competing policies across 46 Metropolitan Regions (MRs) in Brazil.
As a result, we obtain the optimal (and non-optimal) performance of each region over the policies.
Results suggest that MRs already have embedded structures that favor optimal or non-optimal results, but they also illustrate which policy is more beneficial to each place.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Public Policies are not intrinsically positive or negative. Rather, policies
provide varying levels of effects across different recipients.
Methodologically, computational modeling enables the application of a
combination of multiple influences on empirical data, thus allowing for
heterogeneous response to policies. We use a random forest machine learning
algorithm to emulate an agent-based model (ABM) and evaluate competing policies
across 46 Metropolitan Regions (MRs) in Brazil. In doing so, we use input
parameters and output indicators of 11,076 actual simulation runs and one
million emulated runs. As a result, we obtain the optimal (and non-optimal)
performance of each region over the policies. Optimum is defined as a
combination of production and inequality indicators for the full ensemble of
MRs. Results suggest that MRs already have embedded structures that favor
optimal or non-optimal results, but they also illustrate which policy is more
beneficial to each place. In addition to providing MR-specific policy
results, the use of machine learning to simulate an ABM reduces the
computational burden while allowing for a much larger variation among model
parameters. The coherence of results within the context of larger uncertainty
-- vis-à-vis those of the original ABM -- provides an additional test of the
model's robustness. At the same time, the exercise indicates on which
parameters policymakers should intervene in order to move MRs towards their
optimum.
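The emulation pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not a reproduction of the original model: the parameter space, the indicator definitions, and the scalarization of "optimum" used here are invented stand-ins for the ABM's actual inputs and outputs.

```python
# Sketch: emulate an ABM with a random forest, then search many more
# parameter combinations than the ABM itself could afford to run.
# All parameter names, ranges, and indicator formulas below are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Stand-in for the actual ABM runs: each row is an input-parameter vector,
# each output row is a (production, inequality) indicator pair.
X_train = rng.uniform(0, 1, size=(1000, 4))
y_train = np.column_stack([
    X_train @ np.array([1.0, 0.5, -0.2, 0.1]),       # toy "production"
    0.5 - X_train @ np.array([0.3, -0.1, 0.4, 0.0]),  # toy "inequality"
])

emulator = RandomForestRegressor(n_estimators=100, random_state=0)
emulator.fit(X_train, y_train)

# Emulate a much larger sweep over the parameter space.
X_emulated = rng.uniform(0, 1, size=(100_000, 4))
y_emulated = emulator.predict(X_emulated)

# An illustrative scalarization of "optimum": high production, low inequality.
score = y_emulated[:, 0] - y_emulated[:, 1]
best_params = X_emulated[np.argmax(score)]
```

The expensive step (running the ABM) happens only for the training set; the emulator then makes the million-run policy sweep cheap.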
Related papers
- Efficient Learning of POMDPs with Known Observation Model in Average-Reward Setting [56.92178753201331]
We propose the Observation-Aware Spectral (OAS) estimation technique, which enables the POMDP parameters to be learned from samples collected using a belief-based policy.
We show the consistency of the OAS procedure, and we prove a regret guarantee of order $\mathcal{O}(\sqrt{T}\log(T))$ for the proposed OAS-UCRL algorithm.
arXiv Detail & Related papers (2024-10-02T08:46:34Z) - Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences.
To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model.
Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
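Schematically, an objective of the kind described here, a preference-optimization loss regularized by a supervised (SFT) log-likelihood term on the chosen response, can be sketched as follows. The functional form and the coefficients `beta` and `lam` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np


def regularized_preference_loss(logp_chosen, logp_rejected, beta=0.1, lam=1.0):
    """Preference loss plus an SFT term on the chosen response.

    The SFT log-likelihood term acts as a regularizer against
    over-optimizing the learned preference signal. (Illustrative only.)
    """
    margin = beta * (logp_chosen - logp_rejected)
    pref_loss = -np.log(1.0 / (1.0 + np.exp(-margin)))  # -log sigmoid(margin)
    sft_loss = -logp_chosen                              # supervised term
    return pref_loss + lam * sft_loss
```

Raising the model's log-probability of the chosen response lowers both terms, so the combined loss cannot be driven down by drifting away from the supervised data.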
arXiv Detail & Related papers (2024-05-26T05:38:50Z) - Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z) - Bayesian regularization of empirical MDPs [11.3458118258705]
We take a Bayesian perspective and regularize the objective function of the Markov decision process with prior information.
We evaluate our proposed algorithms on synthetic simulations and on real-world search logs of a large scale online shopping store.
arXiv Detail & Related papers (2022-08-03T22:02:50Z) - Sample Complexity of Robust Reinforcement Learning with a Generative
Model [0.0]
We propose a model-based reinforcement learning (RL) algorithm for learning an $epsilon$-optimal robust policy.
We consider three different forms of uncertainty sets, characterized by the total variation distance, chi-square divergence, and KL divergence.
In addition to the sample complexity results, we also present a formal analytical argument on the benefit of using robust policies.
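For reference, the three divergences that characterize the uncertainty sets above can be computed for discrete distributions as follows. This is a minimal sketch of the divergences themselves, not of the paper's robust RL algorithm.

```python
import numpy as np


def total_variation(p, q):
    """Total variation distance: half the L1 distance between p and q."""
    return 0.5 * np.abs(p - q).sum()


def kl_divergence(p, q):
    """KL divergence D(p || q); terms with p[i] == 0 contribute zero."""
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))


def chi_square(p, q):
    """Chi-square divergence of p from q (q must have full support)."""
    return np.sum((p - q) ** 2 / q)
```

Each uncertainty set is then the ball of transition models within a given radius of the nominal model under one of these divergences.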
arXiv Detail & Related papers (2021-12-02T18:55:51Z) - Building a Foundation for Data-Driven, Interpretable, and Robust Policy
Design using the AI Economist [67.08543240320756]
We show that the AI Economist framework enables effective, flexible, and interpretable policy design using two-level reinforcement learning and data-driven simulations.
We find that log-linear policies trained using RL significantly improve social welfare, based on both public health and economic outcomes, compared to past outcomes.
arXiv Detail & Related papers (2021-08-06T01:30:41Z) - Robust Value Iteration for Continuous Control Tasks [99.00362538261972]
When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well.
We present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain.
We show that robust value iteration is more robust than deep reinforcement learning algorithms and the non-robust version of the algorithm.
arXiv Detail & Related papers (2021-05-25T19:48:35Z) - Reinforcement Learning for Adaptive Mesh Refinement [63.7867809197671]
We propose a novel formulation of AMR as a Markov decision process and apply deep reinforcement learning to train refinement policies directly from simulation.
The model sizes of these policy architectures are independent of the mesh size and hence scale to arbitrarily large and complex simulations.
arXiv Detail & Related papers (2021-03-01T22:55:48Z) - Model-Based Policy Search Using Monte Carlo Gradient Estimation with Real Systems Application [12.854118767247453]
We present a Model-Based Reinforcement Learning (MBRL) algorithm named Monte Carlo Probabilistic Inference for Learning COntrol (MC-PILCO).
The algorithm relies on Gaussian Processes (GPs) to model the system dynamics and on a Monte Carlo approach to estimate the policy gradient.
Numerical comparisons in a simulated cart-pole environment show that MC-PILCO exhibits better data efficiency and control performance.
arXiv Detail & Related papers (2021-01-28T17:01:15Z) - MM-KTD: Multiple Model Kalman Temporal Differences for Reinforcement
Learning [36.14516028564416]
This paper proposes an innovative Multiple Model Kalman Temporal Difference (MM-KTD) framework to learn optimal control policies.
An active learning method is proposed to enhance the sampling efficiency of the system.
Experimental results show superiority of the MM-KTD framework in comparison to its state-of-the-art counterparts.
arXiv Detail & Related papers (2020-05-30T06:39:55Z) - Mixed Reinforcement Learning with Additive Stochastic Uncertainty [19.229447330293546]
Reinforcement learning (RL) methods often rely on massive exploration data to search optimal policies, and suffer from poor sampling efficiency.
This paper presents a mixed RL algorithm by simultaneously using dual representations of environmental dynamics to search the optimal policy.
The effectiveness of the mixed RL is demonstrated by a typical optimal control problem of non-affine nonlinear systems.
arXiv Detail & Related papers (2020-02-28T08:02:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.