Abstract: In recent years, $Q$-learning has become indispensable for model-free
reinforcement learning (MFRL). However, it suffers from well-known problems
such as under- and overestimation bias of the value estimates, which may adversely
affect policy learning. To resolve this issue, we propose an MFRL framework that is
augmented with the components of model-based RL. Specifically, we propose to
estimate not only the $Q$-values but also both the transition and the reward
with a shared network. We further utilize the estimated reward from the model
estimators for $Q$-learning, which promotes interaction between the estimators.
We show that the proposed scheme, called Model-augmented $Q$-learning (MQL),
obtains a policy-invariant solution which is identical to the solution obtained
by learning with the true reward. Finally, we also provide a trick to prioritize
past experiences in the replay buffer by utilizing model-estimation errors. We
experimentally validate MQL built upon state-of-the-art off-policy MFRL
methods, and show that MQL significantly improves their performance and convergence.
The proposed scheme is simple to implement and does not require additional