Abstract: Large-scale Web-based services present opportunities for improving UI
policies based on observed user interactions. We address the challenges of learning
such policies through model-free offline Reinforcement Learning (RL) with
off-policy training. Deployed in a production system for user authentication in
a major social network, our approach significantly improves long-term objectives. We
articulate practical challenges, compare several ML techniques, provide
insights on training and evaluation of RL models, and discuss generalizations.