Abstract: Consider a prosthetic arm learning to adapt to its user's control signals.
We propose Interaction-Grounded Learning for this novel setting, in which a
learner's goal is to interact with the environment without any grounding or
explicit reward signal with which to optimize its policy. Such a problem evades
common RL solutions, which require an explicit reward. The learning agent
observes a
multidimensional context vector, takes an action, and then observes a
multidimensional feedback vector. This feedback vector contains no explicit
reward information. To succeed, the learner must learn how to evaluate the
feedback vector to discover a latent reward signal, with which it can ground
its policy without supervision. We show that in an
Interaction-Grounded Learning setting, under certain natural assumptions, a
learner can discover the latent reward and ground its policy for successful
interaction. We provide theoretical guarantees and a proof-of-concept empirical
evaluation to demonstrate the effectiveness of our proposed approach.
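
To make the interaction protocol concrete, the following is a minimal toy sketch in Python; the environment dynamics, the dimensionalities, and the environment_step helper are hypothetical illustrations, not the paper's construction. The sketch shows only the information flow of the setting: the learner sees (context, action, feedback) triples, and the latent reward is never revealed.

```python
import numpy as np

rng = np.random.default_rng(0)

N_ACTIONS = 3      # hypothetical sizes, for illustration only
CONTEXT_DIM = 5
FEEDBACK_DIM = 4

def environment_step(context, action):
    """Return a multidimensional feedback vector; the latent reward stays hidden."""
    # The latent reward is internal to the environment and never exposed.
    latent_reward = float(action == context.argmax() % N_ACTIONS)
    # Feedback is a vector whose distribution depends on the latent reward.
    return rng.normal(loc=latent_reward, size=FEEDBACK_DIM)

# The learner only ever observes (context, action, feedback) triples.
for _ in range(5):
    context = rng.normal(size=CONTEXT_DIM)   # multidimensional context vector
    action = int(rng.integers(N_ACTIONS))    # placeholder uniform policy
    feedback = environment_step(context, action)
    # An IGL learner must evaluate `feedback` to recover a reward signal
    # with which it can ground and improve its policy.
```

The key design point of the sketch is what is absent: no reward is ever returned to the learner, so any optimization must be driven by a reward signal decoded from the feedback vector alone.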