Abstract: As deep networks begin to be deployed as autonomous agents, the issue of how
they can communicate with each other becomes important. Here, we train two deep
nets from scratch to perform realistic referent identification through
unsupervised emergent communication. We show that the largely interpretable
emergent protocol allows the nets to successfully communicate even about object
types they did not see at training time. The visual representations induced as
a by-product of our training regime, moreover, show comparable quality, when
re-used as generic visual features, to a recent self-supervised learning model.
Our results provide concrete evidence of the viability of (interpretable)
emergent deep net communication in a more realistic scenario than previously
considered, as well as establishing an intriguing link between this field and
self-supervised visual learning.