Abstract: Knowledge distillation constitutes a simple yet effective way to improve the
performance of a compact student network by exploiting the knowledge of a more
powerful teacher. Nevertheless, the knowledge distillation literature remains
limited to the scenario where the student and the teacher tackle the same task.
Here, we investigate the problem of transferring knowledge not only across
architectures but also across tasks. To this end, we study the case of object
detection and, instead of following the standard detector-to-detector
distillation approach, introduce a classifier-to-detector knowledge transfer
framework. In particular, we propose strategies to exploit the classification
teacher to improve both the detector's recognition accuracy and localization
performance. Our experiments on several detectors with different backbones
demonstrate the effectiveness of our approach, which outperforms
state-of-the-art detector-to-detector distillation methods.