Abstract: DNNs are becoming progressively less over-parametrised, thanks to recent advances in efficient model design through carefully hand-crafted or NAS-based methods.
Relying on the fact that not all inputs require the same amount of computation
to yield a confident prediction, adaptive inference is gaining attention as a
prominent approach for pushing the limits of efficient deployment.
Particularly, early-exit networks comprise an emerging direction for tailoring
the computation depth of each input sample at runtime, offering complementary
performance gains to other efficiency optimisations. In this paper, we
decompose the design methodology of early-exit networks into its key components
and survey recent advances in each of them. We also position
early-exiting against other efficient inference solutions and provide our
insights on the current challenges and most promising future directions for
research in the field.