Abstract: In this paper, we propose a method for ensembling the outputs of multiple
object detectors to improve detection performance and bounding box precision on
image data. We further extend it to video data by proposing a
two-stage tracking-based scheme for detection refinement. The proposed method
can be used as a standalone approach for improving object detection
performance, or as part of a framework for faster bounding box annotation in
unseen datasets, assuming that the objects of interest are among those present in
common public datasets.