Abstract: Video relation detection refers to detecting the relationships
between objects in videos, such as spatial relationships and action
relationships. In this paper, we present a video relation detection method
based on trajectory-aware multi-modal features to address this task.
Given the complexity of visual relation detection in videos, we decompose the
task into three sub-tasks: object detection, trajectory proposal, and relation
prediction. We use a state-of-the-art object detection method to ensure
accurate object trajectory detection, and a multi-modal feature representation
to help predict the relations between objects. Our method
won first place in the video relation detection task of the Video Relation
Understanding Grand Challenge at ACM Multimedia 2020 with 11.74\% mAP,
surpassing other methods by a large margin.