Related papers: BoxMask: Revisiting Bounding Box Supervision for Video Object Detection

BoxMask: Revisiting Bounding Box Supervision for Video Object Detection

URL: http://arxiv.org/abs/2210.06008v1
Date: Wed, 12 Oct 2022 08:25:27 GMT
Title: BoxMask: Revisiting Bounding Box Supervision for Video Object Detection
Authors: Khurram Azeem Hashmi, Alain Pagani, Didier Stricker, Muhammamd Zeshan Afzal
Abstract summary: We propose BoxMask, which learns discriminative representations by incorporating class-aware pixel-level information. The proposed module can be effortlessly integrated into any region-based detector to boost detection.
Score: 11.255962936937744
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: We present a new, simple yet effective approach to uplift video object detection. We observe that prior works operate on instance-level feature aggregation that imminently neglects the refined pixel-level representation, resulting in confusion among objects sharing similar appearance or motion characteristics. To address this limitation, we propose BoxMask, which effectively learns discriminative representations by incorporating class-aware pixel-level information. We simply consider bounding box-level annotations as a coarse mask for each object to supervise our method. The proposed module can be effortlessly integrated into any region-based detector to boost detection. Extensive experiments on ImageNet VID and EPIC KITCHENS datasets demonstrate consistent and significant improvement when we plug our BoxMask module into numerous recent state-of-the-art methods.

Related papers

Beyond Boxes: Mask-Guided Spatio-Temporal Feature Aggregation for Video Object Detection [12.417754433715903]
We present FAIM, a new VOD method that enhances temporal Feature Aggregation by leveraging Instance Mask features. Using YOLOX as a base detector, FAIM achieves 87.9% mAP on the ImageNet VID dataset at 33 FPS on a single 2080Ti GPU.
arXiv Detail & Related papers (2024-12-06T10:12:10Z)
Learning Referring Video Object Segmentation from Weak Annotation [78.45828085350936]
Referring video object segmentation (RVOS) is a task that aims to segment the target object in all video frames based on a sentence describing the object. We propose a new annotation scheme that reduces the annotation effort by 8 times, while providing sufficient supervision for RVOS. Our scheme only requires a mask for the frame where the object first appears and bounding boxes for the rest of the frames.
arXiv Detail & Related papers (2023-08-04T06:50:52Z)
High-resolution Iterative Feedback Network for Camouflaged Object Detection [128.893782016078]
Spotting camouflaged objects that are visually assimilated into the background is tricky for object detection algorithms. We aim to extract the high-resolution texture details to avoid the detail degradation that causes blurred vision in edges and boundaries. We introduce a novel HitNet to refine the low-resolution representations by high-resolution features in an iterative feedback manner.
arXiv Detail & Related papers (2022-03-22T11:20:21Z)
Oriented Bounding Boxes for Small and Freely Rotated Objects [7.6997148655751895]
A novel object detection method is presented that handles freely rotated objects of arbitrary sizes. The method encodes the precise location and orientation of features of the target objects at grid cell locations. Evaluations on the xView and DOTA datasets show that the proposed method uniformly improves performance over existing state-of-the-art methods.
arXiv Detail & Related papers (2021-04-24T02:04:49Z)
BBAM: Bounding Box Attribution Map for Weakly Supervised Semantic and Instance Segmentation [19.55647093153416]
Weakly supervised segmentation methods using bounding box annotations focus on obtaining a pixel-level mask from each box containing an object. In this work, we utilize higher-level information from the behavior of a trained object detector, by seeking the smallest areas of the image from which the object detector produces almost the same result as it does from the whole image. These areas constitute a bounding-box attribution map (BBAM), which identifies the target object in its bounding box and thus serves as pseudo ground-truth for weakly supervised semantic and COCO instance segmentation.
arXiv Detail & Related papers (2021-03-16T08:29:33Z)
Instance Localization for Self-supervised Detection Pretraining [68.24102560821623]
We propose a new self-supervised pretext task, called instance localization. We show that integration of bounding boxes into pretraining promotes better task alignment and architecture alignment for transfer learning. Experimental results demonstrate that our approach yields state-of-the-art transfer learning results for object detection.
arXiv Detail & Related papers (2021-02-16T17:58:57Z)
Ensembling object detectors for image and video data analysis [98.26061123111647]
We propose a method for ensembling the outputs of multiple object detectors for improving detection performance and precision of bounding boxes on image data. We extend it to video data by proposing a two-stage tracking-based scheme for detection refinement.
arXiv Detail & Related papers (2021-02-09T12:38:16Z)
Generating Masks from Boxes by Mining Spatio-Temporal Consistencies in Videos [159.02703673838639]
We introduce a method for generating segmentation masks from per-frame bounding box annotations in videos. We use our resulting accurate masks for weakly supervised training of video object segmentation (VOS) networks. The additional data provides substantially better generalization performance leading to state-of-the-art results in both the VOS and more challenging tracking domain.
arXiv Detail & Related papers (2021-01-06T18:56:24Z)
Weakly-Supervised Saliency Detection via Salient Object Subitizing [57.17613373230722]
We introduce saliency subitizing as the weak supervision since it is class-agnostic. This allows the supervision to be aligned with the property of saliency detection. We conduct extensive experiments on five benchmark datasets.
arXiv Detail & Related papers (2021-01-04T12:51:45Z)
Learning Object Scale With Click Supervision for Object Detection [29.421113887739413]
We propose a simple yet effective method which incorporatesCNN visualization with click supervision to generate the pseudoground-truths. These pseudo ground-truthscans be used to train a fully-supervised detector. Experimental results on the PASCAL VOC2007 and VOC 2012 datasets show that the proposed methodcan obtain much higher accuracy for estimating the object scale.
arXiv Detail & Related papers (2020-02-20T03:59:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.