ScribbleBox: Interactive Annotation Framework for Video Object
Segmentation
- URL: http://arxiv.org/abs/2008.09721v1
- Date: Sat, 22 Aug 2020 00:33:10 GMT
- Title: ScribbleBox: Interactive Annotation Framework for Video Object
Segmentation
- Authors: Bowen Chen, Huan Ling, Xiaohui Zeng, Jun Gao, Ziyue Xu, Sanja Fidler
- Abstract summary: We introduce ScribbleBox, a novel interactive framework for annotating object instances with masks in videos.
Box tracks are annotated efficiently by approximating the trajectory using a parametric curve.
We show that our ScribbleBox approach reaches 88.92% J&F on DAVIS 2017 with 9.14 clicks per box track, and 4 frames of annotation.
- Score: 62.86341611684222
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Manually labeling video datasets for segmentation tasks is extremely
time-consuming. In this paper, we introduce ScribbleBox, a novel interactive
framework for annotating object instances with masks in videos. In particular,
we split annotation into two steps: annotating objects with tracked boxes, and
labeling masks inside these tracks. We introduce automation and interaction in
both steps. Box tracks are annotated efficiently by approximating the
trajectory using a parametric curve with a small number of control points which
the annotator can interactively correct. Our approach tolerates a modest amount
of noise in the box placements, so typically only a few clicks are needed to
annotate tracked boxes to sufficient accuracy. Segmentation masks are
corrected via scribbles which are efficiently propagated through time. We show
significant performance gains in annotation efficiency over past work. We show
that our ScribbleBox approach reaches 88.92% J&F on DAVIS 2017 with 9.14 clicks
per box track, and 4 frames of scribble annotation.
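To make the box-track step concrete, here is a minimal sketch of approximating a noisy box trajectory with a low-order parametric curve, in the spirit of the first annotation step. It is an illustration, not the authors' implementation: the cubic polynomial, the synthetic drifting box, and the fit_track helper are all assumptions, and the actual system additionally lets the annotator correct the curve's control points interactively.

```python
# Illustrative sketch (not the authors' code): approximate a noisy box
# trajectory with a low-order parametric curve per coordinate.
import numpy as np

def fit_track(boxes, degree=3):
    """Fit each box coordinate (x, y, w, h) over time with a polynomial.

    A degree-3 fit has only 4 parameters per coordinate, playing the role
    of the small set of control points an annotator would correct."""
    t = np.linspace(0.0, 1.0, len(boxes))
    coeffs = [np.polyfit(t, boxes[:, k], degree) for k in range(4)]
    smoothed = np.stack([np.polyval(c, t) for c in coeffs], axis=1)
    return coeffs, smoothed

# Synthetic track: a box drifting right with per-frame placement noise.
rng = np.random.default_rng(0)
frames = 60
clean = np.stack([np.linspace(10, 200, frames),   # x
                  np.full(frames, 50.0),          # y
                  np.full(frames, 40.0),          # w
                  np.full(frames, 30.0)], axis=1) # h
noisy = clean + rng.normal(scale=2.0, size=clean.shape)

_, smoothed = fit_track(noisy)
print("mean abs error vs. clean track:", np.abs(smoothed - clean).mean())
```

Because the curve has far fewer degrees of freedom than the frames it spans, per-frame placement noise is largely absorbed, which is consistent with the claim that only a few corrective clicks per track are needed.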
Related papers
- Learning Tracking Representations from Single Point Annotations [49.47550029470299]
We propose to learn tracking representations from single point annotations in a weakly supervised manner.
Specifically, we propose a soft contrastive learning framework that incorporates a target objectness prior into end-to-end contrastive learning (a toy version of such a soft loss is sketched after this entry).
arXiv Detail & Related papers (2024-04-15T06:50:58Z)
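As a reading aid, the toy sketch below shows one way a contrastive loss can be made "soft": the hard 0/1 positive labels are replaced by weights that could come from an objectness prior. The function name, the weighting scheme, and the numpy formulation are assumptions for illustration, not the paper's actual framework.

```python
# Toy sketch (assumption, not the paper's method): an InfoNCE-style loss
# where hard positive labels are replaced by soft per-candidate weights.
import numpy as np

def soft_contrastive_loss(sim, soft_pos, tau=0.07):
    """sim: (B, K) anchor/candidate similarities; soft_pos: (B, K)
    nonnegative weights summing to 1 per row, e.g. from an objectness prior."""
    logits = sim / tau
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-(soft_pos * log_prob).sum(axis=1).mean())

rng = np.random.default_rng(0)
sim = rng.normal(size=(8, 16))       # random similarities for demonstration
w = rng.random((8, 16))
w /= w.sum(axis=1, keepdims=True)    # soft "positive" distribution per anchor
print(soft_contrastive_loss(sim, w))
```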
- Learning Referring Video Object Segmentation from Weak Annotation [78.45828085350936]
Referring video object segmentation (RVOS) is a task that aims to segment the target object in all video frames based on a sentence describing the object.
We propose a new annotation scheme that reduces the annotation effort by a factor of 8, while providing sufficient supervision for RVOS.
Our scheme only requires a mask for the frame where the object first appears and bounding boxes for the rest of the frames (a minimal sketch of such a record follows this entry).
arXiv Detail & Related papers (2023-08-04T06:50:52Z)
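For intuition, the snippet below shows one plausible container for such a weak label: a full mask only at the first-appearance frame and plain boxes everywhere else. Every field name here is hypothetical; the paper does not prescribe this exact format.

```python
# Hypothetical record layout (illustrative assumption) for the weak RVOS
# annotation scheme described above.
import numpy as np

annotation = {
    "expression": "the dog jumping over the fence",  # referring sentence
    "first_frame": 12,                               # first appearance
    "mask": np.zeros((480, 854), dtype=bool),        # mask at first_frame only
    "boxes": {                                       # frame -> (x, y, w, h)
        13: (102.0, 55.0, 64.0, 48.0),
        14: (110.0, 57.0, 65.0, 47.0),
        # remaining frames carry boxes, not masks
    },
}
```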
- Box2Mask: Weakly Supervised 3D Semantic Instance Segmentation Using Bounding Boxes [38.60444957213202]
We look at weakly supervised 3D semantic instance segmentation.
The key idea is to leverage 3D bounding box labels, which are easier and faster to annotate.
We show that it is possible to train dense segmentation models using only bounding box labels.
arXiv Detail & Related papers (2022-06-02T17:59:57Z)
- Robust Visual Tracking by Segmentation [103.87369380021441]
Estimating the target extent poses a fundamental challenge in visual object tracking.
We propose a segmentation-centric tracking pipeline that produces a highly accurate segmentation mask.
Our tracker is able to better learn a target representation that clearly differentiates the target in the scene from background content.
arXiv Detail & Related papers (2022-03-21T17:59:19Z)
- Heuristics2Annotate: Efficient Annotation of Large-Scale Marathon Dataset For Bounding Box Regression [8.078491757252692]
We collect a novel large-scale in-the-wild video dataset of marathon runners.
The dataset consists of hours of recordings of thousands of runners captured using 42 hand-held smartphone cameras.
We propose a new scheme for tackling the challenges in annotating such a large dataset.
arXiv Detail & Related papers (2021-04-06T19:08:31Z)
- Generating Masks from Boxes by Mining Spatio-Temporal Consistencies in Videos [159.02703673838639]
We introduce a method for generating segmentation masks from per-frame bounding box annotations in videos.
We use our resulting accurate masks for weakly supervised training of video object segmentation (VOS) networks.
The additional data provides substantially better generalization performance leading to state-of-the-art results in both the VOS and more challenging tracking domain.
arXiv Detail & Related papers (2021-01-06T18:56:24Z)
- Efficient video annotation with visual interpolation and frame selection guidance [0.0]
We introduce a unified framework for generic video annotation with bounding boxes.
We show that our approach reduces actual measured annotation time by 50% compared to commonly used linear interpolation methods (a toy version of such a linear baseline is sketched after this entry).
arXiv Detail & Related papers (2020-12-23T09:31:40Z)
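To ground the comparison, here is a toy version of the linear baseline that interpolation-based annotation tools are measured against: boxes annotated at two keyframes and linearly interpolated for the frames in between. The helper name and box format are assumptions for illustration.

```python
# Toy linear-interpolation baseline (illustrative assumption): given boxes
# at two annotated keyframes, fill in every frame between them.
def interpolate_boxes(frame_a, box_a, frame_b, box_b):
    """box_* are (x, y, w, h); returns {frame: box} for [frame_a, frame_b]."""
    span = frame_b - frame_a
    out = {}
    for f in range(frame_a, frame_b + 1):
        alpha = (f - frame_a) / span
        out[f] = tuple((1 - alpha) * a + alpha * b
                       for a, b in zip(box_a, box_b))
    return out

boxes = interpolate_boxes(10, (100.0, 50.0, 40.0, 30.0),
                          20, (160.0, 55.0, 42.0, 31.0))
print(boxes[15])  # midpoint box: (130.0, 52.5, 41.0, 30.5)
```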
- Reducing the Annotation Effort for Video Object Segmentation Datasets [50.893073670389164]
Densely labeling every frame with pixel masks does not scale to large datasets.
We use a deep convolutional network to automatically create pseudo-labels on a pixel level from much cheaper bounding box annotations.
We obtain the new TAO-VOS benchmark, which we make publicly available at www.vision.rwth-aachen.de/page/taovos.
arXiv Detail & Related papers (2020-11-02T17:34:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.