Heuristics2Annotate: Efficient Annotation of Large-Scale Marathon
Dataset For Bounding Box Regression
- URL: http://arxiv.org/abs/2104.02749v1
- Date: Tue, 6 Apr 2021 19:08:31 GMT
- Title: Heuristics2Annotate: Efficient Annotation of Large-Scale Marathon
Dataset For Bounding Box Regression
- Authors: Pranjal Singh Rajput, Yeshwanth Napolean, Jan van Gemert
- Abstract summary: We collect a novel large-scale in-the-wild video dataset of marathon runners.
The dataset consists of hours of recordings of thousands of runners captured using 42 hand-held smartphone cameras.
We propose a new scheme for tackling the challenges in annotating such a large dataset.
- Score: 8.078491757252692
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Annotating a large-scale in-the-wild person re-identification
dataset, especially one of marathon runners, is a challenging task. Variations
in scenarios such as camera viewpoint, resolution, occlusion, and illumination
make the problem non-trivial. Manually annotating bounding boxes in such
large-scale datasets is cost-inefficient. Additionally, due to crowding and
occlusion in the videos, aligning the identities of runners across multiple
disjoint cameras is a challenge. We collected a novel large-scale in-the-wild
video dataset of marathon runners. The dataset consists of hours of recordings
of thousands of runners captured using 42 hand-held smartphone cameras and
covers real-world scenarios. Because of the crowding and occlusion in the
videos, annotating the runners is a challenging task. We propose a new scheme
for tackling the challenges in annotating such a large dataset. Our technique
reduces the overall cost of annotation in both time and budget. We perform a
frame-rate (fps) analysis to reduce the effort and time of annotation. We
investigate several annotation methods for efficiently generating tight
bounding boxes. Our results show that interpolating bounding boxes between
keyframes is the most efficient of these methods, running 3x faster than the
naive baseline. We introduce a novel way of aligning the identities of runners
across disjoint cameras. Our inter-camera alignment tool, integrated with a
state-of-the-art person re-id system, proves sufficient and effective for
aligning runners across multiple cameras with non-overlapping views. Our
proposed annotation framework reduces the annotation cost of the dataset by a
factor of 16 while effectively aligning 93.64% of the runners in the
cross-camera setting.
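The abstract's headline efficiency result is interpolating bounding boxes between keyframes: annotators draw boxes only on sparse keyframes, chosen at a reduced frame rate via the fps analysis, and the frames in between are filled in automatically. A minimal sketch of that idea follows; the function name, corner-based box format, and keyframe spacing are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of keyframe-based bounding-box interpolation, the scheme
# the abstract reports as ~3x faster than annotating every frame.
# Box format (x1, y1, x2, y2) and the helper name are assumptions.

def interpolate_boxes(keyframes):
    """Linearly interpolate boxes between annotated keyframes.

    keyframes: dict mapping frame index -> (x1, y1, x2, y2) box.
    Returns a dict with a box for every frame between the first
    and last keyframe.
    """
    frames = sorted(keyframes)
    boxes = {}
    for start, end in zip(frames, frames[1:]):
        b0, b1 = keyframes[start], keyframes[end]
        for f in range(start, end + 1):
            t = (f - start) / (end - start)  # 0.0 at start, 1.0 at end
            boxes[f] = tuple(round((1 - t) * a + t * b)
                             for a, b in zip(b0, b1))
    return boxes

# Annotating at 1 fps on a 30 fps video means one hand-drawn box per
# 30 frames; interpolation supplies the remaining 29.
dense = interpolate_boxes({0: (10, 20, 50, 80), 30: (40, 26, 86, 90)})
assert dense[15] == (25, 23, 68, 85)  # midpoint box
```

Linear interpolation works well while runner motion between keyframes is roughly constant; faster or more erratic motion calls for denser keyframes, which is exactly the trade-off an fps analysis quantifies.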
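The abstract also describes an inter-camera alignment tool built on a person re-id system. A common way to realize this, sketched below under stated assumptions, is to embed each runner crop with a re-id model and match identities across non-overlapping cameras by cosine similarity; the embedding source, the 0.7 threshold, and all names here are hypothetical, not the paper's tool.

```python
# Hedged sketch of cross-camera identity alignment via re-id embeddings.
# Assumes L2-normalized feature vectors from some person re-id model;
# the threshold and all names are illustrative, not the paper's.

import numpy as np

def align_identities(gallery, query, threshold=0.7):
    """Match query-camera runners to gallery-camera identities.

    gallery: dict id -> L2-normalized embedding (camera A)
    query:   dict id -> L2-normalized embedding (camera B)
    Returns a dict mapping each query id to its best gallery match,
    or None when the best cosine similarity falls below threshold.
    """
    gal_ids = list(gallery)
    gal_mat = np.stack([gallery[i] for i in gal_ids])  # shape (N, D)
    matches = {}
    for qid, emb in query.items():
        sims = gal_mat @ emb            # cosine similarities, shape (N,)
        best = int(np.argmax(sims))
        matches[qid] = gal_ids[best] if sims[best] >= threshold else None
    return matches

# Toy usage: random unit vectors stand in for real re-id features, so
# querying the gallery against itself matches every runner to itself.
rng = np.random.default_rng(0)
feats = {f"runner_{i}": v / np.linalg.norm(v)
         for i, v in enumerate(rng.normal(size=(3, 128)))}
assert all(q == g for q, g in align_identities(feats, feats).items())
```

Thresholded nearest-neighbor matching is the simplest policy; an annotation tool would typically surface low-confidence pairs for manual review rather than decide autonomously.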
Related papers
- Redundancy-Aware Camera Selection for Indoor Scene Neural Rendering [54.468355408388675]
We build a similarity matrix that incorporates both the spatial diversity of the cameras and the semantic variation of the images.
We apply a diversity-based sampling algorithm to optimize the camera selection.
We also develop a new dataset, IndoorTraj, which includes long and complex camera movements captured by humans in virtual indoor environments.
arXiv Detail & Related papers (2024-09-11T08:36:49Z)
- Are Dense Labels Always Necessary for 3D Object Detection from Point Cloud? [72.40353149833109]
Current state-of-the-art (SOTA) 3D object detection methods often require a large amount of 3D bounding box annotations for training.
We propose a novel sparsely-annotated framework, in which we just annotate one 3D object per scene.
We develop the SS3D++ method that alternatively improves 3D detector training and confident fully-annotated scene generation.
arXiv Detail & Related papers (2024-03-05T09:38:11Z)
- Implicit View-Time Interpolation of Stereo Videos using Multi-Plane Disparities and Non-Uniform Coordinates [10.445563506186307]
We build upon X-Fields, which approximates an interpolatable mapping between the input coordinates and 2D RGB images.
We propose multi-plane disparities to reduce the spatial distance of the objects in the stereo views.
We additionally introduce several simple, but important, improvements over X-Fields.
arXiv Detail & Related papers (2023-03-30T06:32:55Z)
- Cross-Camera Trajectories Help Person Retrieval in a Camera Network [124.65912458467643]
Existing methods often rely on purely visual matching or consider temporal constraints but ignore the spatial information of the camera network.
We propose a pedestrian retrieval framework based on cross-camera trajectory generation, which integrates both temporal and spatial information.
To verify the effectiveness of our method, we construct the first cross-camera pedestrian trajectory dataset.
arXiv Detail & Related papers (2022-04-27T13:10:48Z)
- Fast Interactive Video Object Segmentation with Graph Neural Networks [0.0]
We present a graph neural network based approach for tackling the problem of interactive video object segmentation.
Our network operates on superpixel graphs, which allow us to reduce the dimensionality of the problem by several orders of magnitude.
arXiv Detail & Related papers (2021-03-05T17:37:12Z)
- Generating Masks from Boxes by Mining Spatio-Temporal Consistencies in Videos [159.02703673838639]
We introduce a method for generating segmentation masks from per-frame bounding box annotations in videos.
We use our resulting accurate masks for weakly supervised training of video object segmentation (VOS) networks.
The additional data provides substantially better generalization performance leading to state-of-the-art results in both the VOS and more challenging tracking domain.
arXiv Detail & Related papers (2021-01-06T18:56:24Z)
- Efficient video annotation with visual interpolation and frame selection guidance [0.0]
We introduce a unified framework for generic video annotation with bounding boxes.
We show that our approach reduces actual measured annotation time by 50% compared to commonly used linear methods.
arXiv Detail & Related papers (2020-12-23T09:31:40Z)
- Reducing the Annotation Effort for Video Object Segmentation Datasets [50.893073670389164]
Densely labeling every frame with pixel masks does not scale to large datasets.
We use a deep convolutional network to automatically create pseudo-labels on a pixel level from much cheaper bounding box annotations.
We obtain the new TAO-VOS benchmark, which we make publicly available at www.vision.rwth-aachen.de/page/taovos.
arXiv Detail & Related papers (2020-11-02T17:34:45Z)
- ScribbleBox: Interactive Annotation Framework for Video Object Segmentation [62.86341611684222]
We introduce ScribbleBox, a novel interactive framework for annotating object instances with masks in videos.
Box tracks are annotated efficiently by approximating the trajectory using a parametric curve.
We show that our ScribbleBox approach reaches 88.92% J&F on DAVIS 2017 with 9.14 clicks per box track, and 4 frames of annotation.
arXiv Detail & Related papers (2020-08-22T00:33:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.