TIVE: A Toolbox for Identifying Video Instance Segmentation Errors
- URL: http://arxiv.org/abs/2210.08856v1
- Date: Mon, 17 Oct 2022 08:51:31 GMT
- Title: TIVE: A Toolbox for Identifying Video Instance Segmentation Errors
- Authors: Wenhe Jia, Lu Yang, Zilong Jia, Wenyi Zhao, Yilin Zhou, Qing Song
- Abstract summary: The Video Instance Segmentation (VIS) task has drawn broad research attention to architecture modeling as a means of boosting performance.
We introduce TIVE, a toolbox for identifying video instance segmentation errors.
We conduct extensive experiments with the toolbox to illustrate how spatial segmentation and temporal association affect each other.
- Score: 5.791075969487935
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Since it was first proposed, the Video Instance Segmentation (VIS)
task has drawn broad research attention to architecture modeling as a means of
boosting performance. Despite great advances in both the online and offline
paradigms, there are still insufficient means to identify model errors and
distinguish discrepancies between methods, and approaches that correctly
reflect a model's performance in recognizing object instances of various
temporal lengths remain scarce. More importantly, spatial segmentation and
temporal association, the fundamental abilities the task demands, are still
understudied in terms of both evaluation and their interaction. In this paper,
we introduce TIVE, a Toolbox for Identifying Video instance segmentation
Errors. Operating directly on output prediction files, TIVE defines isolated
error types and weights each type's damage to mAP in order to distinguish
model characteristics. By decomposing localization quality along the spatial
and temporal dimensions, it can reveal a model's potential weaknesses in
spatial segmentation and temporal association. TIVE can also report mAP over
instance temporal length for real applications. We conduct extensive
experiments with the toolbox to further illustrate how spatial segmentation
and temporal association affect each other. We expect TIVE's analysis to give
researchers more insight and to guide the community toward more meaningful
explorations of video instance segmentation. The proposed toolbox is available
at https://github.com/wenhe-jia/TIVE.
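
The spatial-temporal decomposition the abstract describes can be made concrete
with a small sketch. The following Python/NumPy snippet is a minimal
illustration, not TIVE's actual API: tube_iou implements the standard
video-level (tube) mask IoU used by VIS benchmarks, while decompose splits a
predicted track's quality into a per-frame spatial term and a
temporal-association term in the spirit of the abstract. All names here are
illustrative assumptions; the exact formulas TIVE uses may differ, so consult
the repository above for its real interfaces.

    import numpy as np

    def frame_iou(pred: np.ndarray, gt: np.ndarray) -> float:
        """IoU between two boolean masks of a single frame."""
        inter = np.logical_and(pred, gt).sum()
        union = np.logical_or(pred, gt).sum()
        return float(inter) / float(union) if union > 0 else 0.0

    def tube_iou(pred_masks, gt_masks) -> float:
        """Video-level (tube) IoU as used by VIS benchmarks: intersections
        and unions are summed over all frames before dividing; a frame where
        a track is absent contributes an all-False mask."""
        inter = sum(np.logical_and(p, g).sum() for p, g in zip(pred_masks, gt_masks))
        union = sum(np.logical_or(p, g).sum() for p, g in zip(pred_masks, gt_masks))
        return float(inter) / float(union) if union > 0 else 0.0

    def decompose(pred_masks, gt_masks):
        """Illustrative split of track quality (an assumption, not TIVE's
        exact definition): spatial quality is the mean per-frame IoU over
        frames where the prediction actually hits the object; temporal
        quality is the fraction of ground-truth frames that are hit."""
        hits = [(p, g) for p, g in zip(pred_masks, gt_masks)
                if g.any() and np.logical_and(p, g).any()]
        spatial = sum(frame_iou(p, g) for p, g in hits) / len(hits) if hits else 0.0
        gt_frames = sum(1 for g in gt_masks if g.any())
        temporal = len(hits) / gt_frames if gt_frames > 0 else 0.0
        return spatial, temporal

    # Toy example: a two-frame video with 4x4 masks. The prediction matches
    # the object perfectly in frame 0 and misses it entirely in frame 1.
    pred = [np.zeros((4, 4), bool), np.zeros((4, 4), bool)]
    gt = [np.zeros((4, 4), bool), np.zeros((4, 4), bool)]
    gt[0][0:2, 0:2] = True
    pred[0][0:2, 0:2] = True
    gt[1][0:2, 0:2] = True
    print(tube_iou(pred, gt))   # 0.5: the temporal miss drags down tube IoU
    print(decompose(pred, gt))  # (1.0, 0.5): perfect spatially, half temporally

The toy case shows why such a decomposition is informative: a plain tube IoU
of 0.5 cannot say whether the model segments poorly or associates poorly,
whereas the (1.0, 0.5) split attributes the loss entirely to temporal
association.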
Related papers
- Segment Anything Model for automated image data annotation: empirical studies using text prompts from Grounding DINO [0.0]
Grounding DINO and the Segment Anything Model (SAM) have achieved impressive performance in zero-shot object detection and image segmentation.
We show that false positive detections with appreciable confidence scores occupy large image areas and can usually be filtered by their relative sizes.
We also report significant improvements in segmentation performance and annotation time savings over manual approaches.
arXiv Detail & Related papers (2024-06-27T10:08:29Z)
- Spatial-Temporal Multi-level Association for Video Object Segmentation [89.32226483171047]
This paper proposes spatial-temporal multi-level association, which jointly associates reference frame, test frame, and object features.
Specifically, we construct a spatial-temporal multi-level feature association module to learn better target-aware features.
arXiv Detail & Related papers (2024-04-09T12:44:34Z)
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z)
- A Threefold Review on Deep Semantic Segmentation: Efficiency-oriented, Temporal and Depth-aware design [77.34726150561087]
We conduct a survey on the most relevant and recent advances in Deep Semantic Segmentation in the context of vision for autonomous vehicles.
Our main objective is to provide a comprehensive discussion on the main methods, advantages, limitations, results and challenges faced from each perspective.
arXiv Detail & Related papers (2023-03-08T01:29:55Z)
- Segmenting Moving Objects via an Object-Centric Layered Representation [100.26138772664811]
We introduce an object-centric segmentation model with a depth-ordered layer representation.
We introduce a scalable pipeline for generating synthetic training data with multiple objects.
We evaluate the model on standard video segmentation benchmarks.
arXiv Detail & Related papers (2022-07-05T17:59:43Z)
- Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations [78.12377360145078]
Contrastive self-supervised learning has outperformed supervised pretraining on many downstream tasks like segmentation and object detection.
In this paper, we first study how biases in the dataset affect existing methods.
We show that current contrastive approaches work surprisingly well across: (i) object- versus scene-centric, (ii) uniform versus long-tailed and (iii) general versus domain-specific datasets.
arXiv Detail & Related papers (2021-06-10T17:59:13Z)
- Target-Aware Object Discovery and Association for Unsupervised Video Multi-Object Segmentation [79.6596425920849]
This paper addresses the task of unsupervised video multi-object segmentation.
We introduce a novel approach for more accurate and efficient spatio-temporal segmentation.
We evaluate the proposed approach on DAVIS-2017 and YouTube-VIS, and the results demonstrate that it outperforms state-of-the-art methods in both segmentation accuracy and inference speed.
arXiv Detail & Related papers (2021-04-10T14:39:44Z)