Deep Learning Techniques for Video Instance Segmentation: A Survey
- URL: http://arxiv.org/abs/2310.12393v1
- Date: Thu, 19 Oct 2023 00:27:30 GMT
- Title: Deep Learning Techniques for Video Instance Segmentation: A Survey
- Authors: Chenhao Xu, Chang-Tsun Li, Yongjian Hu, Chee Peng Lim, Douglas
Creighton
- Abstract summary: Video instance segmentation is an emerging computer vision research area introduced in 2019.
Deep-learning techniques take a dominant role in various computer vision areas.
This survey offers a multifaceted view of deep-learning schemes for video instance segmentation.
- Score: 19.32547752428875
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video instance segmentation, also known as multi-object tracking and
segmentation, is an emerging computer vision research area introduced in 2019,
aiming at detecting, segmenting, and tracking instances in videos
simultaneously. By tackling the video instance segmentation tasks through
effective analysis and utilization of visual information in videos, a range of
computer vision-enabled applications (e.g., human action recognition, medical
image processing, autonomous vehicle navigation, surveillance, etc) can be
implemented. As deep-learning techniques take a dominant role in various
computer vision areas, a plethora of deep-learning-based video instance
segmentation schemes have been proposed. This survey offers a multifaceted view
of deep-learning schemes for video instance segmentation, covering various
architectural paradigms, along with comparisons of functional performance,
model complexity, and computational overheads. In addition to the common
architectural designs, auxiliary techniques for improving the performance of
deep-learning models for video instance segmentation are compiled and
discussed. Finally, we discuss a range of major challenges and directions for
further investigations to help advance this promising research field.
Related papers
- Video Summarization Techniques: A Comprehensive Review [1.6381055567716192]
The paper explores the various approaches and methods created for video summarizing, emphasizing both abstractive and extractive strategies.
The process of extractive summarization involves the identification of key frames or segments from the source video, utilizing methods such as shot boundary recognition, and clustering.
On the other hand, abstractive summarization creates new content by getting the essential content from the video, using machine learning models like deep neural networks and natural language processing, reinforcement learning, attention mechanisms, generative adversarial networks, and multi-modal learning.
arXiv Detail & Related papers (2024-10-06T11:17:54Z) - Self-supervised Video Object Segmentation with Distillation Learning of Deformable Attention [29.62044843067169]
Video object segmentation is a fundamental research problem in computer vision.
We propose a new method for self-supervised video object segmentation based on distillation learning of deformable attention.
arXiv Detail & Related papers (2024-01-25T04:39:48Z) - CML-MOTS: Collaborative Multi-task Learning for Multi-Object Tracking
and Segmentation [31.167405688707575]
We propose a framework for instance-level visual analysis on video frames.
It can simultaneously conduct object detection, instance segmentation, and multi-object tracking.
We evaluate the proposed method extensively on KITTI MOTS and MOTS Challenge datasets.
arXiv Detail & Related papers (2023-11-02T04:32:24Z) - Joint Depth Prediction and Semantic Segmentation with Multi-View SAM [59.99496827912684]
We propose a Multi-View Stereo (MVS) technique for depth prediction that benefits from rich semantic features of the Segment Anything Model (SAM)
This enhanced depth prediction, in turn, serves as a prompt to our Transformer-based semantic segmentation decoder.
arXiv Detail & Related papers (2023-10-31T20:15:40Z) - Learning Visual Affordance Grounding from Demonstration Videos [76.46484684007706]
Affordance grounding aims to segment all possible interaction regions between people and objects from an image/video.
We propose a Hand-aided Affordance Grounding Network (HAGNet) that leverages the aided clues provided by the position and action of the hand in demonstration videos.
arXiv Detail & Related papers (2021-08-12T11:45:38Z) - A Survey on Deep Learning Technique for Video Segmentation [147.0767454918527]
Video segmentation plays a critical role in a broad range of practical applications.
Deep learning based approaches have been dedicated to video segmentation and delivered compelling performance.
arXiv Detail & Related papers (2021-07-02T15:51:07Z) - Video Summarization Using Deep Neural Networks: A Survey [72.98424352264904]
Video summarization technologies aim to create a concise and complete synopsis by selecting the most informative parts of the video content.
This work focuses on the recent advances in the area and provides a comprehensive survey of the existing deep-learning-based methods for generic video summarization.
arXiv Detail & Related papers (2021-01-15T11:41:29Z) - Motion-supervised Co-Part Segmentation [88.40393225577088]
We propose a self-supervised deep learning method for co-part segmentation.
Our approach develops the idea that motion information inferred from videos can be leveraged to discover meaningful object parts.
arXiv Detail & Related papers (2020-04-07T09:56:45Z) - Image Segmentation Using Deep Learning: A Survey [58.37211170954998]
Image segmentation is a key topic in image processing and computer vision.
There has been a substantial amount of works aimed at developing image segmentation approaches using deep learning models.
arXiv Detail & Related papers (2020-01-15T21:37:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.