Human Instance Segmentation and Tracking via Data Association and
Single-stage Detector
- URL: http://arxiv.org/abs/2203.16966v1
- Date: Thu, 31 Mar 2022 11:36:09 GMT
- Title: Human Instance Segmentation and Tracking via Data Association and
Single-stage Detector
- Authors: Lu Cheng and Mingbo Zhao
- Abstract summary: Human video instance segmentation plays an important role in computer understanding of human activities.
Most current VIS methods are based on the Mask R-CNN framework.
We develop a new method for human video instance segmentation based on a single-stage detector.
- Score: 17.46922710432633
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human video instance segmentation plays an important role in computer
understanding of human activities and is widely used in video processing, video
surveillance, and human modeling in virtual reality. Most current VIS methods
are based on the Mask R-CNN framework, where extracting target appearance and
motion information for data matching increases computational cost and degrades
real-time segmentation performance; on the other hand, existing VIS datasets
pay little attention to all the people appearing in a video. In this paper, to
address these problems, we develop a new method for human video instance
segmentation based on a single-stage detector. To track instances across the
video, we adopt a data association strategy that matches the same instance
throughout the video sequence, jointly learning target instance appearances and
their affinities in pairs of video frames in an end-to-end fashion. We also
adopt a centroid sampling strategy to enhance instance embedding extraction,
which biases the sampled instance position toward the interior of each instance
mask under heavy overlap. As a result, even when a person's activity changes
suddenly, the sampled position does not move outside the mask, alleviating the
problem of the same instance being represented as two different instances.
Finally, we collect the PVIS dataset by assembling several video instance
segmentation datasets, filling the current lack of datasets dedicated to human
video segmentation. Extensive experiments on this dataset have been conducted,
and the results verify the effectiveness and efficiency of the proposed work.
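The data-association step described above can be illustrated with a minimal sketch. The paper jointly learns appearance embeddings and affinities end-to-end; the snippet below is not the authors' code, but a simplified stand-in that assumes instance embeddings are already available and associates instances across two frames by cosine affinity with greedy matching (the names `cosine_affinity` and `greedy_match` are hypothetical).

```python
# Illustrative sketch (not the paper's implementation): associating instances
# across two frames by affinity between appearance embeddings.

def cosine_affinity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb + 1e-8)

def greedy_match(prev_embs, curr_embs, threshold=0.5):
    """Greedily pair current-frame instances with previous-frame instances
    by highest affinity; pairs below the threshold start new tracks."""
    pairs = sorted(
        ((cosine_affinity(p, c), i, j)
         for i, p in enumerate(prev_embs)
         for j, c in enumerate(curr_embs)),
        reverse=True,
    )
    matched_prev, matched_curr, matches = set(), set(), {}
    for score, i, j in pairs:
        if score < threshold:
            break
        if i in matched_prev or j in matched_curr:
            continue
        matches[j] = i  # current-frame index -> previous-frame index
        matched_prev.add(i)
        matched_curr.add(j)
    return matches

# Two instances per frame; each current embedding is close to one previous one.
prev = [[1.0, 0.0], [0.0, 1.0]]
curr = [[0.9, 0.1], [0.1, 0.9]]
print(greedy_match(prev, curr))  # {0: 0, 1: 1}
```

A full tracker would typically replace the greedy loop with an optimal assignment (e.g. the Hungarian algorithm) and learn the affinities jointly with the detector, as the paper does.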
Related papers
- Context-Aware Video Instance Segmentation [12.71520768233772]
We introduce Context-Aware Video Instance Segmentation (CAVIS), a novel framework designed to enhance instance association.
We propose the Context-Aware Instance Tracker (CAIT), which merges contextual data surrounding the instances with the core instance features to improve tracking accuracy.
We also introduce the Prototypical Cross-frame Contrastive (PCC) loss, which ensures consistency in object-level features across frames.
arXiv Detail & Related papers (2024-07-03T11:11:16Z) - Training-Free Robust Interactive Video Object Segmentation [82.05906654403684]
We propose a training-free prompt tracking framework for interactive video object segmentation (I-PT).
We jointly adopt sparse points and boxes tracking, filtering out unstable points and capturing object-wise information.
Our framework has demonstrated robust zero-shot video segmentation results on popular VOS datasets.
arXiv Detail & Related papers (2024-06-08T14:25:57Z) - Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z) - Segmenting Moving Objects via an Object-Centric Layered Representation [100.26138772664811]
We introduce an object-centric segmentation model with a depth-ordered layer representation.
We introduce a scalable pipeline for generating synthetic training data with multiple objects.
We evaluate the model on standard video segmentation benchmarks.
arXiv Detail & Related papers (2022-07-05T17:59:43Z) - Tag-Based Attention Guided Bottom-Up Approach for Video Instance
Segmentation [83.13610762450703]
Video instance segmentation is a fundamental computer vision task that deals with segmenting and tracking object instances across a video sequence.
We introduce a simple end-to-end trainable bottom-up approach that achieves instance mask predictions at pixel-level granularity, instead of the typical region-proposal-based approach.
Our method provides competitive results on the YouTube-VIS and DAVIS-19 datasets, and has the lowest run-time compared to other contemporary state-of-the-art methods.
arXiv Detail & Related papers (2022-04-22T15:32:46Z) - End-to-end video instance segmentation via spatial-temporal graph neural
networks [30.748756362692184]
Video instance segmentation is a challenging task that extends image instance segmentation to the video domain.
Existing methods either rely only on single-frame information for the detection and segmentation subproblems or handle tracking as a separate post-processing step.
We propose a novel graph-neural-network (GNN) based method to handle the aforementioned limitation.
arXiv Detail & Related papers (2022-03-07T05:38:08Z) - Reliable Shot Identification for Complex Event Detection via
Visual-Semantic Embedding [72.9370352430965]
We propose a visual-semantic guided loss method for event detection in videos.
Motivated by curriculum learning, we introduce a negative elastic-net regularization term to start training the classifier with instances of high reliability.
An alternating optimization algorithm is developed to solve the proposed challenging non-convex regularization problem.
arXiv Detail & Related papers (2021-10-12T11:46:56Z) - 1st Place Solution for YouTubeVOS Challenge 2021:Video Instance
Segmentation [0.39146761527401414]
Video Instance Segmentation (VIS) is a multi-task problem performing detection, segmentation, and tracking simultaneously.
We propose two modules, named Temporally Correlated Instance Segmentation (TCIS) and Bidirectional Tracking (BiTrack).
By combining these techniques with a bag of tricks, the network performance is significantly boosted compared to the baseline.
arXiv Detail & Related papers (2021-06-12T00:20:38Z) - CompFeat: Comprehensive Feature Aggregation for Video Instance
Segmentation [67.17625278621134]
Video instance segmentation is a complex task in which we need to detect, segment, and track each object for any given video.
Previous approaches only utilize single-frame features for the detection, segmentation, and tracking of objects.
We propose a novel comprehensive feature aggregation approach (CompFeat) to refine features at both frame-level and object-level with temporal and spatial context information.
arXiv Detail & Related papers (2020-12-07T00:31:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.