Joint Inductive and Transductive Learning for Video Object Segmentation
- URL: http://arxiv.org/abs/2108.03679v1
- Date: Sun, 8 Aug 2021 16:25:48 GMT
- Title: Joint Inductive and Transductive Learning for Video Object Segmentation
- Authors: Yunyao Mao, Ning Wang, Wengang Zhou, Houqiang Li
- Abstract summary: Semi-supervised object segmentation is a task of segmenting the target object in a video sequence given only a mask in the first frame.
Most previous best-performing methods adopt matching-based transductive reasoning or online inductive learning.
We propose to integrate transductive and inductive learning into a unified framework to exploit the complementarity between them for accurate and robust video object segmentation.
- Score: 107.32760625159301
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semi-supervised video object segmentation is a task of segmenting the target
object in a video sequence given only a mask annotation in the first frame. The
limited information available makes it an extremely challenging task. Most
previous best-performing methods adopt matching-based transductive reasoning or
online inductive learning. Nevertheless, they are either less discriminative
for similar instances or insufficient in the utilization of spatio-temporal
information. In this work, we propose to integrate transductive and inductive
learning into a unified framework to exploit the complementarity between them
for accurate and robust video object segmentation. The proposed approach
consists of two functional branches. The transduction branch adopts a
lightweight transformer architecture to aggregate rich spatio-temporal cues
while the induction branch performs online inductive learning to obtain
discriminative target information. To bridge these two diverse branches, a
two-head label encoder is introduced to learn the suitable target prior for
each of them. The generated mask encodings are further forced to be
disentangled to better retain their complementarity. Extensive experiments on
several prevalent benchmarks show that, without the need of synthetic training
data, the proposed approach sets a series of new state-of-the-art records. Code
is available at https://github.com/maoyunyao/JOINT.
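To make the two-branch design concrete, below is a minimal PyTorch sketch of how a transduction branch (attention over a mask-conditioned memory frame), an induction branch (a target model that would be optimized online), and a two-head label encoder could be wired together. All module names, tensor shapes, and the single-memory-frame setup are simplifying assumptions for illustration, not the authors' implementation; the actual architecture, online optimization, and the disentanglement constraint on the mask encodings are in the repository linked above.
```python
# Hypothetical sketch of the two-branch idea; shapes and modules are assumptions.
import torch
import torch.nn as nn


class TwoHeadLabelEncoder(nn.Module):
    """Encodes the reference mask into two target priors, one per branch."""

    def __init__(self, dim=256):
        super().__init__()
        self.shared = nn.Conv2d(1, dim, kernel_size=3, padding=1)
        self.trans_head = nn.Conv2d(dim, dim, kernel_size=1)   # prior for transduction
        self.induct_head = nn.Conv2d(dim, dim, kernel_size=1)  # prior for induction

    def forward(self, mask):
        shared = torch.relu(self.shared(mask))
        return self.trans_head(shared), self.induct_head(shared)


class JointVOSSketch(nn.Module):
    """Fuses a transformer-style transduction branch with an inductive target model."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.label_encoder = TwoHeadLabelEncoder(dim)
        # Transduction branch: lightweight attention over memory-frame features.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Induction branch: a small target model; a single conv stands in for the
        # discriminative filter that would be learned online in practice.
        self.target_model = nn.Conv2d(dim, dim, kernel_size=3, padding=1)
        self.seg_head = nn.Conv2d(2 * dim, 1, kernel_size=3, padding=1)

    def forward(self, query_feat, memory_feat, memory_mask):
        # query_feat:  (B, C, H, W) features of the current frame
        # memory_feat: (B, C, H, W) features of a past frame with a known mask
        # memory_mask: (B, 1, H, W) mask of that past frame
        b, c, h, w = query_feat.shape
        trans_prior, induct_prior = self.label_encoder(memory_mask)

        # Transduction: attend from query pixels to mask-conditioned memory pixels.
        q = query_feat.flatten(2).transpose(1, 2)                    # (B, HW, C)
        kv = (memory_feat + trans_prior).flatten(2).transpose(1, 2)  # (B, HW, C)
        trans_out, _ = self.attn(q, kv, kv)
        trans_out = trans_out.transpose(1, 2).reshape(b, c, h, w)

        # Induction: apply the target model, conditioned on its own mask encoding.
        induct_out = self.target_model(query_feat + induct_prior)

        # Fuse the two complementary cues and predict the mask logits.
        return self.seg_head(torch.cat([trans_out, induct_out], dim=1))


if __name__ == "__main__":
    model = JointVOSSketch()
    f_q = torch.randn(1, 256, 30, 30)   # current-frame features
    f_m = torch.randn(1, 256, 30, 30)   # memory-frame features
    m_m = torch.rand(1, 1, 30, 30)      # memory-frame mask
    print(model(f_q, f_m, m_m).shape)   # torch.Size([1, 1, 30, 30])
```
In the paper's setting, the two priors produced by the label encoder would additionally be pushed apart (disentangled) so that each branch retains its own complementary view of the target; that loss is omitted here for brevity.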
Related papers
- Towards Few-Annotation Learning in Computer Vision: Application to Image
Classification and Object Detection tasks [3.5353632767823506]
In this thesis, we develop theoretical, algorithmic and experimental contributions for Machine Learning with limited labels.
In a first contribution, we are interested in bridging the gap between theory and practice for popular Meta-Learning algorithms used in Few-Shot Classification.
To leverage unlabeled data when training object detectors based on the Transformer architecture, we propose both an unsupervised pretraining and a semi-supervised learning method.
arXiv Detail & Related papers (2023-11-08T18:50:04Z) - LOCATE: Self-supervised Object Discovery via Flow-guided Graph-cut and
Bootstrapped Self-training [13.985488693082981]
We propose a self-supervised object discovery approach that leverages motion and appearance information to produce high-quality object segmentation masks.
We demonstrate the effectiveness of our approach, named LOCATE, on multiple standard video object segmentation, image saliency detection, and object segmentation benchmarks.
arXiv Detail & Related papers (2023-08-22T07:27:09Z) - DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage based, with one stage for person bounding box generation and the other stage for action recognition.
We present a decoupled one-stage network, dubbed DOAD, to improve the efficiency of spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z) - Learning to Learn Better for Video Object Segmentation [94.5753973590207]
We propose a novel framework that emphasizes Learning to Learn Better (LLB) target features for SVOS.
We design the discriminative label generation module (DLGM) and the adaptive fusion module to address these issues.
Our proposed LLB method achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-12-05T09:10:34Z) - Self-supervised Amodal Video Object Segmentation [57.929357732733926]
Amodal perception requires inferring the full shape of an object that is partially occluded.
This paper develops a new framework for self-supervised amodal video object segmentation (SaVos).
arXiv Detail & Related papers (2022-10-23T14:09:35Z) - Target-Aware Object Discovery and Association for Unsupervised Video
Multi-Object Segmentation [79.6596425920849]
This paper addresses the task of unsupervised video multi-object segmentation.
We introduce a novel approach for more accurate and efficient spatio-temporal segmentation.
We evaluate the proposed approach on DAVIS-17 and YouTube-VIS, and the results demonstrate that it outperforms state-of-the-art methods in both segmentation accuracy and inference speed.
arXiv Detail & Related papers (2021-04-10T14:39:44Z) - Few-Shot Action Recognition with Compromised Metric via Optimal
Transport [31.834843714684343]
Few-shot action recognition is still not mature despite extensive research on few-shot image classification.
One main obstacle to applying these algorithms in action recognition is the complex structure of videos.
We propose Compromised Metric via Optimal Transport (CMOT) to combine the advantages of these two solutions.
arXiv Detail & Related papers (2021-04-08T12:42:05Z) - Learning to Track Instances without Video Annotations [85.9865889886669]
We introduce a novel semi-supervised framework by learning instance tracking networks with only a labeled image dataset and unlabeled video sequences.
We show that even when only trained with images, the learned feature representation is robust to instance appearance variations.
In addition, we integrate this module into single-stage instance segmentation and pose estimation frameworks.
arXiv Detail & Related papers (2021-04-01T06:47:41Z)