VideoSSL: Semi-Supervised Learning for Video Classification
- URL: http://arxiv.org/abs/2003.00197v1
- Date: Sat, 29 Feb 2020 07:13:12 GMT
- Title: VideoSSL: Semi-Supervised Learning for Video Classification
- Authors: Longlong Jing, Toufiq Parag, Zhe Wu, Yingli Tian, Hongcheng Wang
- Abstract summary: We propose a semi-supervised learning approach for video classification, VideoSSL, using convolutional neural networks (CNNs).
To minimize the dependence on a large annotated dataset, our proposed method trains from a small number of labeled examples.
We show that, under the supervision of these guiding signals from unlabeled examples, a video classification CNN can achieve impressive performance.
- Score: 30.348819309923098
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a semi-supervised learning approach for video classification,
VideoSSL, using convolutional neural networks (CNNs). Like other computer vision
tasks, existing supervised video classification methods demand a large amount
of labeled data to attain good performance. However, annotation of a large
dataset is expensive and time consuming. To minimize the dependence on a large
annotated dataset, our proposed semi-supervised method trains from a small
number of labeled examples and exploits two regulatory signals from unlabeled
data. The first signal is the pseudo-labels of unlabeled examples computed from
the confidences of the CNN being trained. The other is the set of normalized
probabilities predicted by an image classifier CNN, which capture information
about the appearance of the objects of interest in the video. We show
that, under the supervision of these guiding signals from unlabeled examples, a
video classification CNN can achieve impressive performance using only a small
fraction of annotated examples on three publicly available datasets: UCF101,
HMDB51 and Kinetics.
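To make the two unlabeled-data signals concrete, here is a minimal PyTorch-style sketch of how they could be combined with the supervised loss. This is our illustration, not the authors' code: the names (video_cnn, image_cnn, conf_threshold, lambda_distill) and the assumption that the image classifier outputs probabilities over the same label set as the video CNN are assumptions made for the sketch.

```python
# Minimal sketch of the two unlabeled-data signals described in the abstract.
# All names and hyperparameters below are illustrative assumptions.
import torch
import torch.nn.functional as F

def videossl_step(video_cnn, image_cnn, labeled_clips, labels,
                  unlabeled_clips, unlabeled_frames,
                  conf_threshold=0.95, lambda_distill=1.0):
    # 1) Standard cross-entropy on the small labeled set.
    sup_logits = video_cnn(labeled_clips)
    loss_sup = F.cross_entropy(sup_logits, labels)

    # 2) Pseudo-label loss: keep unlabeled clips whose current prediction
    #    is confident enough, and train on the predicted class.
    unsup_logits = video_cnn(unlabeled_clips)
    probs = unsup_logits.softmax(dim=1)
    conf, pseudo = probs.max(dim=1)
    mask = (conf >= conf_threshold).float()
    loss_pseudo = (F.cross_entropy(unsup_logits, pseudo, reduction="none") * mask).mean()

    # 3) Appearance signal: align the video CNN's distribution with the
    #    normalized probabilities of an image classifier applied to frames
    #    (assuming both networks share the same label set).
    with torch.no_grad():
        soft_targets = image_cnn(unlabeled_frames).softmax(dim=1)
    loss_distill = F.kl_div(unsup_logits.log_softmax(dim=1), soft_targets,
                            reduction="batchmean")

    return loss_sup + loss_pseudo + lambda_distill * loss_distill
```

The KL term above is one plausible way to realize the appearance-based guidance; the paper's exact formulation may differ.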
Related papers
- Granular-ball Representation Learning for Deep CNN on Learning with Label Noise [14.082510085545582]
We propose a general granular-ball computing (GBC) module that can be embedded into a CNN model.
In this study, the input samples are split into granular-ball ($gb$) samples at the feature level; each $gb$ sample can cover a varying number of original samples and shares a single label.
Experiments demonstrate that the proposed method can improve the robustness of CNN models with no additional data or optimization.
arXiv Detail & Related papers (2024-09-05T05:18:31Z) - Efficient labeling of solar flux evolution videos by a deep learning
model [0.0]
We show that convolutional neural networks (CNNs) can be leveraged to improve the quality of data labeling.
We train CNNs using crude labels, manually verify and correct the cases where the labels and the CNN predictions disagree, and repeat this process until convergence (see the short sketch after this list).
We find that a high-quality labeled dataset, derived through this iterative process, reduces the necessary manual verification by 50%.
arXiv Detail & Related papers (2023-08-29T02:05:40Z) - CvS: Classification via Segmentation For Small Datasets [52.821178654631254]
This paper presents CvS, a cost-effective classifier for small datasets that derives the classification labels from predicting the segmentation maps.
We evaluate the effectiveness of our framework on diverse problems, showing that CvS achieves much higher classification accuracy than previous methods when given only a handful of examples.
arXiv Detail & Related papers (2021-10-29T18:41:15Z) - NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy
Labels [33.659146748289444]
We create a benchmark dataset consisting of around 2 million videos with associated user-generated annotations and other meta information.
We show how a network pretrained on the proposed dataset can help against video corruption and label noise in downstream datasets.
arXiv Detail & Related papers (2021-10-13T16:12:18Z) - Few-Shot Video Object Detection [70.43402912344327]
We introduce Few-Shot Video Object Detection (FSVOD) with three important contributions.
FSVOD-500 comprises 500 classes with class-balanced videos in each category for few-shot learning.
Our TPN and TMN+ are jointly and end-to-end trained.
arXiv Detail & Related papers (2021-04-30T07:38:04Z) - Weakly Supervised Instance Segmentation for Videos with Temporal Mask
Consistency [28.352140544936198]
Weakly supervised instance segmentation reduces the cost of annotations required to train models.
We show that these issues can be better addressed by training with weakly labeled videos instead of images.
We are the first to explore the use of these video signals to tackle weakly supervised instance segmentation.
arXiv Detail & Related papers (2021-03-23T23:20:46Z) - Attention-Aware Noisy Label Learning for Image Classification [97.26664962498887]
Deep convolutional neural networks (CNNs) learned on large-scale labeled samples have achieved remarkable progress in computer vision.
The cheapest way to obtain a large body of labeled visual data is to crawl from websites with user-supplied labels, such as Flickr.
This paper proposes the attention-aware noisy label learning approach to improve the discriminative capability of the network trained on datasets with potential label noise.
arXiv Detail & Related papers (2020-09-30T15:45:36Z) - Semi-supervised deep learning based on label propagation in a 2D
embedded space [117.9296191012968]
Proposed solutions propagate labels from a small set of supervised images to a large set of unsupervised ones to train a deep neural network model.
We present a loop in which a deep neural network (VGG-16) is trained on a set that contains more correctly labeled samples at each iteration.
As the labeled set improves over the iterations, so do the features learned by the network.
arXiv Detail & Related papers (2020-08-02T20:08:54Z) - Labelling unlabelled videos from scratch with multi-modal
self-supervision [82.60652426371936]
Unsupervised labelling of a video dataset does not come for free from strong feature encoders.
We propose a novel clustering method that allows pseudo-labelling of a video dataset without any human annotations.
An extensive analysis shows that the resulting clusters have high semantic overlap to ground truth human labels.
arXiv Detail & Related papers (2020-06-24T12:28:17Z) - 3D medical image segmentation with labeled and unlabeled data using
autoencoders at the example of liver segmentation in CT images [58.720142291102135]
This work investigates the potential of autoencoder-extracted features to improve segmentation with a convolutional neural network.
A convolutional autoencoder was used to extract features from unlabeled data and a multi-scale, fully convolutional CNN was used to perform the target task of 3D liver segmentation in CT images.
arXiv Detail & Related papers (2020-03-17T20:20:43Z)
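As referenced in the solar-flux labeling entry above, its train/verify/correct/repeat cycle can be written down in a few lines. The sketch below is our paraphrase of that loop under stated assumptions; train_cnn, predict, and manual_review are hypothetical placeholders, not functions from that paper.

```python
# Illustrative sketch of the iterative label-correction loop summarized above:
# train on crude labels, flag label/prediction disagreements, correct them
# manually, and repeat until the labels and the CNN agree.
def iterative_relabeling(videos, crude_labels, train_cnn, predict, manual_review,
                         max_rounds=10):
    labels = list(crude_labels)
    model = None
    for _ in range(max_rounds):
        model = train_cnn(videos, labels)          # fit on the current labels
        preds = predict(model, videos)             # the CNN's own predictions
        disagreements = [i for i, (y, p) in enumerate(zip(labels, preds)) if y != p]
        if not disagreements:                      # converged: no conflicts left
            break
        for i in disagreements:                    # a human reviews only the conflicts
            labels[i] = manual_review(videos[i], labels[i], preds[i])
    return labels, model
```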