Related papers: Ensembles of Deep Neural Networks for Action Recognition in Still Images

URL: http://arxiv.org/abs/2003.09893v1
Date: Sun, 22 Mar 2020 13:44:09 GMT
Title: Ensembles of Deep Neural Networks for Action Recognition in Still Images
Authors: Sina Mohammadi, Sina Ghofrani Majelan, Shahriar B. Shokouhi
Abstract summary: We propose a transfer learning technique to tackle the lack of massive labeled action recognition datasets. We also use eight different pre-trained CNNs in our framework and investigate their performance on the Stanford 40 dataset. The best setting of our method is able to achieve 93.17% accuracy on the Stanford 40 dataset.
Score: 3.7900158137749336
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Despite the fact that notable improvements have been made recently in the field of feature extraction and classification, human action recognition is still challenging, especially in images, in which, unlike videos, there is no motion. Thus, the methods proposed for recognizing human actions in videos cannot be applied to still images. A big challenge in action recognition in still images is the lack of large enough datasets, which is problematic for training deep Convolutional Neural Networks (CNNs) due to the overfitting issue. In this paper, by taking advantage of pre-trained CNNs, we employ the transfer learning technique to tackle the lack of massive labeled action recognition datasets. Furthermore, since the last layer of the CNN has class-specific information, we apply an attention mechanism on the output feature maps of the CNN to extract more discriminative and powerful features for classification of human actions. Moreover, we use eight different pre-trained CNNs in our framework and investigate their performance on the Stanford 40 dataset. Finally, we propose using the Ensemble Learning technique to enhance the overall accuracy of action classification by combining the predictions of multiple models. The best setting of our method is able to achieve 93.17% accuracy on the Stanford 40 dataset.
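
The abstract says the ensemble combines the predictions of multiple models but does not say how. As a minimal sketch, assuming the combination is a plain average of per-model class probabilities (one common choice, not necessarily the authors' method), the idea looks like this in Python. The function names and the logits are hypothetical and chosen only for illustration.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_predict(per_model_logits):
    """Average the class probabilities of several models.

    Assumption: simple unweighted averaging; the paper's actual
    combination rule is not given in the abstract.
    Returns (predicted_class_index, averaged_probabilities).
    """
    probs = [softmax(row) for row in per_model_logits]
    n_models = len(probs)
    n_classes = len(probs[0])
    avg = [sum(p[c] for p in probs) / n_models for c in range(n_classes)]
    best = max(range(n_classes), key=lambda c: avg[c])
    return best, avg

# Hypothetical logits from three models for one image over four classes.
logits = [
    [2.0, 1.0, 0.1, 0.0],  # model 1 prefers class 0
    [0.5, 2.5, 0.2, 0.1],  # model 2 prefers class 1
    [0.3, 2.0, 0.1, 0.4],  # model 3 prefers class 1
]
label, avg_probs = ensemble_predict(logits)
print(label, [round(p, 3) for p in avg_probs])
```

In this made-up example the first model alone would pick class 0, while the averaged probabilities favor class 1. This shows how an ensemble can override a single model's error.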

Related papers

Perception Encoder: The best visual embeddings are not at the output of the network [70.86738083862099]
We introduce Perception Encoder (PE), a vision encoder for image and video understanding trained via simple vision-language learning. We find that contrastive vision-language training alone can produce strong, general embeddings for all of these downstream tasks. Together, our PE family of models achieves best-in-class results on a wide variety of tasks.
arXiv Detail & Related papers (2025-04-17T17:59:57Z)
Training Convolutional Neural Networks with the Forward-Forward algorithm [1.74440662023704]
The Forward-Forward (FF) algorithm has so far only been used in fully connected networks. We show how the FF paradigm can be extended to CNNs. Our FF-trained CNN, featuring a novel spatially-extended labeling technique, achieves a classification accuracy of 99.16% on the MNIST handwritten digits dataset.
arXiv Detail & Related papers (2023-12-22T18:56:35Z)
Wild Animal Classifier Using CNN [0.0]
Convolutional neural networks (CNNs) have multiple layers, each with different weights, which are used to make a prediction for a given input. Image segmentation is a widely used image processing method that provides a clear demarcation of the areas of interest in an image.
arXiv Detail & Related papers (2022-10-03T13:14:08Z)
Agricultural Plantation Classification using Transfer Learning Approach based on CNN [0.0]
The efficiency of hyper-spectral image recognition has increased significantly with deep learning. CNNs and Multi-Layer Perceptrons (MLPs) have proven to be excellent at classifying images. We propose using transfer learning to decrease the training time and reduce the dependence on large labeled datasets.
arXiv Detail & Related papers (2022-06-19T14:43:31Z)
Neural Maximum A Posteriori Estimation on Unpaired Data for Motion Deblurring [87.97330195531029]
We propose a Neural Maximum A Posteriori (NeurMAP) estimation framework for training neural networks to recover blind motion information and sharp content from unpaired data. The proposed NeurMAP is an approach to existing deblurring neural networks, and is the first framework that enables training image deblurring networks on unpaired datasets.
arXiv Detail & Related papers (2022-04-26T08:09:47Z)
Recognizing Actions in Videos from Unseen Viewpoints [80.6338404141284]
We show that current convolutional neural network models are unable to recognize actions from camera viewpoints not present in training data. We introduce a new dataset for unseen view recognition and show the approach's ability to learn viewpoint-invariant representations.
arXiv Detail & Related papers (2021-03-30T17:17:54Z)
The Mind's Eye: Visualizing Class-Agnostic Features of CNNs [92.39082696657874]
We propose an approach to visually interpret CNN features given a set of images by creating corresponding images that depict the most informative features of a specific layer. Our method uses a dual-objective activation and distance loss, without requiring a generator network or modifications to the original model.
arXiv Detail & Related papers (2021-01-29T07:46:39Z)
Convolutional Neural Networks for Multispectral Image Cloud Masking [7.812073412066698]
Convolutional neural networks (CNNs) have proven to be state-of-the-art methods for many image classification tasks. We study the use of different CNN architectures for cloud masking of Proba-V multispectral images.
arXiv Detail & Related papers (2020-12-09T21:33:20Z)
Learning CNN filters from user-drawn image markers for coconut-tree image classification [78.42152902652215]
We present a method that needs a minimal set of user-selected images to train the CNN's feature extractor. The method learns the filters of each convolutional layer from user-drawn markers in image regions that discriminate classes. It does not rely on optimization based on backpropagation, and we demonstrate its advantages on the binary classification of coconut-tree aerial images.
arXiv Detail & Related papers (2020-08-08T15:50:23Z)
Adversarially-Trained Deep Nets Transfer Better: Illustration on Image Classification [53.735029033681435]
Transfer learning is a powerful methodology for adapting pre-trained deep neural networks on image recognition tasks to new domains. In this work, we demonstrate that adversarially-trained models transfer better than non-adversarially-trained models.
arXiv Detail & Related papers (2020-07-11T22:48:42Z)
Curriculum By Smoothing [52.08553521577014]
Convolutional Neural Networks (CNNs) have shown impressive performance in computer vision tasks such as image classification, detection, and segmentation. We propose an elegant curriculum-based scheme that smooths the feature embedding of a CNN using anti-aliasing or low-pass filters. As the amount of information in the feature maps increases during training, the network is able to progressively learn better representations of the data.
arXiv Detail & Related papers (2020-03-03T07:27:44Z)

This list is automatically generated from the titles and abstracts of the papers on this site.