Cloud based Scalable Object Recognition from Video Streams using
Orientation Fusion and Convolutional Neural Networks
- URL: http://arxiv.org/abs/2106.15329v1
- Date: Sat, 19 Jun 2021 07:15:15 GMT
- Title: Cloud based Scalable Object Recognition from Video Streams using
Orientation Fusion and Convolutional Neural Networks
- Authors: Muhammad Usman Yaseen, Ashiq Anjum, Giancarlo Fortino, Antonio Liotta,
Amir Hussain
- Abstract summary: Convolutional neural networks (CNNs) have been widely used to perform intelligent visual object recognition.
CNNs still suffer from severe accuracy degradation, particularly on illumination-variant datasets.
We propose a new CNN method based on orientation fusion for visual object recognition.
- Score: 11.44782606621054
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Object recognition from live video streams comes with numerous challenges
such as the variation in illumination conditions and poses. Convolutional
neural networks (CNNs) have been widely used to perform intelligent visual
object recognition. Yet, CNNs still suffer from severe accuracy degradation,
particularly on illumination-variant datasets. To address this problem, we
propose a new CNN method based on orientation fusion for visual object
recognition. The proposed cloud-based video analytics system pioneers the use
of bi-dimensional empirical mode decomposition to split a video frame into
intrinsic mode functions (IMFs). These IMFs are then passed through the Riesz
transform to produce monogenic object components, which are in turn used to
train the CNNs. Past works have demonstrated that the object orientation
component can be used to achieve accuracy levels as high as 93%. Herein we
demonstrate how a feature-fusion strategy over the orientation components
further improves visual recognition accuracy to 97%. We also assess the
scalability of our method, looking at both the number and the size of the video
streams under scrutiny. We carry out extensive experimentation on the publicly
available Yale dataset, as well as a self-generated video dataset, finding
significant improvements (in both accuracy and scale) in comparison to
AlexNet, LeNet and SE-ResNeXt, which are the three most commonly used deep
learning models for visual object recognition and classification.
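A rough, non-authoritative sketch of the core step described above: the Riesz transform of each IMF is computed in the frequency domain and the resulting orientation components are stacked as CNN input channels. The `bemd()` routine is a hypothetical placeholder for the paper's bi-dimensional empirical mode decomposition, and fusing by channel-stacking is an assumption, not necessarily the authors' exact strategy.

```python
import numpy as np

def riesz_transform(frame: np.ndarray):
    """Frequency-domain Riesz transform of a 2-D grayscale frame.

    Returns the two Riesz components (r1, r2) of the monogenic signal.
    """
    h, w = frame.shape
    u = np.fft.fftfreq(h).reshape(-1, 1)   # vertical frequencies
    v = np.fft.fftfreq(w).reshape(1, -1)   # horizontal frequencies
    mag = np.sqrt(u ** 2 + v ** 2)
    mag[0, 0] = 1.0                        # avoid division by zero at DC
    F = np.fft.fft2(frame)
    r1 = np.real(np.fft.ifft2(F * (-1j * u / mag)))
    r2 = np.real(np.fft.ifft2(F * (-1j * v / mag)))
    return r1, r2

def orientation_fusion(imfs):
    """Stack the orientation component of each IMF into a multi-channel
    array (channels-first) that can be fed to a CNN."""
    channels = []
    for imf in imfs:
        r1, r2 = riesz_transform(imf)
        orientation = np.arctan2(r2, r1)   # local orientation of the monogenic signal
        channels.append(orientation)
    return np.stack(channels, axis=0)

# imfs = bemd(frame)  # hypothetical BEMD routine returning intrinsic mode functions
# cnn_input = orientation_fusion(imfs)
```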
Related papers
- Deformable Convolution Based Road Scene Semantic Segmentation of Fisheye Images in Autonomous Driving [4.720434481945155]
This study investigates the effectiveness of modern Deformable Convolutional Neural Networks (DCNNs) for semantic segmentation tasks.
Our experiments focus on segmenting the WoodScape fisheye image dataset into ten distinct classes, assessing the Deformable Networks' ability to capture intricate spatial relationships.
The significant improvement in mIoU score resulting from integrating Deformable CNNs demonstrates their effectiveness in handling the geometric distortions present in fisheye imagery.
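A minimal sketch of the deformable-convolution building block this entry refers to, using `torchvision.ops.DeformConv2d` with sampling offsets predicted by a regular convolution; the layer sizes are illustrative, not those of the paper.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableBlock(nn.Module):
    """A 3x3 deformable convolution whose sampling offsets are
    predicted from the input by a regular convolution."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # 2 offsets (dy, dx) per position in the 3x3 kernel
        self.offset = nn.Conv2d(in_ch, 2 * 3 * 3, kernel_size=3, padding=1)
        self.deform = DeformConv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        return self.deform(x, self.offset(x))

# x = torch.randn(1, 64, 128, 128)
# y = DeformableBlock(64, 128)(x)   # -> (1, 128, 128, 128)
```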
arXiv Detail & Related papers (2024-07-23T17:02:24Z)
- SIGMA: Sinkhorn-Guided Masked Video Modeling [69.31715194419091]
Sinkhorn-guided Masked Video Modeling (SIGMA) is a novel video pretraining method.
We distribute features of space-time tubes evenly across a limited number of learnable clusters.
Experimental results on ten datasets validate the effectiveness of SIGMA in learning more performant, temporally-aware, and robust video representations.
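The summary does not reproduce SIGMA's details, but the Sinkhorn-Knopp normalization commonly used to balance assignments across clusters looks roughly like the sketch below, assuming a batch of similarity scores between features and learnable cluster centers.

```python
import torch

def sinkhorn(scores: torch.Tensor, n_iters: int = 3, eps: float = 0.05):
    """Sinkhorn-Knopp iterations turning a (batch x clusters) score
    matrix into a soft assignment that is balanced across clusters."""
    q = torch.exp(scores / eps)
    q /= q.sum()
    b, k = q.shape
    for _ in range(n_iters):
        q /= q.sum(dim=0, keepdim=True)  # normalize columns: equal cluster mass
        q /= k
        q /= q.sum(dim=1, keepdim=True)  # normalize rows: one unit per sample
        q /= b
    return q * b                         # rows sum to 1

# scores = features @ cluster_centers.T   # e.g. (256 tubes x 64 clusters)
# assignments = sinkhorn(scores)
```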
arXiv Detail & Related papers (2024-07-22T08:04:09Z)
- ConViViT -- A Deep Neural Network Combining Convolutions and Factorized Self-Attention for Human Activity Recognition [3.6321891270689055]
We propose a novel approach that leverages the strengths of both CNNs and Transformers in a hybrid architecture for performing activity recognition using RGB videos.
Our architecture has achieved new SOTA results with 90.05%, 99.6%, and 95.09% on HMDB51, UCF101, and ETRI-Activity3D respectively.
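A hedged sketch of the general hybrid idea (not the ConViViT architecture itself, whose factorized attention and hyperparameters are not given here): a CNN backbone yields per-frame features that a transformer encoder then mixes over time.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class HybridCNNTransformer(nn.Module):
    """CNN features per frame, followed by temporal self-attention."""
    def __init__(self, num_classes: int, dim: int = 512):
        super().__init__()
        backbone = resnet18(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop the FC head
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, video):                     # video: (B, T, 3, H, W)
        b, t = video.shape[:2]
        feats = self.cnn(video.flatten(0, 1))     # (B*T, 512, 1, 1)
        feats = feats.flatten(1).view(b, t, -1)   # (B, T, 512)
        feats = self.temporal(feats)              # attention across time
        return self.head(feats.mean(dim=1))       # average over frames
```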
arXiv Detail & Related papers (2023-10-22T21:13:43Z)
- Video Action Recognition Collaborative Learning with Dynamics via PSO-ConvNet Transformer [1.876462046907555]
We propose a novel PSO-ConvNet model for learning actions in videos.
Our experimental results on the UCF-101 dataset demonstrate substantial improvements of up to 9% in accuracy.
Overall, our dynamic PSO-ConvNet model provides a promising direction for improving Human Action Recognition.
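The paper's coupling of PSO with ConvNet training is not spelled out in this summary; below is only the canonical particle-swarm update such methods build on, with illustrative coefficients.

```python
import numpy as np

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=np.random):
    """One velocity/position update for a swarm of candidate solutions.

    x, v, pbest: (n_particles, dim); gbest: (dim,)
    """
    r1 = rng.random(x.shape)
    r2 = rng.random(x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v, v
```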
arXiv Detail & Related papers (2023-02-17T23:39:34Z)
- Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks [76.35271072704384]
Deep learning models perform poorly when applied to videos with rare scenes or objects.
We tackle this problem from two different angles: algorithm and dataset.
We show that the debiased representation can generalize better when transferred to other datasets and tasks.
arXiv Detail & Related papers (2022-09-20T00:30:35Z)
- Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
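A minimal sketch of one possible co-attention formulation for combining low- and high-level feature maps; the paper's exact module is not given here.

```python
import torch
import torch.nn as nn

class CoAttention(nn.Module):
    """Fuse low- and high-level feature maps via a shared affinity matrix."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim, bias=False)

    def forward(self, low, high):            # both: (B, C, H, W)
        b, c, h, w = low.shape
        l = low.flatten(2).transpose(1, 2)   # (B, HW, C)
        g = high.flatten(2).transpose(1, 2)  # (B, HW, C)
        affinity = self.proj(l) @ g.transpose(1, 2)        # (B, HW, HW)
        l2g = torch.softmax(affinity, dim=-1) @ g          # low attends to high
        g2l = torch.softmax(affinity.transpose(1, 2), dim=-1) @ l
        fused = l2g + g2l                                  # (B, HW, C)
        return fused.transpose(1, 2).view(b, c, h, w)
```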
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
- Recognizing Actions in Videos from Unseen Viewpoints [80.6338404141284]
We show that current convolutional neural network models are unable to recognize actions from camera viewpoints not present in training data.
We introduce a new dataset for unseen view recognition and show the approach's ability to learn viewpoint-invariant representations.
arXiv Detail & Related papers (2021-03-30T17:17:54Z)
- The Mind's Eye: Visualizing Class-Agnostic Features of CNNs [92.39082696657874]
We propose an approach to visually interpret CNN features given a set of images by creating corresponding images that depict the most informative features of a specific layer.
Our method uses a dual-objective activation and distance loss, without requiring a generator network nor modifications to the original model.
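A minimal sketch of the general recipe (activation maximization plus an image-space distance term); the paper's exact losses, weighting, and optimizer are assumptions here.

```python
import torch

def visualize(layer_act, refs, steps=200, lam=0.1, lr=0.05):
    """Optimize an input image so a chosen layer activates strongly
    while the image stays close to a set of reference images.

    layer_act(x) -> activations of the target layer for input x.
    refs: (N, C, H, W) reference images.
    """
    x = refs.mean(dim=0, keepdim=True).clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        act_loss = -layer_act(x).mean()        # maximize activation
        dist_loss = ((x - refs) ** 2).mean()   # stay near the image set
        (act_loss + lam * dist_loss).backward()
        opt.step()
    return x.detach()
```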
arXiv Detail & Related papers (2021-01-29T07:46:39Z)
- Video-based Facial Expression Recognition using Graph Convolutional Networks [57.980827038988735]
We introduce a Graph Convolutional Network (GCN) layer into a common CNN-RNN based model for video-based facial expression recognition.
We evaluate our method on three widely-used datasets, CK+, Oulu-CASIA and MMI, as well as the challenging in-the-wild dataset AFEW 8.0.
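The standard graph-convolution update such a GCN layer applies is sketched below in Kipf-Welling form; how the paper builds the graph over facial features is not shown here.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """H' = ReLU(A_hat @ H @ W) with a symmetrically normalized adjacency."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, h, adj):                 # h: (N, in_dim), adj: (N, N)
        a = adj + torch.eye(adj.size(0))       # add self-loops
        d = a.sum(dim=1).pow(-0.5)             # D^{-1/2}
        a_hat = d.unsqueeze(1) * a * d.unsqueeze(0)
        return torch.relu(a_hat @ self.weight(h))
```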
arXiv Detail & Related papers (2020-10-26T07:31:51Z)
- When CNNs Meet Random RNNs: Towards Multi-Level Analysis for RGB-D Object and Scene Recognition [10.796613905980609]
We propose a novel framework that extracts discriminative feature representations from multi-modal RGB-D images for object and scene recognition tasks.
To cope with the high dimensionality of CNN activations, a random weighted pooling scheme has been proposed.
Experiments verify that the fully randomized structure in the RNN stage successfully encodes CNN activations into discriminative features.
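A minimal sketch of pooling high-dimensional CNN activations with fixed random weights; the projection below stands in for the paper's exact scheme, and all dimensions are illustrative.

```python
import torch

def random_weighted_pool(acts: torch.Tensor, out_dim: int, seed: int = 0):
    """Reduce (B, C, H, W) CNN activations to (B, out_dim) features
    using spatial pooling followed by a fixed random projection."""
    g = torch.Generator().manual_seed(seed)          # weights are fixed, not learned
    w = torch.randn(acts.size(1), out_dim, generator=g)
    pooled = acts.mean(dim=(2, 3))                   # global average over space
    return pooled @ w                                # random channel weighting
```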
arXiv Detail & Related papers (2020-04-26T10:58:27Z)