Cloud based Scalable Object Recognition from Video Streams using
Orientation Fusion and Convolutional Neural Networks
- URL: http://arxiv.org/abs/2106.15329v1
- Date: Sat, 19 Jun 2021 07:15:15 GMT
- Title: Cloud based Scalable Object Recognition from Video Streams using
Orientation Fusion and Convolutional Neural Networks
- Authors: Muhammad Usman Yaseen, Ashiq Anjum, Giancarlo Fortino, Antonio Liotta,
Amir Hussain
- Abstract summary: Convolutional neural networks (CNNs) have been widely used to perform intelligent visual object recognition.
CNNs still suffer from severe accuracy degradation, particularly on illumination-variant datasets.
We propose a new CNN method based on orientation fusion for visual object recognition.
- Score: 11.44782606621054
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Object recognition from live video streams comes with numerous challenges
such as the variation in illumination conditions and poses. Convolutional
neural networks (CNNs) have been widely used to perform intelligent visual
object recognition. Yet, CNNs still suffer from severe accuracy degradation,
particularly on illumination-variant datasets. To address this problem, we
propose a new CNN method based on orientation fusion for visual object
recognition. The proposed cloud-based video analytics system pioneers the use
of bi-dimensional empirical mode decomposition to split a video frame into
intrinsic mode functions (IMFs). These IMFs are then passed through the Riesz
transform to produce monogenic object components, which are in turn used to
train the CNNs. Past works have demonstrated that the object orientation
component can be used to achieve accuracy levels as high as 93%. Herein we
demonstrate how a feature-fusion strategy over the orientation components
further improves visual recognition accuracy to 97%. We also assess the
scalability of our method, looking at both the number and the size of the video
streams under scrutiny. We carry out extensive experimentation on the publicly
available Yale dataset, as well as a self-generated video dataset, finding
significant improvements (in both accuracy and scale) in comparison to
AlexNet, LeNet and SE-ResNeXt, which are the three most commonly used deep
learning models for visual object recognition and classification.
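A rough, non-authoritative sketch of the core step described above: the Riesz transform of each IMF is computed in the frequency domain and the resulting orientation components are stacked as CNN input channels. The `bemd()` routine is a hypothetical placeholder for the paper's bi-dimensional empirical mode decomposition, and fusing by channel-stacking is an assumption, not necessarily the authors' exact strategy.

```python
import numpy as np

def riesz_transform(frame: np.ndarray):
    """Frequency-domain Riesz transform of a 2-D grayscale frame.

    Returns the two Riesz components (r1, r2) of the monogenic signal.
    """
    h, w = frame.shape
    u = np.fft.fftfreq(h).reshape(-1, 1)   # vertical frequencies
    v = np.fft.fftfreq(w).reshape(1, -1)   # horizontal frequencies
    mag = np.sqrt(u ** 2 + v ** 2)
    mag[0, 0] = 1.0                        # avoid division by zero at DC
    F = np.fft.fft2(frame)
    r1 = np.real(np.fft.ifft2(F * (-1j * u / mag)))
    r2 = np.real(np.fft.ifft2(F * (-1j * v / mag)))
    return r1, r2

def orientation_fusion(imfs):
    """Stack the orientation component of each IMF into a multi-channel
    array (channels-first) that can be fed to a CNN."""
    channels = []
    for imf in imfs:
        r1, r2 = riesz_transform(imf)
        orientation = np.arctan2(r2, r1)   # local orientation of the monogenic signal
        channels.append(orientation)
    return np.stack(channels, axis=0)

# imfs = bemd(frame)  # hypothetical BEMD routine returning intrinsic mode functions
# cnn_input = orientation_fusion(imfs)
```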
Related papers
- Deformable Convolution Based Road Scene Semantic Segmentation of Fisheye Images in Autonomous Driving [4.720434481945155]
This study investigates the effectiveness of modern Deformable Convolutional Neural Networks (DCNNs) for semantic segmentation tasks.
Our experiments focus on segmenting the WoodScape fisheye image dataset into ten distinct classes, assessing the Deformable Networks' ability to capture intricate spatial relationships.
The significant improvement in mIoU score resulting from integrating Deformable CNNs demonstrates their effectiveness in handling the geometric distortions present in fisheye imagery.
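A minimal sketch of the deformable-convolution building block this entry refers to, using `torchvision.ops.DeformConv2d` with sampling offsets predicted by a regular convolution; the layer sizes are illustrative, not those of the paper.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableBlock(nn.Module):
    """A 3x3 deformable convolution whose sampling offsets are
    predicted from the input by a regular convolution."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # 2 offsets (dy, dx) per position in the 3x3 kernel
        self.offset = nn.Conv2d(in_ch, 2 * 3 * 3, kernel_size=3, padding=1)
        self.deform = DeformConv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        return self.deform(x, self.offset(x))

# x = torch.randn(1, 64, 128, 128)
# y = DeformableBlock(64, 128)(x)   # -> (1, 128, 128, 128)
```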
arXiv Detail & Related papers (2024-07-23T17:02:24Z)
- SIGMA: Sinkhorn-Guided Masked Video Modeling [69.31715194419091]
Sinkhorn-guided Masked Video Modeling (SIGMA) is a novel video pretraining method.
We distribute features of space-time tubes evenly across a limited number of learnable clusters.
Experimental results on ten datasets validate the effectiveness of SIGMA in learning more performant, temporally-aware, and robust video representations.
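The summary does not reproduce SIGMA's details, but the Sinkhorn-Knopp normalization commonly used to balance assignments across clusters looks roughly like the sketch below, assuming a batch of similarity scores between features and learnable cluster centers.

```python
import torch

def sinkhorn(scores: torch.Tensor, n_iters: int = 3, eps: float = 0.05):
    """Sinkhorn-Knopp iterations turning a (batch x clusters) score
    matrix into a soft assignment that is balanced across clusters."""
    q = torch.exp(scores / eps)
    q /= q.sum()
    b, k = q.shape
    for _ in range(n_iters):
        q /= q.sum(dim=0, keepdim=True)  # normalize columns: equal cluster mass
        q /= k
        q /= q.sum(dim=1, keepdim=True)  # normalize rows: one unit per sample
        q /= b
    return q * b                         # rows sum to 1

# scores = features @ cluster_centers.T   # e.g. (256 tubes x 64 clusters)
# assignments = sinkhorn(scores)
```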
arXiv Detail & Related papers (2024-07-22T08:04:09Z)
- ConViViT -- A Deep Neural Network Combining Convolutions and Factorized Self-Attention for Human Activity Recognition [3.6321891270689055]
We propose a novel approach that leverages the strengths of both CNNs and Transformers in a hybrid architecture for performing activity recognition using RGB videos.
Our architecture has achieved new SOTA results with 90.05%, 99.6%, and 95.09% on HMDB51, UCF101, and ETRI-Activity3D respectively.
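A hedged sketch of the general hybrid idea (not the ConViViT architecture itself, whose factorized attention and hyperparameters are not given here): a CNN backbone yields per-frame features that a transformer encoder then mixes over time.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class HybridCNNTransformer(nn.Module):
    """CNN features per frame, followed by temporal self-attention."""
    def __init__(self, num_classes: int, dim: int = 512):
        super().__init__()
        backbone = resnet18(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop the FC head
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, video):                     # video: (B, T, 3, H, W)
        b, t = video.shape[:2]
        feats = self.cnn(video.flatten(0, 1))     # (B*T, 512, 1, 1)
        feats = feats.flatten(1).view(b, t, -1)   # (B, T, 512)
        feats = self.temporal(feats)              # attention across time
        return self.head(feats.mean(dim=1))       # average over frames
```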
arXiv Detail & Related papers (2023-10-22T21:13:43Z)
- Video Action Recognition Collaborative Learning with Dynamics via PSO-ConvNet Transformer [1.876462046907555]
We propose a novel PSO-ConvNet model for learning actions in videos.
Our experimental results on the UCF-101 dataset demonstrate substantial improvements of up to 9% in accuracy.
Overall, our dynamic PSO-ConvNet model provides a promising direction for improving Human Action Recognition.
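The paper's coupling of PSO with ConvNet training is not spelled out in this summary; below is only the canonical particle-swarm update such methods build on, with illustrative coefficients.

```python
import numpy as np

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=np.random):
    """One velocity/position update for a swarm of candidate solutions.

    x, v, pbest: (n_particles, dim); gbest: (dim,)
    """
    r1 = rng.random(x.shape)
    r2 = rng.random(x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v, v
```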
arXiv Detail & Related papers (2023-02-17T23:39:34Z)
- Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks [76.35271072704384]
Deep learning models perform poorly when applied to videos with rare scenes or objects.
We tackle this problem from two different angles: algorithm and dataset.
We show that the debiased representation can generalize better when transferred to other datasets and tasks.
arXiv Detail & Related papers (2022-09-20T00:30:35Z)
- Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
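A minimal sketch of one possible co-attention formulation for combining low- and high-level feature maps; the paper's exact module is not given here.

```python
import torch
import torch.nn as nn

class CoAttention(nn.Module):
    """Fuse low- and high-level feature maps via a shared affinity matrix."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim, bias=False)

    def forward(self, low, high):            # both: (B, C, H, W)
        b, c, h, w = low.shape
        l = low.flatten(2).transpose(1, 2)   # (B, HW, C)
        g = high.flatten(2).transpose(1, 2)  # (B, HW, C)
        affinity = self.proj(l) @ g.transpose(1, 2)        # (B, HW, HW)
        l2g = torch.softmax(affinity, dim=-1) @ g          # low attends to high
        g2l = torch.softmax(affinity.transpose(1, 2), dim=-1) @ l
        fused = l2g + g2l                                  # (B, HW, C)
        return fused.transpose(1, 2).view(b, c, h, w)
```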
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
- Recognizing Actions in Videos from Unseen Viewpoints [80.6338404141284]
We show that current convolutional neural network models are unable to recognize actions from camera viewpoints not present in training data.
We introduce a new dataset for unseen view recognition and show the approach's ability to learn viewpoint-invariant representations.
arXiv Detail & Related papers (2021-03-30T17:17:54Z)
- The Mind's Eye: Visualizing Class-Agnostic Features of CNNs [92.39082696657874]
We propose an approach to visually interpret CNN features given a set of images by creating corresponding images that depict the most informative features of a specific layer.
Our method uses a dual-objective activation and distance loss, without requiring a generator network nor modifications to the original model.
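A minimal sketch of the general recipe (activation maximization plus an image-space distance term); the paper's exact losses, weighting, and optimizer are assumptions here.

```python
import torch

def visualize(layer_act, refs, steps=200, lam=0.1, lr=0.05):
    """Optimize an input image so a chosen layer activates strongly
    while the image stays close to a set of reference images.

    layer_act(x) -> activations of the target layer for input x.
    refs: (N, C, H, W) reference images.
    """
    x = refs.mean(dim=0, keepdim=True).clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        act_loss = -layer_act(x).mean()        # maximize activation
        dist_loss = ((x - refs) ** 2).mean()   # stay near the image set
        (act_loss + lam * dist_loss).backward()
        opt.step()
    return x.detach()
```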
arXiv Detail & Related papers (2021-01-29T07:46:39Z)
- Video-based Facial Expression Recognition using Graph Convolutional Networks [57.980827038988735]
We introduce a Graph Convolutional Network (GCN) layer into a common CNN-RNN based model for video-based facial expression recognition.
We evaluate our method on three widely-used datasets, CK+, Oulu-CASIA and MMI, as well as the challenging in-the-wild dataset AFEW 8.0.
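The standard graph-convolution update such a GCN layer applies is sketched below in Kipf-Welling form; how the paper builds the graph over facial features is not shown here.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """H' = ReLU(A_hat @ H @ W) with a symmetrically normalized adjacency."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, h, adj):                 # h: (N, in_dim), adj: (N, N)
        a = adj + torch.eye(adj.size(0))       # add self-loops
        d = a.sum(dim=1).pow(-0.5)             # D^{-1/2}
        a_hat = d.unsqueeze(1) * a * d.unsqueeze(0)
        return torch.relu(a_hat @ self.weight(h))
```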
arXiv Detail & Related papers (2020-10-26T07:31:51Z)
- When CNNs Meet Random RNNs: Towards Multi-Level Analysis for RGB-D Object and Scene Recognition [10.796613905980609]
We propose a novel framework that extracts discriminative feature representations from multi-modal RGB-D images for object and scene recognition tasks.
To cope with the high dimensionality of CNN activations, a random weighted pooling scheme has been proposed.
Experiments verify that the fully randomized structure in the RNN stage successfully encodes CNN activations into discriminative features.
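A minimal sketch of pooling high-dimensional CNN activations with fixed random weights; the projection below stands in for the paper's exact scheme, and all dimensions are illustrative.

```python
import torch

def random_weighted_pool(acts: torch.Tensor, out_dim: int, seed: int = 0):
    """Reduce (B, C, H, W) CNN activations to (B, out_dim) features
    using spatial pooling followed by a fixed random projection."""
    g = torch.Generator().manual_seed(seed)          # weights are fixed, not learned
    w = torch.randn(acts.size(1), out_dim, generator=g)
    pooled = acts.mean(dim=(2, 3))                   # global average over space
    return pooled @ w                                # random channel weighting
```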
arXiv Detail & Related papers (2020-04-26T10:58:27Z)