Intelligent 3D Network Protocol for Multimedia Data Classification using
Deep Learning
- URL: http://arxiv.org/abs/2207.11504v1
- Date: Sat, 23 Jul 2022 12:24:52 GMT
- Title: Intelligent 3D Network Protocol for Multimedia Data Classification using
Deep Learning
- Authors: Arslan Syed, Eman A. Aldhahri, Muhammad Munawar Iqbal, Abid Ali, Ammar
Muthanna, Harun Jamil, and Faisal Jamil
- Abstract summary: We implement Hybrid Deep Learning Architecture that combines STIP and 3D CNN features to enhance the performance of 3D videos effectively.
The results are compared with state-of-the-art frameworks from literature for action recognition on UCF101 with an accuracy of 95%.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In videos, human actions are three-dimensional (3D) signals that
carry spatiotemporal information about human behavior. 3D convolutional
neural networks (CNNs) are a promising way to exploit this information.
However, 3D CNNs have not yet matched the performance that their
well-established two-dimensional (2D) counterparts achieve on still images.
The broad memory footprint of 3D convolutions and the difficulty of training
spatiotemporal fusion prevent 3D CNNs from achieving remarkable results. In
this paper, we implement a Hybrid
Deep Learning Architecture that combines STIP and 3D CNN features to enhance
the performance of 3D videos effectively. After implementation, training uses
a more detailed and deeper mapping in each cycle of space-time fusion.
The trained model further improves the results after handling complicated
model evaluations. A video classification model is used in this
implementation. The Intelligent 3D Network Protocol for Multimedia Data
Classification using Deep Learning is introduced to better understand
space-time associations in human actions. The well-known UCF101 dataset is
used to evaluate the performance of the proposed hybrid technique. The
proposed hybrid technique substantially outperforms the initial 3D CNNs. The
results are compared with state-of-the-art frameworks from the literature for
action recognition on UCF101, with an accuracy of 95%.
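The abstract does not say how the STIP and 3D CNN features are combined. As a rough illustration only, the sketch below assumes simple feature-level fusion: each descriptor is L2-normalized, the two are concatenated, and a linear classifier scores the result. The function names, the descriptor sizes (162 and 512), and the random, untrained weights are hypothetical placeholders, not the authors' method; only the 101 classes come from UCF101.

```python
import math
import random


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm so neither feature type dominates."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


def fuse_features(stip_vec, cnn_vec):
    """Feature-level fusion (an assumption): normalize, then concatenate."""
    return l2_normalize(stip_vec) + l2_normalize(cnn_vec)


def predict(fused, weights, bias):
    """Linear classifier: return the index of the highest-scoring class."""
    scores = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


random.seed(0)
# Hypothetical per-video descriptors; real ones would come from a STIP
# extractor and a 3D CNN backbone, neither of which is modeled here.
stip_vec = [random.gauss(0, 1) for _ in range(162)]
cnn_vec = [random.gauss(0, 1) for _ in range(512)]

fused = fuse_features(stip_vec, cnn_vec)

num_classes = 101  # UCF101 has 101 action classes
# Untrained random weights, used only to show the shapes involved.
weights = [[random.gauss(0, 1) for _ in range(len(fused))]
           for _ in range(num_classes)]
bias = [0.0] * num_classes

label = predict(fused, weights, bias)
print(len(fused), label)
```

Normalizing before concatenation is one common design choice when the two descriptors have very different scales; other fusion schemes (score averaging, learned gating) would fit the abstract's description equally well.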
Related papers
- Hybrid CNN Bi-LSTM neural network for Hyperspectral image classification [1.2691047660244332]
This paper proposes a neural network combining 3-D CNN, 2-D CNN and Bi-LSTM.
It achieves 99.83, 99.98 and 100 percent accuracy using only 30 percent of the trainable parameters of the state-of-the-art model on the IP, PU and SA datasets, respectively.
arXiv Detail & Related papers (2024-02-15T15:46:13Z)
- PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm [114.47216525866435]
We introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation.
For the first time, PonderV2 achieves state-of-the-art performance on 11 indoor and outdoor benchmarks, implying its effectiveness.
arXiv Detail & Related papers (2023-10-12T17:59:57Z)
- Maximizing Spatio-Temporal Entropy of Deep 3D CNNs for Efficient Video Recognition [25.364148451584356]
3D convolution neural networks (CNNs) have been the prevailing option for video recognition.
We propose to automatically design efficient 3D CNN architectures via a novel training-free neural architecture search approach.
Experiments on Something-Something V1&V2 and Kinetics400 demonstrate that the E3D family achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-03-05T15:11:53Z)
- ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding [110.07170245531464]
Current 3D models are limited by datasets with a small number of annotated data and a pre-defined set of categories.
Recent advances have shown that similar problems can be significantly alleviated by employing knowledge from other modalities, such as language.
We learn a unified representation of images, texts, and 3D point clouds by pre-training with object triplets from the three modalities.
arXiv Detail & Related papers (2022-12-10T01:34:47Z)
- Hyperspectral Image Classification: Artifacts of Dimension Reduction on Hybrid CNN [1.2875323263074796]
2D and 3D CNN models have proved highly efficient in exploiting the spatial and spectral information of Hyperspectral Images.
This work proposed a lightweight CNN (3D followed by 2D-CNN) model which significantly reduces the computational cost.
arXiv Detail & Related papers (2021-01-25T18:43:57Z)
- 3D CNNs with Adaptive Temporal Feature Resolutions [83.43776851586351]
Similarity Guided Sampling (SGS) module can be plugged into any existing 3D CNN architecture.
SGS empowers 3D CNNs by learning the similarity of temporal features and grouping similar features together.
Our evaluations show that the proposed module improves the state-of-the-art by reducing the computational cost (GFLOPs) by half while preserving or even improving the accuracy.
arXiv Detail & Related papers (2020-11-17T14:34:05Z)
- A Real-time Action Representation with Temporal Encoding and Deep Compression [115.3739774920845]
We propose a new real-time convolutional architecture, called Temporal Convolutional 3D Network (T-C3D), for action representation.
T-C3D learns video action representations in a hierarchical multi-granularity manner while obtaining a high process speed.
Our method achieves clear improvements on the UCF101 action recognition benchmark over state-of-the-art real-time methods: 5.4% higher accuracy and 2 times faster inference, with a model that needs less than 5MB of storage.
arXiv Detail & Related papers (2020-06-17T06:30:43Z)
- V4D: 4D Convolutional Neural Networks for Video-level Representation Learning [58.548331848942865]
Most 3D CNNs for video representation learning are clip-based, and thus do not consider video-temporal evolution of features.
We propose Video-level 4D Convolutional Neural Networks, or V4D, to model long-range representation with 4D convolutions.
V4D achieves excellent results, surpassing recent 3D CNNs by a large margin.
arXiv Detail & Related papers (2020-02-18T09:27:41Z)
- 2.75D: Boosting learning by representing 3D Medical imaging to 2D features for small data [54.223614679807994]
3D convolutional neural networks (CNNs) have started to show superior performance to 2D CNNs in numerous deep learning tasks.
Applying transfer learning on 3D CNN is challenging due to a lack of publicly available pre-trained 3D models.
In this work, we proposed a novel 2D strategical representation of volumetric data, namely 2.75D.
As a result, 2D CNN networks can also be used to learn volumetric information.
arXiv Detail & Related papers (2020-02-11T08:24:19Z)
- An Information-rich Sampling Technique over Spatio-Temporal CNN for Classification of Human Actions in Videos [5.414308305392762]
We propose a novel scheme for human action recognition in videos, using a 3-dimensional Convolutional Neural Network (3D CNN) based classifier.
In this paper, a 3D CNN architecture is proposed to extract features, followed by a Long Short-Term Memory (LSTM) network to recognize human actions.
Experiments are performed with KTH and WEIZMANN human actions datasets, whereby it is shown to produce comparable results with the state-of-the-art techniques.
arXiv Detail & Related papers (2020-02-06T05:07:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.