Video Content Classification using Deep Learning
- URL: http://arxiv.org/abs/2111.13813v1
- Date: Sat, 27 Nov 2021 04:36:17 GMT
- Title: Video Content Classification using Deep Learning
- Authors: Pradyumn Patil, Vishwajeet Pawar, Yashraj Pawar and Shruti Pisal
- Abstract summary: This paper presents a model that combines a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN).
The model can identify the type of video content and classify it into categories such as "Animation, Gaming, natural content, flat content, etc."
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video content classification is an important research topic in computer
vision with wide application in fields such as image and video retrieval. This
paper presents a model that combines a Convolutional Neural Network (CNN) and a
Recurrent Neural Network (RNN), developing, training, and optimizing a deep
learning network that can identify the type of video content and classify it
into categories such as "Animation, Gaming, natural content, flat content,
etc.". To enhance the model's performance, a novel keyframe extraction method
is included so that only the keyframes are classified, reducing overall
processing time without sacrificing any significant performance.
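The abstract does not specify how keyframes are selected. As an illustration only, a simple frame-difference heuristic (the function name, flat-list frame encoding, and threshold are all assumptions here, not the paper's method) could look like:

```python
def extract_keyframes(frames, threshold=20.0):
    """Keep a frame as a keyframe when its mean absolute pixel difference
    from the previously kept frame exceeds a threshold.
    Illustrative heuristic only, not the paper's extraction method.
    Each frame is a flat list of pixel intensities."""
    if not frames:
        return []
    keyframes = [0]  # always keep the first frame
    last = frames[0]
    for i, frame in enumerate(frames[1:], start=1):
        diff = sum(abs(a - b) for a, b in zip(frame, last)) / len(frame)
        if diff > threshold:
            keyframes.append(i)
            last = frame
    return keyframes
```

A static video would thus yield only its first frame, which is how such a scheme cuts processing time: the downstream classifier only sees the selected indices.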
Related papers
- Study of the effect of Sharpness on Blind Video Quality Assessment [0.0]
This study explores the effect of sharpness on blind video quality assessment (BVQA) models.
Sharpness is the measure of the clarity and details of the video image.
This study uses the existing video quality databases such as CVD2014.
arXiv Detail & Related papers (2024-04-06T16:10:48Z)
- Deep Neural Networks in Video Human Action Recognition: A Review [21.00217656391331]
Video behavior recognition is one of the most foundational tasks of computer vision.
Deep neural networks are built for recognizing pixel-level information such as images with RGB, RGB-D, or optical flow formats.
As the article reviews, deep neural networks have surpassed most earlier techniques in feature learning and extraction tasks.
arXiv Detail & Related papers (2023-05-25T03:54:41Z)
- Towards Scalable Neural Representation for Diverse Videos [68.73612099741956]
Implicit neural representations (INR) have gained increasing attention in representing 3D scenes and images.
Existing INR-based methods are limited to encoding a handful of short videos with redundant visual content.
This paper focuses on developing neural representations for encoding long and/or a large number of videos with diverse visual content.
arXiv Detail & Related papers (2023-03-24T16:32:19Z)
- InternVideo: General Video Foundation Models via Generative and Discriminative Learning [52.69422763715118]
We present general video foundation models, InternVideo, for dynamic and complex video-level understanding tasks.
InternVideo efficiently explores masked video modeling and video-language contrastive learning as the pretraining objectives.
InternVideo achieves state-of-the-art performance on 39 video datasets from extensive tasks including video action recognition/detection, video-language alignment, and open-world video applications.
arXiv Detail & Related papers (2022-12-06T18:09:49Z)
- Deep Unsupervised Key Frame Extraction for Efficient Video Classification [63.25852915237032]
This work presents an unsupervised method to retrieve key frames, combining a Convolutional Neural Network (CNN) with Temporal Segment Density Peaks Clustering (TSDPC).
The proposed TSDPC is a generic and powerful framework; among its advantages over previous works is that it can determine the number of key frames automatically.
Furthermore, a Long Short-Term Memory network (LSTM) is added on top of the CNN to further improve classification performance.
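For intuition, the density-peaks idea underlying TSDPC can be sketched on 1-D frame descriptors. The toy version below (not the paper's actual TSDPC; the cutoff parameter `dc` and the use of scalar descriptors are assumptions) picks cluster centers, which would stand in for key frames:

```python
def density_peaks(points, dc=1.0):
    """Toy density-peaks clustering on 1-D frame descriptors: cluster
    centers are points with high local density that lie far from any
    denser point. A simplified stand-in, not the paper's actual TSDPC."""
    n = len(points)
    # local density: neighbours within the cutoff distance dc (self excluded)
    rho = [sum(1 for q in points if abs(q - p) < dc) - 1 for p in points]
    centers = []
    for i in range(n):
        # distances to denser points (density ties broken by index)
        higher = [abs(points[j] - points[i]) for j in range(n)
                  if rho[j] > rho[i] or (rho[j] == rho[i] and j < i)]
        delta = min(higher) if higher else float("inf")
        # a center is the global density peak, or a dense point far from
        # every denser point
        if not higher or (delta > dc and rho[i] > 0):
            centers.append(i)
    return centers
```

The number of returned centers is not fixed in advance, which mirrors the cited advantage of determining the key-frame count automatically.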
arXiv Detail & Related papers (2022-11-12T20:45:35Z)
- Frozen CLIP Models are Efficient Video Learners [86.73871814176795]
Video recognition has been dominated by the end-to-end learning paradigm.
Recent advances in Contrastive Vision-Language Pre-training pave the way for a new route for visual recognition tasks.
We present Efficient Video Learning -- an efficient framework for directly training high-quality video recognition models.
arXiv Detail & Related papers (2022-08-06T17:38:25Z)
- The Mind's Eye: Visualizing Class-Agnostic Features of CNNs [92.39082696657874]
We propose an approach to visually interpret CNN features given a set of images by creating corresponding images that depict the most informative features of a specific layer.
Our method uses a dual-objective activation and distance loss, without requiring a generator network nor modifications to the original model.
arXiv Detail & Related papers (2021-01-29T07:46:39Z)
- Video-based Facial Expression Recognition using Graph Convolutional Networks [57.980827038988735]
We introduce a Graph Convolutional Network (GCN) layer into a common CNN-RNN based model for video-based facial expression recognition.
We evaluate our method on three widely-used datasets, CK+, Oulu-CASIA and MMI, and also one challenging wild dataset AFEW8.0.
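The graph-convolution step such a GCN layer contributes is commonly written as H' = ReLU(A·H·W). The sketch below (plain nested lists as matrices, no normalization of the adjacency matrix, all names assumed) is illustrative, not the paper's exact layer:

```python
def gcn_layer(adj, feats, weight):
    """One graph-convolution step, H' = ReLU(A . H . W): each node's new
    feature is a sum of its neighbours' features (per the adjacency
    matrix), linearly transformed. Illustrative sketch; a real GCN layer
    would normalize adj and use learned weights."""
    def matmul(a, b):
        return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
                for row in a]
    h = matmul(matmul(adj, feats), weight)
    return [[max(0.0, v) for v in row] for row in h]
```

In the facial-expression setting, nodes would be per-frame features and the adjacency matrix would encode which frames share information.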
arXiv Detail & Related papers (2020-10-26T07:31:51Z)
- Video Contents Understanding using Deep Neural Networks [0.0]
We propose a novel application of Transfer Learning to classify video-frame sequences over multiple classes.
This representation is achieved with the advent of the "deep neural network" (DNN).
arXiv Detail & Related papers (2020-04-29T05:18:40Z)
- Feature Re-Learning with Data Augmentation for Video Relevance Prediction [35.87597969685573]
Re-learning is realized by projecting a given deep feature into a new space by an affine transformation.
We propose a new data augmentation strategy which works directly on frame-level and video-level features.
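The affine projection at the heart of this re-learning step is simply y = W·x + b. A minimal sketch (in the paper W and b are learned; here they are supplied by the caller, and all names are illustrative):

```python
def relearn_feature(feat, weight, bias):
    """Project a deep feature vector into a new space via an affine map
    y = W.x + b -- the core operation of feature re-learning. W (weight)
    and b (bias) would be trained parameters in practice."""
    return [sum(w, )] if False else [
        sum(w * x for w, x in zip(row, feat)) + b
        for row, b in zip(weight, bias)
    ]
```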
arXiv Detail & Related papers (2020-04-08T05:22:41Z)
- Learning spatio-temporal representations with temporal squeeze pooling [11.746833714322154]
We propose a new video representation learning method, named Temporal Squeeze (TS) pooling, which can extract the essential movement information from a long sequence of video frames and map it into a set of few images, named Squeezed Images.
The resulting Squeezed Images contain the essential movement information from the video frames, corresponding to the optimization of the video classification task.
We evaluate our architecture on two video classification benchmarks, and the results achieved are compared to the state-of-the-art.
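As rough intuition, mapping a long frame sequence to a few images can be approximated by segment-wise averaging. The sketch below is a plain averaging stand-in, not the learned Temporal Squeeze pooling the paper proposes (frames as flat pixel lists, and the segment split are assumptions):

```python
def temporal_squeeze(frames, k):
    """Map a frame sequence to k 'squeezed images' by averaging the frames
    in each of k equal temporal segments. A non-learned stand-in for
    Temporal Squeeze pooling; frames are flat lists of pixel values."""
    n = len(frames)
    squeezed = []
    for s in range(k):
        seg = frames[s * n // k:(s + 1) * n // k]
        # per-pixel mean over the frames of this segment
        squeezed.append([sum(px) / len(seg) for px in zip(*seg)])
    return squeezed
```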
arXiv Detail & Related papers (2020-02-11T21:13:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.