Nepali Video Captioning using CNN-RNN Architecture
- URL: http://arxiv.org/abs/2311.02699v1
- Date: Sun, 5 Nov 2023 16:09:40 GMT
- Title: Nepali Video Captioning using CNN-RNN Architecture
- Authors: Bipesh Subedi, Saugat Singh, Bal Krishna Bal
- Abstract summary: This article presents a study on Nepali video captioning using deep neural networks.
Through the integration of pre-trained CNNs and RNNs, the research focuses on generating precise and contextually relevant captions for Nepali videos.
The approach involves dataset collection, data preprocessing, model implementation, and evaluation.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This article presents a study on Nepali video captioning using deep neural
networks. Through the integration of pre-trained CNNs and RNNs, the research
focuses on generating precise and contextually relevant captions for Nepali
videos. The approach involves dataset collection, data preprocessing, model
implementation, and evaluation. By enriching the MSVD dataset with Nepali
captions via Google Translate, the study trains various CNN-RNN architectures.
The research explores the effectiveness of CNNs (e.g., EfficientNetB0,
ResNet101, VGG16) paired with different RNN decoders like LSTM, GRU, and
BiLSTM. Evaluation involves BLEU and METEOR metrics, with the best model being
EfficientNetB0 + BiLSTM with 1024 hidden dimensions, achieving a BLEU-4 score
of 17 and METEOR score of 46. The article also outlines challenges and future
directions for advancing Nepali video captioning, offering a crucial resource
for further research in this area.
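The paper does not include code, but the best-reported configuration (EfficientNetB0 features feeding a BiLSTM decoder with 1024 hidden units) can be sketched as a merge-style Keras model. Everything below is an illustrative assumption, including vocab_size, max_len, the embedding width, and the feature-fusion strategy; it is not the authors' implementation.

```python
# Illustrative sketch of a merge-style CNN-RNN video captioner
# (EfficientNetB0 encoder + BiLSTM decoder), NOT the authors' exact code.
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import EfficientNetB0

vocab_size = 10000   # assumed Nepali vocabulary size
max_len = 30         # assumed maximum caption length (tokens)
hidden_dim = 1024    # hidden size of the best-reported BiLSTM

# Frozen pre-trained encoder; per-frame features would typically be
# mean-pooled over sampled frames to give one vector per video (assumed).
cnn = EfficientNetB0(include_top=False, pooling='avg', weights='imagenet')
cnn.trainable = False

feat_in = layers.Input(shape=(cnn.output_shape[-1],), name='video_features')
tok_in = layers.Input(shape=(max_len,), name='caption_tokens')

# Text branch: embed previous tokens and encode them with a BiLSTM.
x = layers.Embedding(vocab_size, 256, mask_zero=True)(tok_in)
x = layers.Bidirectional(layers.LSTM(hidden_dim))(x)

# Visual branch: project video features to the same width, then merge.
f = layers.Dense(2 * hidden_dim, activation='relu')(feat_in)
merged = layers.add([x, f])

# Predict the next Nepali token.
out = layers.Dense(vocab_size, activation='softmax')(merged)

model = Model([feat_in, tok_in], out)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
```

Training would then follow the usual next-token setup on captions tokenized from the translated Nepali MSVD references, with BLEU-4 and METEOR computed on held-out videos.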
Related papers
- Automatic speech recognition for the Nepali language using CNN, bidirectional LSTM and ResNet [0.0]
This paper presents an end-to-end deep learning model for Automatic Speech Recognition (ASR) that transcribes Nepali speech to text.
The model was trained and tested on the OpenSLR (audio, text) dataset.
A character error rate (CER) of 17.06 percent was achieved.
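For context on the 17.06 percent figure, CER is the character-level Levenshtein distance between hypothesis and reference divided by the reference length; a minimal sketch (not from the paper):

```python
# Minimal character error rate (CER): character-level Levenshtein distance
# divided by the reference length. Illustrative only.
def cer(reference: str, hypothesis: str) -> float:
    r, h = list(reference), list(hypothesis)
    # DP table: d[i][j] = edits to turn first i ref chars into first j hyp chars.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(r)][len(h)] / len(r)

print(cer("नमस्ते", "नमस्तो"))  # one substituted character -> 1/6 ≈ 0.167
```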
arXiv Detail & Related papers (2024-06-25T12:14:01Z)
- CNN2GNN: How to Bridge CNN with GNN [59.42117676779735]
We propose a novel CNN2GNN framework to unify CNNs and GNNs via distillation.
The performance of the distilled "boosted" two-layer GNN on Mini-ImageNet is much higher than that of CNNs with dozens of layers, such as ResNet152.
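As background, the standard temperature-scaled distillation loss that such frameworks build on can be sketched as follows; CNN2GNN's own bridge and objective are more involved than this generic form.

```python
# Generic knowledge-distillation loss (Hinton et al. style), shown only to
# illustrate the distillation idea; CNN2GNN's actual loss may differ.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=4.0):
    # KL(teacher || student) between temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable.
    p = softmax(teacher_logits / T)
    q = softmax(student_logits / T)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)

print(kd_loss(np.array([[2.0, 0.5, 0.1]]), np.array([[3.0, 0.2, 0.1]])))
```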
arXiv Detail & Related papers (2024-04-23T08:19:08Z)
- Convolutional Neural Networks for Sentiment Analysis on Weibo Data: A Natural Language Processing Approach [0.228438857884398]
This study addresses the complex task of sentiment analysis on a dataset of 119,988 original tweets from Weibo using a Convolutional Neural Network (CNN).
A CNN-based model was utilized, leveraging word embeddings for feature extraction, and trained to perform sentiment classification.
The model achieved a macro-average F1-score of approximately 0.73 on the test set, showing balanced performance across positive, neutral, and negative sentiments.
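For reference on the reported metric, macro-average F1 is the unweighted mean of per-class F1 scores, so minority classes count as much as the majority class; e.g. with scikit-learn on made-up labels:

```python
# Macro-averaged F1 on illustrative data (0=negative, 1=neutral, 2=positive).
from sklearn.metrics import f1_score

y_true = [0, 1, 2, 1, 0, 2, 1, 0]
y_pred = [0, 1, 2, 0, 0, 2, 1, 1]

print(f1_score(y_true, y_pred, average='macro'))  # ≈ 0.78 here
```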
arXiv Detail & Related papers (2023-07-13T03:02:56Z)
- Testing the Channels of Convolutional Neural Networks [8.927538538637783]
We propose techniques for testing the channels of convolutional neural networks (CNNs).
We design FtGAN, an extension to GAN, that can generate test data while varying the intensity of a channel of a target CNN.
We also propose a channel selection algorithm to find representative channels for testing.
arXiv Detail & Related papers (2023-03-06T09:58:39Z)
- N-Omniglot: a Large-scale Neuromorphic Dataset for Spatio-Temporal Sparse Few-shot Learning [10.812738608234321]
We provide the first neuromorphic dataset, N-Omniglot, recorded using the Dynamic Vision Sensor (DVS).
It contains 1623 categories of handwritten characters, with only 20 samples per class.
The dataset provides a powerful challenge and a suitable benchmark for developing spiking neural network (SNN) algorithms in the few-shot learning domain.
arXiv Detail & Related papers (2021-12-25T12:41:34Z)
- GNN-LM: Language Modeling based on Global Contexts via GNN [32.52117529283929]
We introduce GNN-LM, which extends the vanilla neural language model (LM) by allowing it to reference similar contexts in the entire training corpus.
GNN-LM achieves a new state-of-the-art perplexity of 14.8 on WikiText-103.
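As a reminder of the metric, perplexity is the exponentiated average negative log-likelihood per token; a tiny illustration with made-up probabilities:

```python
# Perplexity = exp(mean negative log-likelihood per token); lower is better.
import math

# Made-up probabilities the model assigns to each ground-truth token.
p_tokens = [0.21, 0.05, 0.40, 0.12]
nll = -sum(math.log(p) for p in p_tokens) / len(p_tokens)
print(math.exp(nll))  # ≈ 6.7 for these values
```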
arXiv Detail & Related papers (2021-10-17T07:18:21Z)
- Training Graph Neural Networks with 1000 Layers [133.84813995275988]
We study reversible connections, group convolutions, weight tying, and equilibrium models to advance the memory and parameter efficiency of GNNs.
To the best of our knowledge, RevGNN-Deep is the deepest GNN in the literature by one order of magnitude.
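As background, the memory trick behind reversible connections is that a block's inputs can be recomputed exactly from its outputs, so intermediate activations need not be cached for backprop; a generic RevNet-style sketch (not the RevGNN code):

```python
# RevNet-style reversible block: the inputs are recoverable from the
# outputs, which is what enables very deep, memory-efficient networks.
import numpy as np

def reversible_forward(x1, x2, f, g):
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def reversible_inverse(y1, y2, f, g):
    # Exact reconstruction of the inputs from the outputs.
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2

f = lambda x: np.tanh(x)   # stand-ins for learned sub-networks
g = lambda x: 0.5 * x
x1, x2 = np.ones(4), np.arange(4.0)
y1, y2 = reversible_forward(x1, x2, f, g)
assert np.allclose((x1, x2), reversible_inverse(y1, y2, f, g))
```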
arXiv Detail & Related papers (2021-06-14T15:03:00Z)
- The Mind's Eye: Visualizing Class-Agnostic Features of CNNs [92.39082696657874]
We propose an approach to visually interpret CNN features given a set of images by creating corresponding images that depict the most informative features of a specific layer.
Our method uses a dual-objective activation and distance loss, without requiring a generator network or modifications to the original model.
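The dual-objective loss is specific to the paper, but the activation-maximization idea it extends can be sketched as gradient ascent on the input image; this generic sketch omits the paper's distance loss and is not the authors' method:

```python
# Generic activation maximization: gradient ascent on an input image to
# maximize a chosen layer's mean activation.
import tensorflow as tf

model = tf.keras.applications.VGG16(weights='imagenet', include_top=False)
layer_model = tf.keras.Model(model.input, model.get_layer('block4_conv3').output)

img = tf.Variable(tf.random.uniform((1, 224, 224, 3)))
opt = tf.keras.optimizers.Adam(learning_rate=0.05)

for _ in range(100):
    with tf.GradientTape() as tape:
        loss = -tf.reduce_mean(layer_model(img))  # ascend on the activation
    grads = tape.gradient(loss, [img])
    opt.apply_gradients(zip(grads, [img]))
```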
arXiv Detail & Related papers (2021-01-29T07:46:39Z)
- Deep Time Delay Neural Network for Speech Enhancement with Full Data Learning [60.20150317299749]
This paper proposes a deep time delay neural network (TDNN) for speech enhancement, together with a full data learning method that makes full use of the training data.
arXiv Detail & Related papers (2020-11-11T06:32:37Z)
- Exploring Deep Hybrid Tensor-to-Vector Network Architectures for Regression Based Speech Enhancement [53.47564132861866]
We find that a hybrid architecture, namely CNN-TT, is capable of maintaining good quality performance with a reduced model parameter size.
CNN-TT is composed of several convolutional layers at the bottom for feature extraction to improve speech quality.
arXiv Detail & Related papers (2020-07-25T22:21:05Z)
- Approximation and Non-parametric Estimation of ResNet-type Convolutional Neural Networks [52.972605601174955]
We show that a ResNet-type CNN can attain minimax optimal error rates in important function classes.
We derive approximation and estimation error rates of the aforementioned type of CNNs for the Barron and Hölder classes.
arXiv Detail & Related papers (2019-03-24T19:42:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.