Automatic Generation of Descriptive Titles for Video Clips Using Deep
Learning
- URL: http://arxiv.org/abs/2104.03337v1
- Date: Wed, 7 Apr 2021 18:14:18 GMT
- Title: Automatic Generation of Descriptive Titles for Video Clips Using Deep
Learning
- Authors: Soheyla Amirian, Khaled Rasheed, Thiab R. Taha, Hamid R. Arabnia
- Abstract summary: We propose an architecture that uses image/video captioning methods and Natural Language Processing systems to generate a title and a concise abstract for a video.
Such a system could be used in many application domains, including the cinema industry, video search engines, security surveillance, video databases/warehouses, data centers, and others.
- Score: 2.724141845301679
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Over the last decade, the use of Deep Learning in many applications has
produced results that are comparable to, and in some cases surpass, human expert
performance. The application domains include disease diagnosis, finance,
agriculture, search engines, robot vision, and many others. In this paper, we
propose an architecture that uses image/video captioning methods and
Natural Language Processing (NLP) systems to generate a title and a concise
abstract for a video. Such a system could be used in many application
domains, including the cinema industry, video search engines, security
surveillance, video databases/warehouses, data centers, and others. The
proposed system operates as follows: it reads a video;
representative image frames are identified and selected; the image frames are
captioned; NLP is applied to all generated captions together with text
summarization; and finally, a title and an abstract are generated for the
video. All functions are performed automatically. Preliminary results are
provided in this paper using publicly available datasets. This paper is not
concerned with the efficiency of the system at execution time. We hope to
address execution efficiency in subsequent publications.
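The pipeline described in the abstract (read a video, select representative frames, caption them, summarize the captions, emit a title and an abstract) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: frames are toy lists of numbers rather than decoded images, `caption_fn` is a hypothetical stand-in for a real captioning model, key frames are chosen by a simple mean-absolute-difference threshold, and the summarization step is a plain word-frequency ranking. All function names and parameters are invented for this example.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Assumption: a "frame" is a flat list of pixel values, standing in for a decoded image.
Frame = Sequence[float]

STOPWORDS = {"a", "an", "the", "is", "are", "on", "in", "of", "and", "to", "with"}


def select_key_frames(frames: List[Frame], threshold: float) -> List[Frame]:
    """Keep the first frame, then every frame that differs enough from the last kept one."""
    if not frames:
        return []
    kept = [frames[0]]
    for frame in frames[1:]:
        last = kept[-1]
        # Mean absolute difference between this frame and the last kept frame.
        diff = sum(abs(a - b) for a, b in zip(frame, last)) / max(len(frame), 1)
        if diff > threshold:
            kept.append(frame)
    return kept


def content_words(caption: str) -> List[str]:
    """Lowercase, whitespace-split words of a caption, minus stopwords."""
    return [w for w in caption.lower().split() if w not in STOPWORDS]


def rank_captions(captions: List[str]) -> List[str]:
    """Order captions by the average corpus frequency of their content words."""
    freq = Counter(w for c in captions for w in content_words(c))

    def score(caption: str) -> float:
        words = content_words(caption)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # sorted() is stable, so captions with equal scores keep their original order.
    return sorted(captions, key=score, reverse=True)


def describe_video(
    frames: List[Frame],
    caption_fn: Callable[[Frame], str],  # hypothetical captioning model
    threshold: float = 0.5,
    abstract_len: int = 2,
) -> Tuple[str, str]:
    """Return (title, abstract) built from captions of the selected key frames."""
    key_frames = select_key_frames(frames, threshold)
    captions = [caption_fn(f) for f in key_frames]
    unique = list(dict.fromkeys(captions))  # drop repeated captions, keep order
    ranked = rank_captions(unique)
    title = ranked[0] if ranked else ""
    abstract = " ".join(c.rstrip(".") + "." for c in ranked[:abstract_len])
    return title, abstract
```

In a real system, the frame selector would operate on decoded video frames and `caption_fn` would wrap a trained captioning network; the frequency-based ranking here only marks where a proper text summarization model would plug in.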
Related papers
- Learning text-to-video retrieval from image captioning [59.81537951811595]
We describe a protocol to study text-to-video retrieval training with unlabeled videos.
We assume (i) no access to labels for any videos, and (ii) access to labeled images in the form of text.
We show that automatically labeling video frames with image captioning allows text-to-video retrieval training.
arXiv Detail & Related papers (2024-04-26T15:56:08Z)
- OmniVid: A Generative Framework for Universal Video Understanding [133.73878582161387]
We seek to unify the output space of video understanding tasks by using languages as labels and additionally introducing time and box tokens.
This enables us to address various types of video tasks, including classification, captioning, and localization.
We demonstrate such a simple and straightforward idea is quite effective and can achieve state-of-the-art or competitive results.
arXiv Detail & Related papers (2024-03-26T17:59:24Z)
- Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition [84.31749632725929]
In this paper, we focus on one critical challenge of the task, namely scene bias, and accordingly contribute a novel scene-aware video-text alignment method.
Our key idea is to distinguish video representations apart from scene-encoded text representations, aiming to learn scene-agnostic video representations for recognizing actions across domains.
arXiv Detail & Related papers (2024-03-03T16:48:16Z)
- Video Summarization: Towards Entity-Aware Captions [73.28063602552741]
We propose the task of summarizing news video directly to entity-aware captions.
We show that our approach generalizes to existing news image captions dataset.
arXiv Detail & Related papers (2023-12-01T23:56:00Z)
- Contrastive Graph Multimodal Model for Text Classification in Videos [9.218562155255233]
We are the first to address this new task of video text classification by fusing multimodal information.
We tailor a specific module called CorrelationNet to reinforce feature representation by explicitly extracting layout information.
We construct a new well-defined industrial dataset from the news domain, called TI-News, which is dedicated to building and evaluating video text recognition and classification applications.
arXiv Detail & Related papers (2022-06-06T04:06:21Z)
- Reading-strategy Inspired Visual Representation Learning for Text-to-Video Retrieval [41.420760047617506]
Cross-modal representation learning projects both videos and sentences into common spaces for semantic similarity.
Inspired by the reading strategy of humans, we propose a Reading-strategy Inspired Visual Representation Learning (RIVRL) to represent videos.
Our model RIVRL achieves a new state-of-the-art on TGIF and VATEX.
arXiv Detail & Related papers (2022-01-23T03:38:37Z)
- An Integrated Approach for Video Captioning and Applications [2.064612766965483]
We design hybrid deep learning architectures to apply in long videos by captioning videos.
We argue that linking images, videos, and natural language offers many practical benefits and immediate practical applications.
arXiv Detail & Related papers (2022-01-23T01:06:00Z)
- A Survey on Deep Learning Technique for Video Segmentation [147.0767454918527]
Video segmentation plays a critical role in a broad range of practical applications.
Deep learning based approaches have been dedicated to video segmentation and delivered compelling performance.
arXiv Detail & Related papers (2021-07-02T15:51:07Z)
- Learning Video Representations from Textual Web Supervision [97.78883761035557]
We propose to use text as a method for learning video representations.
We collect 70M video clips shared publicly on the Internet and train a model to pair each video with its associated text.
We find that this approach is an effective method of pre-training video representations.
arXiv Detail & Related papers (2020-07-29T16:19:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.