Extraction and Summarization of Explicit Video Content using Multi-Modal
Deep Learning
- URL: http://arxiv.org/abs/2311.10899v2
- Date: Tue, 21 Nov 2023 02:16:27 GMT
- Title: Extraction and Summarization of Explicit Video Content using Multi-Modal
Deep Learning
- Authors: Shaunak Joshi, Raghav Gaggar
- Abstract summary: We propose a novel pipeline that uses multi-modal deep learning to first extract the explicit segments of input videos and then summarize their content using text to determine its age appropriateness and age rating.
We also evaluate our pipeline's effectiveness in the end using standard metrics.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the increase in video-sharing platforms across the internet, it is
difficult for humans to moderate the data for explicit content. Hence, an
automated pipeline to scan through video data for explicit content has become
the need of the hour. We propose a novel pipeline that uses multi-modal deep
learning to first extract the explicit segments of input videos and then
summarize their content using text to determine its age appropriateness and age
rating. We also evaluate our pipeline's effectiveness in the end using standard
metrics.
Related papers
- Personalized Video Summarization by Multimodal Video Understanding [2.1372652192505703]
We present a pipeline called Video Summarization with Language (VSL) for user-preferred video summarization.
VSL is based on pre-trained visual language models (VLMs) to avoid the need to train a video summarization system on a large training dataset.
We show that our method is more adaptable across different datasets compared to supervised query-based video summarization models.
arXiv Detail & Related papers (2024-11-05T22:14:35Z) - Delving Deep into Engagement Prediction of Short Videos [34.38399476375175]
This study delves deep into the intricacies of predicting engagement for newly published videos with limited user interactions.
We introduce a substantial dataset comprising 90,000 real-world short videos from Snapchat.
Our method demonstrates its ability to predict engagements of short videos purely from video content.
arXiv Detail & Related papers (2024-09-30T23:57:07Z) - Scaling Up Video Summarization Pretraining with Large Language Models [73.74662411006426]
We introduce an automated and scalable pipeline for generating a large-scale video summarization dataset.
We analyze the limitations of existing approaches and propose a new video summarization model that effectively addresses them.
Our work also presents a new benchmark dataset that contains 1200 long videos each with high-quality summaries annotated by professionals.
arXiv Detail & Related papers (2024-04-04T11:59:06Z) - Query-aware Long Video Localization and Relation Discrimination for Deep
Video Understanding [15.697251303126874]
Deep Video Understanding (DVU) Challenge aims to push the boundaries of multimodal extraction, fusion, and analytics.
This paper introduces a query-aware method for long video localization and relation discrimination, leveraging an imagelanguage pretrained model.
Our approach achieved first and fourth positions for two groups of movie-level queries.
arXiv Detail & Related papers (2023-10-19T13:26:02Z) - VideoXum: Cross-modal Visual and Textural Summarization of Videos [54.0985975755278]
We propose a new joint video and text summarization task.
The goal is to generate both a shortened video clip along with the corresponding textual summary from a long video.
The generated shortened video clip and text narratives should be semantically well aligned.
arXiv Detail & Related papers (2023-03-21T17:51:23Z) - Hierarchical3D Adapters for Long Video-to-text Summarization [79.01926022762093]
multimodal information offers superior performance over more memory-heavy and fully fine-tuned textual summarization methods.
Our experiments demonstrate that multimodal information offers superior performance over more memory-heavy and fully fine-tuned textual summarization methods.
arXiv Detail & Related papers (2022-10-10T16:44:36Z) - TL;DW? Summarizing Instructional Videos with Task Relevance &
Cross-Modal Saliency [133.75876535332003]
We focus on summarizing instructional videos, an under-explored area of video summarization.
Existing video summarization datasets rely on manual frame-level annotations.
We propose an instructional video summarization network that combines a context-aware temporal video encoder and a segment scoring transformer.
arXiv Detail & Related papers (2022-08-14T04:07:40Z) - Video Summarization Based on Video-text Modelling [0.0]
We propose a multimodal self-supervised learning framework to obtain semantic representations of videos.
We also introduce a progressive video summarization method, where the important content in a video is pinpointed progressively to generate better summaries.
An objective evaluation framework is proposed to measure the quality of video summaries based on video classification.
arXiv Detail & Related papers (2022-01-07T15:21:46Z) - VPN: Video Provenance Network for Robust Content Attribution [72.12494245048504]
We present VPN - a content attribution method for recovering provenance information from videos shared online.
We learn a robust search embedding for matching such video, using full-length or truncated video queries.
Once matched against a trusted database of video clips, associated information on the provenance of the clip is presented to the user.
arXiv Detail & Related papers (2021-09-21T09:07:05Z) - Watch and Learn: Mapping Language and Noisy Real-world Videos with
Self-supervision [54.73758942064708]
We teach machines to understand visuals and natural language by learning the mapping between sentences and noisy video snippets without explicit annotations.
For training and evaluation, we contribute a new dataset ApartmenTour' that contains a large number of online videos and subtitles.
arXiv Detail & Related papers (2020-11-19T03:43:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.