Related papers: Extraction and Summarization of Explicit Video Content using Multi-Modal Deep Learning

Extraction and Summarization of Explicit Video Content using Multi-Modal Deep Learning

URL: http://arxiv.org/abs/2311.10899v2
Date: Tue, 21 Nov 2023 02:16:27 GMT
Title: Extraction and Summarization of Explicit Video Content using Multi-Modal Deep Learning
Authors: Shaunak Joshi, Raghav Gaggar
Abstract summary: We propose a novel pipeline that uses multi-modal deep learning to first extract the explicit segments of input videos and then summarize their content using text to determine its age appropriateness and age rating. We also evaluate our pipeline's effectiveness in the end using standard metrics.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: With the increase in video-sharing platforms across the internet, it is difficult for humans to moderate the data for explicit content. Hence, an automated pipeline to scan through video data for explicit content has become the need of the hour. We propose a novel pipeline that uses multi-modal deep learning to first extract the explicit segments of input videos and then summarize their content using text to determine its age appropriateness and age rating. We also evaluate our pipeline's effectiveness in the end using standard metrics.

Related papers

Unsupervised Transcript-assisted Video Summarization and Highlight Detection [6.80224810039938]
We propose a multimodal pipeline that leverages video frames and their corresponding transcripts to generate a more condensed version of the video.<n>The pipeline is trained within an RL framework, which rewards the model for generating diverse and representative summaries.<n>Our experiments show that using the transcript in video summarization and highlight detection achieves superior results compared to relying solely on the visual content of the video.
arXiv Detail & Related papers (2025-05-29T09:16:19Z)
Parameter-free Video Segmentation for Vision and Language Understanding [55.20132267309382]
We propose an algorithm for segmenting videos into contiguous chunks, based on the minimum description length principle. The algorithm is entirely parameter-free, given feature vectors, not requiring a set threshold or the number or size of chunks to be specified.
arXiv Detail & Related papers (2025-03-03T05:54:37Z)
Personalized Video Summarization by Multimodal Video Understanding [2.1372652192505703]
We present a pipeline called Video Summarization with Language (VSL) for user-preferred video summarization. VSL is based on pre-trained visual language models (VLMs) to avoid the need to train a video summarization system on a large training dataset. We show that our method is more adaptable across different datasets compared to supervised query-based video summarization models.
arXiv Detail & Related papers (2024-11-05T22:14:35Z)
Delving Deep into Engagement Prediction of Short Videos [34.38399476375175]
This study delves deep into the intricacies of predicting engagement for newly published videos with limited user interactions. We introduce a substantial dataset comprising 90,000 real-world short videos from Snapchat. Our method demonstrates its ability to predict engagements of short videos purely from video content.
arXiv Detail & Related papers (2024-09-30T23:57:07Z)
Scaling Up Video Summarization Pretraining with Large Language Models [73.74662411006426]
We introduce an automated and scalable pipeline for generating a large-scale video summarization dataset. We analyze the limitations of existing approaches and propose a new video summarization model that effectively addresses them. Our work also presents a new benchmark dataset that contains 1200 long videos each with high-quality summaries annotated by professionals.
arXiv Detail & Related papers (2024-04-04T11:59:06Z)
Query-aware Long Video Localization and Relation Discrimination for Deep Video Understanding [15.697251303126874]
Deep Video Understanding (DVU) Challenge aims to push the boundaries of multimodal extraction, fusion, and analytics. This paper introduces a query-aware method for long video localization and relation discrimination, leveraging an imagelanguage pretrained model. Our approach achieved first and fourth positions for two groups of movie-level queries.
arXiv Detail & Related papers (2023-10-19T13:26:02Z)
VideoXum: Cross-modal Visual and Textural Summarization of Videos [54.0985975755278]
We propose a new joint video and text summarization task. The goal is to generate both a shortened video clip along with the corresponding textual summary from a long video. The generated shortened video clip and text narratives should be semantically well aligned.
arXiv Detail & Related papers (2023-03-21T17:51:23Z)
Hierarchical3D Adapters for Long Video-to-text Summarization [79.01926022762093]
multimodal information offers superior performance over more memory-heavy and fully fine-tuned textual summarization methods. Our experiments demonstrate that multimodal information offers superior performance over more memory-heavy and fully fine-tuned textual summarization methods.
arXiv Detail & Related papers (2022-10-10T16:44:36Z)
TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency [133.75876535332003]
We focus on summarizing instructional videos, an under-explored area of video summarization. Existing video summarization datasets rely on manual frame-level annotations. We propose an instructional video summarization network that combines a context-aware temporal video encoder and a segment scoring transformer.
arXiv Detail & Related papers (2022-08-14T04:07:40Z)
Video Summarization Based on Video-text Modelling [0.0]
We propose a multimodal self-supervised learning framework to obtain semantic representations of videos. We also introduce a progressive video summarization method, where the important content in a video is pinpointed progressively to generate better summaries. An objective evaluation framework is proposed to measure the quality of video summaries based on video classification.
arXiv Detail & Related papers (2022-01-07T15:21:46Z)
VPN: Video Provenance Network for Robust Content Attribution [72.12494245048504]
We present VPN - a content attribution method for recovering provenance information from videos shared online. We learn a robust search embedding for matching such video, using full-length or truncated video queries. Once matched against a trusted database of video clips, associated information on the provenance of the clip is presented to the user.
arXiv Detail & Related papers (2021-09-21T09:07:05Z)
Watch and Learn: Mapping Language and Noisy Real-world Videos with Self-supervision [54.73758942064708]
We teach machines to understand visuals and natural language by learning the mapping between sentences and noisy video snippets without explicit annotations. For training and evaluation, we contribute a new dataset ApartmenTour' that contains a large number of online videos and subtitles.
arXiv Detail & Related papers (2020-11-19T03:43:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.