Deep Architectures for Content Moderation and Movie Content Rating
- URL: http://arxiv.org/abs/2212.04533v2
- Date: Mon, 12 Dec 2022 07:53:17 GMT
- Title: Deep Architectures for Content Moderation and Movie Content Rating
- Authors: Fatih Cagatay Akyon, Alptekin Temizel
- Abstract summary: Movie content rating and TV show rating are the two most common rating systems established by professional committees.
A desirable solution is to use computer vision based video content analysis techniques to automate the evaluation process.
In this paper, related works are summarized for action recognition, multi-modal learning, movie genre classification, and sensitive content detection.
- Score: 3.04585143845864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Rating a video based on its content is an important step for classifying
video age categories. Movie content rating and TV show rating are the two most
common rating systems established by professional committees. However, manually
reviewing and evaluating scene/film content by a committee is tedious work,
and it becomes increasingly difficult with the ever-growing amount of online
video content. As such, a desirable solution is to use computer vision based
video content analysis techniques to automate the evaluation process. In this
paper, related works are summarized for action recognition, multi-modal
learning, movie genre classification, and sensitive content detection in the
context of content moderation and movie content rating. The project page is
available at https://github.com/fcakyon/content-moderation-deep-learning.
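As a rough illustration of the computer vision based video content analysis the abstract refers to, the minimal sketch below (not from the paper; the classifier, label set, and preprocessing callable are placeholder assumptions) scores sampled frames with an image classifier and aggregates the per-frame scores into clip-level flags:

```python
# Minimal sketch of automated clip-level content scoring.
# `frame_model` and `preprocess` are placeholders for any image classifier
# and its input transform; the label set is illustrative only.
import cv2
import torch

LABELS = ["safe", "violence", "nudity"]  # hypothetical rating-relevant classes

def sample_frames(video_path: str, every_n: int = 30) -> list:
    """Keep every n-th frame of a video, converted from BGR to RGB."""
    frames, idx = [], 0
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        idx += 1
    cap.release()
    return frames

@torch.no_grad()
def rate_clip(frames, frame_model, preprocess) -> dict:
    """Score each frame, then keep the worst-case score per class for the clip."""
    batch = torch.stack([preprocess(f) for f in frames])  # (num_frames, C, H, W)
    probs = frame_model(batch).softmax(dim=-1)            # (num_frames, num_classes)
    clip_scores = probs.max(dim=0).values                 # max over frames
    return dict(zip(LABELS, clip_scores.tolist()))
```

Max-pooling over frames is only one simple aggregation choice; the systems surveyed in the paper also exploit audio, subtitles, and temporal models.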
Related papers
- Advancing Content Moderation: Evaluating Large Language Models for Detecting Sensitive Content Across Text, Images, and Videos [0.1399948157377307]
Governments, educators, and parents are often at odds with media platforms about how to regulate, control, and limit the spread of sensitive content.
Techniques from natural language processing and computer vision have been used widely to automatically identify and filter out sensitive content.
More sophisticated algorithms for understanding the context of both text and image may open room for improvement in content censorship.
arXiv Detail & Related papers (2024-11-26T05:29:18Z)
- EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval [52.375143786641196]
EgoCVR is an evaluation benchmark for fine-grained Composed Video Retrieval.
EgoCVR consists of 2,295 queries that specifically focus on high-quality temporal video understanding.
arXiv Detail & Related papers (2024-07-23T17:19:23Z)
- VCEval: Rethinking What is a Good Educational Video and How to Automatically Evaluate It [46.67441830344145]
We focus on the task of automatically evaluating the quality of video course content.
We propose three evaluation principles and design a new evaluation framework, VCEval, based on these principles.
Our method effectively distinguishes video courses of different content quality and produces a range of interpretable results.
arXiv Detail & Related papers (2024-06-15T13:18:30Z)
- Improving Video Corpus Moment Retrieval with Partial Relevance Enhancement [72.7576395034068]
Video Corpus Moment Retrieval (VCMR) is a new video retrieval task aimed at retrieving a relevant moment from a large corpus of untrimmed videos using a text query.
We argue that effectively capturing the partial relevance between the query and video is essential for the VCMR task.
For video retrieval, we introduce a multi-modal collaborative video retriever, generating different query representations for the two modalities.
For moment localization, we propose the focus-then-fuse moment localizer, utilizing modality-specific gates to capture essential content.
arXiv Detail & Related papers (2024-02-21T07:16:06Z)
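To make the modality-specific gating idea above concrete, here is a minimal sketch (assumed feature dimensions and names, not the authors' implementation) in which each gate weights one modality conditioned on the query before the two streams are fused:

```python
# Illustrative gated fusion of visual and subtitle/text features for a query.
# Dimensions and module names are assumptions, not the paper's code.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, dim: int = 512):
        super().__init__()
        self.visual_gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.text_gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, query, visual, text):
        # Each gate sees the query plus one modality and rescales that modality.
        g_v = self.visual_gate(torch.cat([query, visual], dim=-1))
        g_t = self.text_gate(torch.cat([query, text], dim=-1))
        return g_v * visual + g_t * text  # fused clip representation
```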
- The Potential of Vision-Language Models for Content Moderation of Children's Videos [1.0589208420411014]
This paper presents an in-depth analysis of how context-specific language prompts affect content moderation performance.
It is important to include more context in content moderation prompts, particularly for cartoon videos.
arXiv Detail & Related papers (2023-12-06T22:29:16Z)
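As an illustration of how added prompt context can change zero-shot moderation scores, the sketch below compares bare and context-rich prompts with an off-the-shelf CLIP model; CLIP stands in for the vision-language models discussed above, and both the prompts and the model choice are assumptions rather than the paper's setup:

```python
# Compare bare vs. context-enriched prompts for zero-shot frame moderation.
# Model and prompts are illustrative; not the paper's configuration.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

bare = ["violence", "safe content"]
contextual = [
    "a cartoon scene showing violence that is inappropriate for children",
    "a cartoon scene that is safe and appropriate for children",
]

frame = Image.open("frame.jpg")  # a sampled video frame (hypothetical path)
for prompts in (bare, contextual):
    inputs = processor(text=prompts, images=frame, return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = model(**inputs).logits_per_image.softmax(dim=-1)
    print(prompts, probs.tolist())
```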
- Extraction and Summarization of Explicit Video Content using Multi-Modal Deep Learning [0.0]
We propose a novel pipeline that uses multi-modal deep learning to first extract the explicit segments of input videos and then summarize their content using text to determine age appropriateness and age rating.
We also evaluate the effectiveness of our pipeline using standard metrics.
arXiv Detail & Related papers (2023-11-17T22:44:05Z)
- TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency [133.75876535332003]
We focus on summarizing instructional videos, an under-explored area of video summarization.
Existing video summarization datasets rely on manual frame-level annotations.
We propose an instructional video summarization network that combines a context-aware temporal video encoder and a segment scoring transformer.
arXiv Detail & Related papers (2022-08-14T04:07:40Z)
- VPN: Video Provenance Network for Robust Content Attribution [72.12494245048504]
We present VPN - a content attribution method for recovering provenance information from videos shared online.
We learn a robust search embedding for matching such videos, using full-length or truncated video queries.
Once matched against a trusted database of video clips, associated information on the provenance of the clip is presented to the user.
arXiv Detail & Related papers (2021-09-21T09:07:05Z)
- Condensed Movies: Story Based Retrieval with Contextual Embeddings [83.73479493450009]
We create the Condensed Movies dataset (CMD) consisting of the key scenes from over 3K movies.
The dataset is scalable, obtained automatically from YouTube, and is freely available for anybody to download and use.
We provide a deep network baseline for text-to-video retrieval on our dataset, combining character, speech and visual cues into a single video embedding.
arXiv Detail & Related papers (2020-05-08T17:55:03Z)
- Feature Re-Learning with Data Augmentation for Video Relevance Prediction [35.87597969685573]
Re-learning is realized by projecting a given deep feature into a new space by an affine transformation.
We propose a new data augmentation strategy which works directly on frame-level and video-level features.
arXiv Detail & Related papers (2020-04-08T05:22:41Z)
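The affine re-learning step described in this last entry can be sketched as a single learned linear projection applied to a pre-extracted deep feature; the dimensions and the Gaussian feature perturbation below are illustrative stand-ins, not the paper's actual augmentation strategy:

```python
# Illustrative feature re-learning: an affine map (Wx + b) into a new space,
# plus a toy feature-level augmentation. Not the authors' implementation.
import torch
import torch.nn as nn

class AffineReLearning(nn.Module):
    def __init__(self, in_dim: int = 2048, out_dim: int = 512):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)  # affine transformation

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.proj(feat)

def augment(feat: torch.Tensor, sigma: float = 0.01) -> torch.Tensor:
    """Perturb a frame- or video-level feature before re-learning (toy example)."""
    return feat + sigma * torch.randn_like(feat)
```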
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.