It is not Sexually Suggestive, It is Educative. Separating Sex Education
from Suggestive Content on TikTok Videos
- URL: http://arxiv.org/abs/2307.03274v1
- Date: Thu, 6 Jul 2023 20:23:17 GMT
- Title: It is not Sexually Suggestive, It is Educative. Separating Sex Education
from Suggestive Content on TikTok Videos
- Authors: Enfa George, Mihai Surdeanu
- Abstract summary: SexTok is a dataset composed of TikTok videos labeled as sexually suggestive (from the annotator's point of view), sex-educational content, or neither.
Children's exposure to sexually suggestive videos has been shown to have adverse effects on their development.
Virtual sex education, especially on subjects that are more relevant to the LGBTQIA+ community, is very valuable.
- Score: 22.870334358353585
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce SexTok, a multi-modal dataset composed of TikTok videos labeled
as sexually suggestive (from the annotator's point of view), sex-educational
content, or neither. Such a dataset is necessary to address the challenge of
distinguishing between sexually suggestive content and virtual sex education
videos on TikTok. Children's exposure to sexually suggestive videos has been
shown to have adverse effects on their development. Meanwhile, virtual sex
education, especially on subjects that are more relevant to the LGBTQIA+
community, is very valuable. The platform's current system removes or penalizes
some of both types of videos, even though they serve different purposes. Our
dataset contains the video URLs along with audio transcriptions. To validate its
importance, we explore two transformer-based models for classifying the videos.
Our preliminary results suggest that the task of distinguishing between these
types of videos is learnable but challenging. These experiments indicate that
this dataset is meaningful and invites further study on the subject.
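The abstract notes that the videos are audio transcribed and that two transformer-based classifiers were explored for the three-way labeling task. As an illustration only, the sketch below shows how a transcript-only text baseline for this classification could be set up with Hugging Face Transformers; the checkpoint, column names, and hyperparameters are assumptions, not the authors' actual configuration.

```python
# A minimal, hypothetical sketch of a transcript-only baseline for SexTok's
# three labels (sexually suggestive / sex-educational / neither).
# The checkpoint, column names, and hyperparameters are illustrative
# assumptions; this is not the authors' released code.
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

LABELS = ["suggestive", "sex_educational", "neither"]

# Hypothetical in-memory examples: one audio transcript per video plus a label id.
train = Dataset.from_dict({
    "transcript": ["example transcript of a TikTok video ...",
                   "another transcript ..."],
    "label": [0, 1],
})

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS)
)

def tokenize(batch):
    # Truncate long transcripts to the encoder's maximum input length.
    return tokenizer(batch["transcript"], truncation=True, max_length=512)

train = train.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="sextok-transcript-baseline",
        per_device_train_batch_size=8,
        num_train_epochs=3,
    ),
    train_dataset=train,
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```

A video-side baseline would instead feed sampled frames to a vision transformer; the transcript-only variant is simply the most lightweight starting point given the dataset description above.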
Related papers
- Dense Video Captioning: A Survey of Techniques, Datasets and Evaluation Protocols [53.706461356853445]
Untrimmed videos have interrelated events, dependencies, context, overlapping events, object-object interactions, domain specificity, and other semantics worth describing.
Dense Video Captioning (DVC) aims at detecting and describing different events in a given video.
arXiv Detail & Related papers (2023-11-05T01:45:31Z)
- A deep-learning approach to early identification of suggested sexual harassment from videos [0.802904964931021]
Sexual harassment, sexual abuse, and sexual violence are prevalent problems in this day and age.
We have classified the three terms (harassment, abuse, and violence) based on the visual attributes present in images depicting these situations.
We identified that factors such as the facial expressions of the victim and the perpetrator, as well as unwanted touching, are directly linked to identifying these scenes.
Based on these definitions and characteristics, we have developed a first-of-its-kind dataset from various Indian movie scenes.
arXiv Detail & Related papers (2023-06-01T16:14:17Z)
- How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios [73.24092762346095]
We introduce two large-scale datasets with over 60,000 videos annotated for emotional response and subjective wellbeing.
The Video Cognitive Empathy dataset contains annotations for distributions of fine-grained emotional responses, allowing models to gain a detailed understanding of affective states.
The Video to Valence dataset contains annotations of relative pleasantness between videos, which enables predicting a continuous spectrum of wellbeing.
arXiv Detail & Related papers (2022-10-18T17:58:25Z)
- Self-Supervised Learning for Videos: A Survey [70.37277191524755]
Self-supervised learning has shown promise in both image and video domains.
In this survey, we provide a review of existing approaches on self-supervised learning focusing on the video domain.
arXiv Detail & Related papers (2022-06-18T00:26:52Z)
- Boosting Video Representation Learning with Multi-Faceted Integration [112.66127428372089]
Video content is multifaceted, consisting of objects, scenes, interactions or actions.
Existing datasets mostly label only one of the facets for model training, resulting in the video representation that biases to only one facet depending on the training dataset.
We propose a new learning framework, MUlti-Faceted Integration (MUFI), to aggregate facets from different datasets for learning a representation that could reflect the full spectrum of video content.
arXiv Detail & Related papers (2022-01-11T16:14:23Z)
- Characterizing Abhorrent, Misinformative, and Mistargeted Content on YouTube [1.9138099871648453]
We study the degree of problematic content on YouTube and the role of the recommendation algorithm in the dissemination of such content.
Our analysis reveals that young children are likely to encounter disturbing content when they randomly browse the platform.
We find that Incel activity is increasing over time and that platforms may play an active role in steering users towards extreme content.
arXiv Detail & Related papers (2021-05-20T15:10:48Z)
- Watch and Learn: Mapping Language and Noisy Real-world Videos with Self-supervision [54.73758942064708]
We teach machines to understand visuals and natural language by learning the mapping between sentences and noisy video snippets without explicit annotations.
For training and evaluation, we contribute a new dataset, ApartmenTour, that contains a large number of online videos and subtitles.
arXiv Detail & Related papers (2020-11-19T03:43:56Z)
- Classification of Important Segments in Educational Videos using Multimodal Features [10.175871202841346]
We propose a multimodal neural architecture that utilizes state-of-the-art audio, visual and textual features.
Our experiments investigate the impact of visual and temporal information, as well as the combination of multimodal features on importance prediction.
arXiv Detail & Related papers (2020-10-26T14:40:23Z)
- Beyond Instructional Videos: Probing for More Diverse Visual-Textual Grounding on YouTube [35.32213834577941]
We show that visual-textual grounding is possible across previously unexplored video categories.
We find that pretraining on a more diverse set results in representations that generalize to both non-instructional and instructional domains.
arXiv Detail & Related papers (2020-04-29T17:10:10Z)
- VIOLIN: A Large-Scale Dataset for Video-and-Language Inference [103.7457132841367]
We introduce a new task, Video-and-Language Inference, for joint multimodal understanding of video and text.
Given a video clip with aligned subtitles as the premise, paired with a natural language hypothesis based on the video content, a model needs to infer whether the hypothesis is entailed or contradicted by the given video clip.
A new large-scale dataset, named Violin (VIdeO-and-Language INference), is introduced for this task, which consists of 95,322 video-hypothesis pairs from 15,887 video clips.
arXiv Detail & Related papers (2020-03-25T20:39:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides (including this list) and is not responsible for any consequences of its use.