Infer Induced Sentiment of Comment Response to Video: A New Task, Dataset and Baseline
- URL: http://arxiv.org/abs/2407.06115v1
- Date: Wed, 15 May 2024 10:24:54 GMT
- Title: Infer Induced Sentiment of Comment Response to Video: A New Task, Dataset and Baseline
- Authors: Qi Jia, Baoyu Fan, Cong Xu, Lu Liu, Liang Jin, Guoguang Du, Zhenhua Guo, Yaqian Zhao, Xuanjing Huang, Rengang Li
- Abstract summary: Existing video multi-modal sentiment analysis mainly focuses on the sentiment expression of people within the video, yet often neglects the induced sentiment of viewers while watching the videos.
We propose Multi-modal Sentiment Analysis for Comment Response of Video Induced (MSA-CRVI), which infers opinions and emotions from the comments responding to micro videos.
The accompanying CSMV dataset is, to our knowledge, the largest video multi-modal sentiment dataset in scale and video duration, containing 107,267 comments and 8,210 micro videos totaling 68.83 hours.
- Score: 30.379212611361893
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Existing video multi-modal sentiment analysis mainly focuses on the sentiment expressed by people within a video, yet often neglects the sentiment induced in viewers while watching it. Induced viewer sentiment is essential for inferring the public response to videos and has broad applications in analyzing societal sentiment, advertising effectiveness, and other areas. Micro videos and their associated comments provide a rich application scenario for viewer-induced sentiment analysis. In light of this, we introduce a novel research task, Multi-modal Sentiment Analysis for Comment Response of Video Induced (MSA-CRVI), which aims to infer opinions and emotions from the comments responding to micro videos. Meanwhile, we manually annotate a dataset named Comment Sentiment toward Micro Video (CSMV) to support this research. To our knowledge, it is the largest video multi-modal sentiment dataset in scale and video duration, containing 107,267 comments and 8,210 micro videos totaling 68.83 hours. Since inferring the induced sentiment of a comment requires leveraging the video content, we propose the Video Content-aware Comment Sentiment Analysis (VC-CSA) method as a baseline to address the challenges inherent in this new task. Extensive experiments demonstrate that our method achieves significant improvements over established baselines.
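To make the task concrete, below is a minimal, hypothetical sketch of a video-content-aware comment sentiment classifier in the spirit the abstract describes: comment token features attend over pre-extracted video frame features, and the fused representation is classified. The module names, feature dimensions, assumed text/video encoders, and the cross-attention fusion are illustrative assumptions, not the authors' VC-CSA implementation.

```python
# Hypothetical sketch (not the authors' VC-CSA code): a comment
# sentiment classifier that conditions on video content via
# cross-attention over pre-extracted frame features.
import torch
import torch.nn as nn

class VideoAwareCommentSentiment(nn.Module):
    def __init__(self, text_dim=768, video_dim=512, hidden=256, num_classes=3):
        super().__init__()
        # Project comment token features and frame features into a shared space.
        self.text_proj = nn.Linear(text_dim, hidden)
        self.video_proj = nn.Linear(video_dim, hidden)
        # Comment tokens (queries) attend over video frames (keys/values),
        # so the comment representation reflects the content that induced it.
        self.cross_attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.classifier = nn.Sequential(nn.LayerNorm(hidden), nn.Linear(hidden, num_classes))

    def forward(self, comment_tokens, frame_feats):
        # comment_tokens: (B, T_text, text_dim), e.g. BERT token states (assumed encoder)
        # frame_feats:    (B, T_video, video_dim), e.g. CLIP frame features (assumed encoder)
        q = self.text_proj(comment_tokens)
        kv = self.video_proj(frame_feats)
        fused, _ = self.cross_attn(q, kv, kv)  # video-aware comment token states
        pooled = fused.mean(dim=1)             # mean-pool over comment tokens
        return self.classifier(pooled)         # sentiment logits

model = VideoAwareCommentSentiment()
logits = model(torch.randn(2, 20, 768), torch.randn(2, 32, 512))
print(logits.shape)  # torch.Size([2, 3])
```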
Related papers
- HOTVCOM: Generating Buzzworthy Comments for Videos [49.39846630199698]
This study introduces HotVCom, the largest Chinese video hot-comment dataset, comprising 94k diverse videos and 137 million comments.
We also present the ComHeat framework, which synergistically integrates visual, auditory, and textual data to generate influential hot-comments on the Chinese video dataset.
arXiv Detail & Related papers (2024-09-23T16:45:13Z)
- YTCommentQA: Video Question Answerability in Instructional Videos [22.673000779017595]
We present the YTCommentQA dataset, which contains naturally-generated questions from YouTube.
Questions are categorized by their answerability and the modality required to answer them: visual, script, or both.
arXiv Detail & Related papers (2024-01-30T14:18:37Z)
- Video Summarization Overview [25.465707307283434]
Video summarization facilitates quickly grasping video content by creating a compact summary of videos.
This survey covers early studies as well as recent approaches which take advantage of deep learning techniques.
arXiv Detail & Related papers (2022-10-21T03:29:31Z)
- How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios [73.24092762346095]
We introduce two large-scale datasets with over 60,000 videos annotated for emotional response and subjective wellbeing.
The Video Cognitive Empathy dataset contains annotations for distributions of fine-grained emotional responses, allowing models to gain a detailed understanding of affective states.
The Video to Valence dataset contains annotations of relative pleasantness between videos, which enables predicting a continuous spectrum of wellbeing.
arXiv Detail & Related papers (2022-10-18T17:58:25Z)
- TL;DW? Summarizing Instructional Videos with Task Relevance & Cross-Modal Saliency [133.75876535332003]
We focus on summarizing instructional videos, an under-explored area of video summarization.
Existing video summarization datasets rely on manual frame-level annotations.
We propose an instructional video summarization network that combines a context-aware temporal video encoder and a segment scoring transformer.
arXiv Detail & Related papers (2022-08-14T04:07:40Z)
- Video Question Answering with Iterative Video-Text Co-Tokenization [77.66445727743508]
We propose a novel multi-stream video encoder for video question answering.
We experimentally evaluate the model on several datasets, including MSRVTT-QA, MSVD-QA, and IVQA.
Our model reduces the required GFLOPs from 150-360 to only 67, producing a highly efficient video question answering model.
arXiv Detail & Related papers (2022-08-01T15:35:38Z)
- NEWSKVQA: Knowledge-Aware News Video Question Answering [5.720640816755851]
We explore a new frontier in video question answering: answering knowledge-based questions in the context of news videos.
We curate a new dataset of 12K news videos spanning 156 hours, with 1M multiple-choice question-answer pairs covering 8,263 unique entities.
We propose a novel approach, NEWSKVQA which performs multi-modal inferencing over textual multiple-choice questions, videos, their transcripts and knowledge base.
arXiv Detail & Related papers (2022-02-08T17:31:31Z)
- A Multimodal Sentiment Dataset for Video Recommendation [21.44500591776281]
We propose a multimodal sentiment analysis dataset named the baiDu Video Sentiment dataset (DuVideoSenti).
DuVideoSenti consists of 5,630 videos displayed on Baidu.
Each video is manually annotated with a sentimental-style label that describes the user's real feeling toward the video.
arXiv Detail & Related papers (2021-09-17T03:10:42Z)
- The Potential of Using Vision Videos for CrowdRE: Video Comments as a Source of Feedback [0.8594140167290097]
We analyze and assess the potential of using vision videos for CrowdRE.
In a case study, we analyzed 4505 comments on a vision video from YouTube.
We conclude that the use of vision videos for CrowdRE has great potential.
arXiv Detail & Related papers (2021-08-04T14:18:27Z)
- A Survey on Deep Learning Technique for Video Segmentation [147.0767454918527]
Video segmentation plays a critical role in a broad range of practical applications.
Deep learning-based approaches have been applied to video segmentation and deliver compelling performance.
arXiv Detail & Related papers (2021-07-02T15:51:07Z)
- VIOLIN: A Large-Scale Dataset for Video-and-Language Inference [103.7457132841367]
We introduce a new task, Video-and-Language Inference, for joint multimodal understanding of video and text.
Given a video clip with aligned subtitles as the premise, paired with a natural-language hypothesis based on the video content, a model must infer whether the hypothesis is entailed or contradicted by the clip.
A new large-scale dataset, named Violin (VIdeO-and-Language INference), is introduced for this task, which consists of 95,322 video-hypothesis pairs from 15,887 video clips.
arXiv Detail & Related papers (2020-03-25T20:39:05Z)