A Multimodal Sentiment Dataset for Video Recommendation
- URL: http://arxiv.org/abs/2109.08333v1
- Date: Fri, 17 Sep 2021 03:10:42 GMT
- Title: A Multimodal Sentiment Dataset for Video Recommendation
- Authors: Hongxuan Tang, Hao Liu, Xinyan Xiao, Hua Wu
- Abstract summary: We propose a multimodal sentiment analysis dataset named the baiDu Video Sentiment dataset (DuVideoSenti).
DuVideoSenti consists of 5,630 videos displayed on Baidu.
Each video is manually annotated with a sentimental style label that describes the user's real feeling about the video.
- Score: 21.44500591776281
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, multimodal sentiment analysis has seen remarkable advances, and many
datasets have been proposed for its development. In general, current multimodal
sentiment analysis datasets follow the traditional sentiment/emotion taxonomy,
with labels such as positive and negative. However, when applied to video
recommendation, this traditional taxonomy is difficult to leverage for
representing the diverse content of videos from the perspective of visual
perception and language understanding. Motivated by this, we propose a
multimodal sentiment analysis dataset, named the baiDu Video Sentiment dataset
(DuVideoSenti), and introduce a new sentiment system designed to describe the
sentimental style of a video in the recommendation scenario. Specifically,
DuVideoSenti consists of 5,630 videos displayed on Baidu, each manually
annotated with a sentimental style label that describes the user's real feeling
about the video. Furthermore, we adopt UNIMO as our baseline for DuVideoSenti.
Experimental results show that DuVideoSenti brings new challenges to multimodal
sentiment analysis and can serve as a new benchmark for evaluating approaches
designed for video understanding and multimodal fusion. We expect DuVideoSenti
to further advance the development of multimodal sentiment analysis and its
application to video recommendation.
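Below is a minimal sketch of how a DuVideoSenti-style example and its evaluation might be represented. The field names, label encoding, and accuracy metric are assumptions for illustration only; the abstract specifies just that each video (visual frames plus accompanying text) is paired with one manually annotated sentimental style label and that UNIMO is used as the baseline classifier.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical record for one DuVideoSenti example; the actual schema is not
# given in the abstract.
@dataclass
class VideoExample:
    video_id: str
    frame_paths: List[str]   # sampled key frames (visual modality)
    text: str                # title/description shown with the video (language modality)
    style_label: int         # index into the sentimental-style label set

def style_accuracy(predictions: List[int], examples: List[VideoExample]) -> float:
    """Fraction of videos whose predicted sentimental style matches the annotation."""
    if not examples:
        return 0.0
    correct = sum(p == e.style_label for p, e in zip(predictions, examples))
    return correct / len(examples)

if __name__ == "__main__":
    # Toy usage: score model outputs for a held-out split.
    examples = [VideoExample("v1", ["f1.jpg"], "a calm lakeside sunset", 2),
                VideoExample("v2", ["f2.jpg"], "street food tour", 0)]
    print(style_accuracy([2, 1], examples))  # -> 0.5
```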
Related papers
- Personalized Video Summarization by Multimodal Video Understanding [2.1372652192505703]
We present a pipeline called Video Summarization with Language (VSL) for user-preferred video summarization.
VSL is based on pre-trained visual language models (VLMs) to avoid the need to train a video summarization system on a large training dataset.
We show that our method is more adaptable across different datasets compared to supervised query-based video summarization models.
arXiv Detail & Related papers (2024-11-05T22:14:35Z) - Infer Induced Sentiment of Comment Response to Video: A New Task, Dataset and Baseline [30.379212611361893]
Existing video multi-modal sentiment analysis mainly focuses on the sentiment expressed by people within the video, yet often neglects the sentiment induced in viewers while watching the videos.
We propose Multi-modal Sentiment Analysis for Comment Response of Video Induced (MSA-CRVI) to infer opinions and emotions from the comments made in response to micro videos.
To our knowledge, it is the largest video multi-modal sentiment dataset in terms of scale and video duration, containing 107,267 comments and 8,210 micro videos with a total duration of 68.83 hours.
arXiv Detail & Related papers (2024-05-15T10:24:54Z) - Detours for Navigating Instructional Videos [58.1645668396789]
We propose VidDetours, a video-language approach that learns to retrieve the targeted temporal segments from a large repository of how-to's.
We show our model's significant improvements over best available methods for video retrieval and question answering, with recall rates exceeding the state of the art by 35%.
arXiv Detail & Related papers (2024-01-03T16:38:56Z) - Multi-Modal Video Topic Segmentation with Dual-Contrastive Domain
Adaptation [74.51546366251753]
Video topic segmentation unveils the coarse-grained semantic structure underlying videos.
We introduce a multi-modal video topic segmenter that utilizes both video transcripts and frames.
Our proposed solution significantly surpasses baseline methods in terms of both accuracy and transferability.
arXiv Detail & Related papers (2023-11-30T21:59:05Z) - Causal Video Summarizer for Video Exploration [74.27487067877047]
Causal Video Summarizer (CVS) is proposed to capture the interactive information between the video and the query.
Experimental results on an existing multi-modal video summarization dataset show that the proposed approach is effective.
arXiv Detail & Related papers (2023-07-04T22:52:16Z) - ChatVideo: A Tracklet-centric Multimodal and Versatile Video
Understanding System [119.51012668709502]
We present our vision for multimodal and versatile video understanding and propose a prototype system, ChatVideo.
Our system is built upon a tracklet-centric paradigm, which treats tracklets as the basic video unit.
All the detected tracklets are stored in a database and interact with the user through a database manager.
arXiv Detail & Related papers (2023-04-27T17:59:58Z) - How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios [73.24092762346095]
We introduce two large-scale datasets with over 60,000 videos annotated for emotional response and subjective wellbeing.
The Video Cognitive Empathy dataset contains annotations for distributions of fine-grained emotional responses, allowing models to gain a detailed understanding of affective states.
The Video to Valence dataset contains annotations of relative pleasantness between videos, which enables predicting a continuous spectrum of wellbeing.
arXiv Detail & Related papers (2022-10-18T17:58:25Z) - 3MASSIV: Multilingual, Multimodal and Multi-Aspect dataset of Social
Media Short Videos [72.69052180249598]
We present 3MASSIV, a multilingual, multimodal and multi-aspect, expertly-annotated dataset of diverse short videos extracted from the short-video social media platform Moj.
3MASSIV comprises 50K short videos (20 seconds average duration) and 100K unlabeled videos in 11 different languages.
We show how the social media content in 3MASSIV is dynamic and temporal in nature, which can be used for semantic understanding tasks and cross-lingual analysis.
arXiv Detail & Related papers (2022-03-28T02:47:01Z) - Highlight Timestamp Detection Model for Comedy Videos via Multimodal
Sentiment Analysis [1.6181085766811525]
We propose a multimodal architecture to obtain state-of-the-art performance in this field.
We select several benchmarks for multimodal video understanding and apply the most suitable model to achieve the best performance.
arXiv Detail & Related papers (2021-05-28T08:39:19Z)