Enhancing Multimodal Affective Analysis with Learned Live Comment Features
- URL: http://arxiv.org/abs/2410.16407v1
- Date: Mon, 21 Oct 2024 18:19:09 GMT
- Title: Enhancing Multimodal Affective Analysis with Learned Live Comment Features
- Authors: Zhaoyuan Deng, Amith Ananthram, Kathleen McKeown
- Abstract summary: Live comments, also known as Danmaku, are user-generated messages that are synchronized with video content.
We first construct the Live Comment for Affective Analysis (LCAffect) dataset, which contains live comments for English and Chinese videos.
We then use contrastive learning to train a video encoder to produce synthetic live comment features for enhanced multimodal affective content analysis.
- Score: 12.437191675553423
- Abstract: Live comments, also known as Danmaku, are user-generated messages that are synchronized with video content. These comments overlay directly onto streaming videos, capturing viewer emotions and reactions in real-time. While prior work has leveraged live comments in affective analysis, its use has been limited due to the relative rarity of live comments across different video platforms. To address this, we first construct the Live Comment for Affective Analysis (LCAffect) dataset which contains live comments for English and Chinese videos spanning diverse genres that elicit a wide spectrum of emotions. Then, using this dataset, we use contrastive learning to train a video encoder to produce synthetic live comment features for enhanced multimodal affective content analysis. Through comprehensive experimentation on a wide range of affective analysis tasks (sentiment, emotion recognition, and sarcasm detection) in both English and Chinese, we demonstrate that these synthetic live comment features significantly improve performance over state-of-the-art methods.
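The abstract describes the method only at a high level, so the following is a minimal, illustrative sketch of the core idea: contrastive (CLIP-style InfoNCE) training that pulls a video clip's embedding toward the embedding of its synchronized live comments, so the trained video encoder can later produce synthetic live-comment features for videos that have no comments. Everything here (PyTorch, the module names, feature dimensions, and the mean-pooling toy encoder) is an assumption for illustration, not the authors' implementation.

```python
# Minimal sketch, NOT the paper's released code: align a video encoder with
# live-comment text embeddings via a symmetric InfoNCE loss. Dimensions,
# module names, and the toy encoder are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VideoEncoder(nn.Module):
    """Toy video encoder: mean-pools per-frame features, then projects."""
    def __init__(self, frame_dim=768, out_dim=512):
        super().__init__()
        self.proj = nn.Linear(frame_dim, out_dim)

    def forward(self, frame_feats):            # frame_feats: (B, T, frame_dim)
        pooled = frame_feats.mean(dim=1)       # (B, frame_dim)
        return F.normalize(self.proj(pooled), dim=-1)

def info_nce(video_emb, comment_emb, temperature=0.07):
    """Matched (video, comment) pairs are positives; all other in-batch pairs are negatives."""
    logits = video_emb @ comment_emb.t() / temperature    # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy training step with random tensors standing in for real features.
encoder = VideoEncoder()
optimizer = torch.optim.AdamW(encoder.parameters(), lr=1e-4)

frame_feats = torch.randn(8, 16, 768)                     # 8 clips, 16 frames each
comment_emb = F.normalize(torch.randn(8, 512), dim=-1)    # frozen live-comment text embeddings
loss = info_nce(encoder(frame_feats), comment_emb)
loss.backward()
optimizer.step()
```

At inference time, the trained encoder's output can stand in for live-comment features on videos without comments and be fed, alongside the other modalities, to downstream sentiment, emotion recognition, or sarcasm detection models.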
Related papers
- HOTVCOM: Generating Buzzworthy Comments for Videos [49.39846630199698]
This study introduces HotVCom, the largest Chinese video hot-comment dataset, comprising 94k diverse videos and 137 million comments.
We also present the ComHeat framework, which integrates visual, auditory, and textual data to generate influential hot-comments for the Chinese video dataset.
arXiv Detail & Related papers (2024-09-23T16:45:13Z) - NAVERO: Unlocking Fine-Grained Semantics for Video-Language Compositionality [52.08735848128973]
We study the capability of Video-Language (VidL) models in understanding compositions between objects, attributes, actions and their relations.
We propose a training method called NAVERO which utilizes video-text data augmented with negative texts to enhance composition understanding.
arXiv Detail & Related papers (2024-08-18T15:27:06Z) - Sentiment-oriented Transformer-based Variational Autoencoder Network for Live Video Commenting [30.96049241998733]
We propose a Sentiment-oriented Transformer-based Variational Autoencoder (So-TVAE) network to generate diverse video comments with multiple sentiments and multiple semantics.
Specifically, our sentiment-oriented diversity encoder combines a VAE with a random-mask mechanism to achieve semantic diversity under sentiment guidance.
A batch attention module is also proposed to alleviate the problem of missing sentiment samples caused by data imbalance.
arXiv Detail & Related papers (2024-04-19T10:43:25Z) - LiveChat: Video Comment Generation from Audio-Visual Multimodal Contexts [8.070778830276275]
We create a large-scale audio-visual multimodal dialogue dataset to facilitate the development of live commenting technologies.
The data is collected from Twitch, with 11 different categories and 575 streamers for a total of 438 hours of video and 3.2 million comments.
We propose a novel multimodal generation model capable of generating live comments that align with the temporal and spatial events within the video.
arXiv Detail & Related papers (2023-10-01T02:35:58Z) - ViCo: Engaging Video Comment Generation with Human Preference Rewards [68.50351391812723]
We propose ViCo with three novel designs to tackle the challenges of generating engaging video comments.
To quantify the engagement of comments, we utilize the number of "likes" each comment receives as a proxy for human preference.
To automatically evaluate engagement, we train a reward model to align its judgments with this proxy (a minimal sketch of such a preference-reward objective appears after this list).
arXiv Detail & Related papers (2023-08-22T04:01:01Z) - A Video Is Worth 4096 Tokens: Verbalize Videos To Understand Them In Zero Shot [67.00455874279383]
We propose verbalizing long videos to generate descriptions in natural language, then performing video-understanding tasks on the generated story as opposed to the original video.
Our method, despite being zero-shot, achieves significantly better results than supervised baselines for video understanding.
To alleviate the lack of story-understanding benchmarks, we publicly release the first dataset for persuasion strategy identification, a crucial task in computational social science.
arXiv Detail & Related papers (2023-05-16T19:13:11Z) - Knowledge Enhanced Model for Live Video Comment Generation [40.762720398152766]
We propose a knowledge-enhanced generation model inspired by the divergent and informative nature of live video comments.
Our model adopts a pre-trained encoder-decoder framework and incorporates external knowledge.
The MovieLC dataset and our code will be released.
arXiv Detail & Related papers (2023-04-28T07:03:50Z) - How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios [73.24092762346095]
We introduce two large-scale datasets with over 60,000 videos annotated for emotional response and subjective wellbeing.
The Video Cognitive Empathy dataset contains annotations for distributions of fine-grained emotional responses, allowing models to gain a detailed understanding of affective states.
The Video to Valence dataset contains annotations of relative pleasantness between videos, which enables predicting a continuous spectrum of wellbeing.
arXiv Detail & Related papers (2022-10-18T17:58:25Z) - LiveSeg: Unsupervised Multimodal Temporal Segmentation of Long Livestream Videos [82.48910259277984]
Livestream tutorial videos are usually hours long, recorded, and uploaded to the Internet directly after the live sessions, making it hard for other people to catch up quickly.
An outline would be a beneficial solution, which requires the video to be temporally segmented by topic.
We propose LiveSeg, an unsupervised livestream video temporal segmentation solution that takes advantage of multimodal features from different domains.
arXiv Detail & Related papers (2022-10-12T00:08:17Z) - Response to LiveBot: Generating Live Video Comments Based on Visual and Textual Contexts [7.8885775363362]
LiveBot was recently introduced as a novel Automatic Live Video Commenting (ALVC) application.
LiveBot generates live video comments from both the existing video stream and existing viewers' comments.
In this paper, we study these discrepancies in detail and propose an alternative baseline implementation.
arXiv Detail & Related papers (2020-06-04T17:16:22Z)
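As noted in the ViCo entry above, its engagement reward model aligns its scores with "likes" as a proxy for human preference. The sketch below illustrates one common way such a model could be trained (a Bradley-Terry style pairwise loss); the architecture, feature dimensions, and loss choice are assumptions for illustration, not ViCo's actual implementation.

```python
# Minimal sketch, NOT the ViCo authors' code: train a reward model so that,
# for two comments on the same video, the more-liked comment scores higher.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CommentRewardModel(nn.Module):
    """Toy reward model over pre-computed (video, comment) feature pairs."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(2 * feat_dim, 256),
                                    nn.ReLU(),
                                    nn.Linear(256, 1))

    def forward(self, video_feat, comment_feat):           # both (B, feat_dim)
        joint = torch.cat([video_feat, comment_feat], dim=-1)
        return self.scorer(joint).squeeze(-1)               # one scalar score per pair

def pairwise_preference_loss(score_preferred, score_other):
    # -log sigmoid(r_preferred - r_other): pushes the more-liked comment's score higher.
    return -F.logsigmoid(score_preferred - score_other).mean()

# Toy step: comment_a received more likes than comment_b on the same videos.
model = CommentRewardModel()
video = torch.randn(4, 512)
comment_a, comment_b = torch.randn(4, 512), torch.randn(4, 512)
loss = pairwise_preference_loss(model(video, comment_a), model(video, comment_b))
loss.backward()
```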
This list is automatically generated from the titles and abstracts of the papers on this site.