Knowledge Enhanced Model for Live Video Comment Generation
- URL: http://arxiv.org/abs/2304.14657v1
- Date: Fri, 28 Apr 2023 07:03:50 GMT
- Title: Knowledge Enhanced Model for Live Video Comment Generation
- Authors: Jieting Chen, Junkai Ding, Wenping Chen, Qin Jin
- Abstract summary: We propose a knowledge enhanced generation model inspired by the divergent and informative nature of live video comments.
Our model adopts a pre-training encoder-decoder framework and incorporates external knowledge.
The MovieLC dataset and our code will be released.
- Score: 40.762720398152766
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Live video commenting is popular on video media platforms, as it can create a
chatting atmosphere and provide supplementary information for users while
watching videos. Automatically generating live video comments can improve user
experience and enable human-like generation for bot chatting. Existing works
mostly focus on short video datasets while ignoring other important video types
such as long videos like movies. In this work, we collect a new Movie Live
Comments (MovieLC) dataset to support research on live video comment generation
for long videos. We also propose a knowledge enhanced generation model inspired
by the divergent and informative nature of live video comments. Our model
adopts a pre-training encoder-decoder framework and incorporates external
knowledge. Extensive experiments with both objective metrics and human
evaluation demonstrate the effectiveness of our proposed model. The MovieLC
dataset and our code will be released.
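To make the described setup concrete, below is a minimal, hypothetical sketch of knowledge-enhanced comment generation built on an off-the-shelf encoder-decoder (facebook/bart-base via HuggingFace transformers). It is not the authors' model: the visual features and pre-training stage from the paper are omitted, and the retrieve_knowledge function, the example knowledge snippet, and the input texts are placeholders.

```python
# Minimal, hypothetical sketch of knowledge-enhanced comment generation.
# NOT the paper's model: visual features and the pre-training stage are
# omitted, and the knowledge retriever below is a stub.
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

def retrieve_knowledge(context: str) -> list[str]:
    """Hypothetical retriever: look up entities mentioned in the context
    in an external knowledge base and return short text snippets."""
    return ["Inception (2010) was directed by Christopher Nolan."]

def generate_comment(surrounding_comments: list[str], subtitle: str) -> str:
    context = " ".join(surrounding_comments) + " " + subtitle
    knowledge = " ".join(retrieve_knowledge(context))
    # Concatenate the comment/subtitle context with retrieved knowledge so
    # the decoder can copy informative facts into the generated comment.
    inputs = tokenizer(context + " </s> " + knowledge,
                       return_tensors="pt", truncation=True, max_length=512)
    output_ids = model.generate(**inputs, max_new_tokens=30, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(generate_comment(["The camera work here is amazing"],
                       "We need to go deeper."))
```

In practice, the base model would be fine-tuned on live comment data such as MovieLC; this sketch only illustrates how retrieved knowledge can be fed to the encoder alongside the context.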
Related papers
- VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation [38.84663997781797]
We release VideoFeedback, the first large-scale dataset containing human-provided multi-aspect scores for 37.6K synthesized videos.
Experiments show that the Spearman correlation between VideoScore and humans can reach 77.1 on VideoFeedback-test, beating the prior best metrics by about 50 points (a minimal correlation-computation sketch appears after this list).
arXiv Detail & Related papers (2024-06-21T15:43:46Z)
- Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs [20.168429351519055]
Video understanding is a crucial next step for multimodal large language models (MLLMs).
We propose VideoNIAH (Video Needle In A Haystack), a benchmark construction framework through synthetic video generation.
We conduct a comprehensive evaluation of both proprietary and open-source models, uncovering significant differences in their video understanding capabilities.
arXiv Detail & Related papers (2024-06-13T17:50:05Z)
- VideoPhy: Evaluating Physical Commonsense for Video Generation [93.28748850301949]
We present VideoPhy, a benchmark designed to assess whether the generated videos follow physical commonsense for real-world activities.
We then generate videos conditioned on captions from diverse state-of-the-art text-to-video generative models.
Our human evaluation reveals that the existing models severely lack the ability to generate videos adhering to the given text prompts.
arXiv Detail & Related papers (2024-06-05T17:53:55Z)
- InternVideo2: Scaling Foundation Models for Multimodal Video Understanding [51.129913789991924]
InternVideo2 is a new family of video foundation models (FM) that achieve state-of-the-art results in video recognition, video-speech tasks, and video-centric tasks.
Our core design is a progressive training approach that unifies masked video modeling, cross-modal contrastive learning, and next token prediction, scaling the video encoder up to 6B parameters.
arXiv Detail & Related papers (2024-03-22T17:57:42Z)
- ViCo: Engaging Video Comment Generation with Human Preference Rewards [68.50351391812723]
We propose ViCo with three novel designs to tackle the challenges of generating engaging video comments.
To quantify the engagement of comments, we utilize the number of "likes" each comment receives as a proxy of human preference.
To automatically evaluate the engagement of comments, we train a reward model to align its judgment to the above proxy (a generic reward-model training sketch appears after this list).
arXiv Detail & Related papers (2023-08-22T04:01:01Z)
- Video Generation Beyond a Single Clip [76.5306434379088]
Video generation models can only generate video clips that are relatively short compared with the length of real videos.
To generate long videos covering diverse content and multiple events, we propose to use additional guidance to control the video generation process.
The proposed approach is complementary to existing efforts on video generation, which focus on generating realistic video within a fixed time window.
arXiv Detail & Related papers (2023-04-15T06:17:30Z)
- How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios [73.24092762346095]
We introduce two large-scale datasets with over 60,000 videos annotated for emotional response and subjective wellbeing.
The Video Cognitive Empathy dataset contains annotations for distributions of fine-grained emotional responses, allowing models to gain a detailed understanding of affective states.
The Video to Valence dataset contains annotations of relative pleasantness between videos, which enables predicting a continuous spectrum of wellbeing.
arXiv Detail & Related papers (2022-10-18T17:58:25Z)
- Video Content Swapping Using GAN [1.2300363114433952]
In this work, we break down each frame in the video into content and pose.
We first extract the pose information from a video using a pre-trained human pose detection model, then use a generative model to synthesize the video based on the content code and pose code.
arXiv Detail & Related papers (2021-11-21T23:01:58Z)
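For reference, the Spearman correlation reported for VideoScore above can be computed as sketched below. The scores are invented toy numbers; only the call to scipy.stats.spearmanr reflects a real API, and the interpretation of the reported number as rho scaled by 100 is an assumption.

```python
# Hedged illustration of the evaluation protocol described for VideoScore:
# the numbers below are made up; only the Spearman computation is real.
from scipy.stats import spearmanr

model_scores = [3.2, 4.1, 2.5, 4.8, 3.9]   # automatic metric, one per video
human_scores = [3.0, 4.5, 2.0, 4.7, 3.5]   # human ratings for the same videos

rho, p_value = spearmanr(model_scores, human_scores)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3f})")
# Reported correlations such as 77.1 presumably correspond to rho * 100.
```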
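The reward-model idea summarized for ViCo above can be illustrated with a generic pairwise-ranking training step over comments with different like counts. The tiny bag-of-words encoder, the random toy batches, and all hyperparameters below are placeholders, not ViCo's actual architecture.

```python
# Generic sketch of training a reward model from "likes" as a preference
# proxy (the mechanism described for ViCo); the encoder and data here are
# placeholders, not the paper's actual architecture.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, vocab_size=30522, dim=256):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim)  # stand-in comment encoder
        self.head = nn.Linear(dim, 1)                  # scalar engagement score

    def forward(self, token_ids):
        return self.head(self.embed(token_ids)).squeeze(-1)

model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Toy batch: for the same video, `preferred` comments got more likes than `rejected`.
preferred = torch.randint(0, 30522, (8, 16))
rejected = torch.randint(0, 30522, (8, 16))

# Bradley-Terry style pairwise loss: push the score of the more-liked
# comment above the score of the less-liked one.
loss = -torch.log(torch.sigmoid(model(preferred) - model(rejected))).mean()
loss.backward()
optimizer.step()
print(float(loss))
```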
This list is automatically generated from the titles and abstracts of the papers in this site.