Short-video Propagation Influence Rating: A New Real-world Dataset and A New Large Graph Model
- URL: http://arxiv.org/abs/2503.23746v1
- Date: Mon, 31 Mar 2025 05:53:15 GMT
- Title: Short-video Propagation Influence Rating: A New Real-world Dataset and A New Large Graph Model
- Authors: Dizhan Xue, Jing Cui, Shengsheng Qian, Chuanrui Hu, Changsheng Xu
- Abstract summary: The Cross-platform Short-Video (XS-Video) dataset includes 117,720 videos, 381,926 samples, and 535 topics across the 5 largest Chinese platforms. A Large Graph Model (LGM) named NetGPT bridges heterogeneous graph-structured data with the powerful reasoning ability and knowledge of Large Language Models (LLMs). NetGPT can comprehend and analyze the short-video propagation graph, enabling it to predict the long-term propagation influence of short-videos.
- Score: 55.58701436630489
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Short-video platforms have gained immense popularity, captivating the interest of millions, if not billions, of users globally. Recently, researchers have highlighted the significance of analyzing the propagation of short-videos, which typically involves discovering commercial value, public opinion, user behavior, and more. This paper proposes a new Short-video Propagation Influence Rating (SPIR) task and aims to advance SPIR from both the dataset and the method perspectives. First, we propose a new Cross-platform Short-Video (XS-Video) dataset, which aims to provide a large-scale, real-world short-video propagation network across various platforms to facilitate research on short-video propagation. Our XS-Video dataset includes 117,720 videos, 381,926 samples, and 535 topics across the 5 largest Chinese platforms, annotated with propagation influence levels from 0 to 9. To the best of our knowledge, this is the first large-scale short-video dataset that contains cross-platform data or provides all of the views, likes, shares, collects, fans, comments, and comment content. Second, we propose a Large Graph Model (LGM) named NetGPT, based on a novel three-stage training mechanism, to bridge heterogeneous graph-structured data with the powerful reasoning ability and knowledge of Large Language Models (LLMs). Our NetGPT can comprehend and analyze the short-video propagation graph, enabling it to predict the long-term propagation influence of short-videos. Comprehensive experimental results, evaluated with both classification and regression metrics on our XS-Video dataset, indicate the superiority of our method for SPIR.
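The abstract says SPIR targets are influence levels 0 to 9 and that results are reported with both classification and regression metrics, but it does not name the metrics. The sketch below illustrates why both views fit an ordinal 10-level target. The function name `spir_metrics` and the specific choice of accuracy, macro-F1, MAE, and RMSE are assumptions for illustration, not the paper's evaluation code.

```python
"""Hypothetical evaluation helper for 10-level influence ratings (0-9).

The metric choices (accuracy, macro-F1, MAE, RMSE) are illustrative
assumptions, not necessarily the metrics reported in the paper.
"""
import math
from typing import Dict, Sequence

NUM_LEVELS = 10  # propagation influence levels 0..9, as stated in the abstract


def spir_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Score integer level predictions from a classification and a regression view."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    for y in list(y_true) + list(y_pred):
        if not 0 <= y < NUM_LEVELS:
            raise ValueError(f"level {y} outside 0..{NUM_LEVELS - 1}")

    n = len(y_true)
    pairs = list(zip(y_true, y_pred))

    # Classification view: each level is an unordered class; only exact hits count.
    accuracy = sum(t == p for t, p in pairs) / n

    # Macro-F1 over every class that appears in the labels or the predictions.
    f1_scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    macro_f1 = sum(f1_scores) / len(f1_scores)

    # Regression view: levels are ordered, so the size of a miss matters.
    errors = [p - t for t, p in pairs]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    return {"accuracy": accuracy, "macro_f1": macro_f1, "mae": mae, "rmse": rmse}


if __name__ == "__main__":
    truth = [0, 3, 5, 9, 7]
    preds = [0, 4, 5, 6, 7]
    print(spir_metrics(truth, preds))
```

The classification metrics reward exact hits only, while MAE and RMSE also distinguish near misses from far ones: predicting 4 for a true 3 costs less than predicting 6 for a true 9. With the sample inputs above, accuracy is 0.6 and MAE is 0.8.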
Related papers
- HierSum: A Global and Local Attention Mechanism for Video Summarization [14.88934924520362]
We focus on summarizing instructional videos and propose a method for breaking down a video into meaningful segments.
HierSum integrates fine-grained local cues from subtitles with global contextual information provided by video-level instructions.
We show that HierSum consistently outperforms existing methods in key metrics such as F1-score and rank correlation.
(2025-04-25T20:30:30Z)
- Delving Deep into Engagement Prediction of Short Videos [34.38399476375175]
This study delves deep into the intricacies of predicting engagement for newly published videos with limited user interactions.
We introduce a substantial dataset comprising 90,000 real-world short videos from Snapchat.
Our method demonstrates its ability to predict engagements of short videos purely from video content.
(2024-09-30T23:57:07Z)
- CinePile: A Long Video Question Answering Dataset and Benchmark [55.30860239555001]
We present a novel dataset and benchmark, CinePile, specifically designed for authentic long-form video understanding.
Our comprehensive dataset comprises 305,000 multiple-choice questions (MCQs), covering various visual and multimodal aspects.
We fine-tuned open-source Video-LLMs on the training split and evaluated both open-source and proprietary video-centric LLMs on the test split of our dataset.
(2024-05-14T17:59:02Z)
- Towards Generalist Robot Learning from Internet Video: A Survey [56.621902345314645]
We present an overview of the emerging field of Learning from Videos (LfV).
LfV aims to address the robotics data bottleneck by augmenting traditional robot data with large-scale internet video data.
We provide a review of current methods for extracting knowledge from large-scale internet video, addressing key challenges in LfV, and boosting downstream robot and reinforcement learning via the use of video data.
(2024-04-30T15:57:41Z)
- Scaling Up Video Summarization Pretraining with Large Language Models [73.74662411006426]
We introduce an automated and scalable pipeline for generating a large-scale video summarization dataset.
We analyze the limitations of existing approaches and propose a new video summarization model that effectively addresses them.
Our work also presents a new benchmark dataset that contains 1200 long videos each with high-quality summaries annotated by professionals.
(2024-04-04T11:59:06Z)
- NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy Labels [33.659146748289444]
We create a benchmark dataset consisting of around 2 million videos with associated user-generated annotations and other meta information.
We show how a network pretrained on the proposed dataset can help against video corruption and label noise in downstream datasets.
(2021-10-13T16:12:18Z)
- Few-Shot Video Object Detection [70.43402912344327]
We introduce Few-Shot Video Object Detection (FSVOD) with three important contributions.
FSVOD-500 comprises 500 classes with class-balanced videos in each category for few-shot learning.
Our TPN and TMN+ are jointly and end-to-end trained.
(2021-04-30T07:38:04Z)
- Learning from Weakly-labeled Web Videos via Exploring Sub-Concepts [89.06560404218028]
We introduce a new method for pre-training video action recognition models using queried web videos.
Instead of trying to filter out the potential noise in these queried videos, we propose to convert it into useful supervision signals.
We show that SPL outperforms several existing pre-training strategies using pseudo-labels.
(2021-01-11T05:50:16Z)
- Comprehensive Instructional Video Analysis: The COIN Dataset and Performance Evaluation [100.68317848808327]
We present a large-scale dataset named "COIN" for COmprehensive INstructional video analysis.
COIN dataset contains 11,827 videos of 180 tasks in 12 domains related to our daily life.
With a newly developed toolbox, all the videos are annotated efficiently with a series of step labels and the corresponding temporal boundaries.
(2020-03-20T16:59:44Z)