Micro-video Tagging via Jointly Modeling Social Influence and Tag
Relation
- URL: http://arxiv.org/abs/2303.08318v1
- Date: Wed, 15 Mar 2023 02:13:34 GMT
- Title: Micro-video Tagging via Jointly Modeling Social Influence and Tag
Relation
- Authors: Xiao Wang, Tian Gan, Yinwei Wei, Jianlong Wu, Dai Meng, Liqiang Nie
- Abstract summary: 85.7% of micro-videos lack annotation.
Existing methods mostly focus on analyzing video content, neglecting users' social influence and tag relation.
We formulate micro-video tagging as a link prediction problem in a constructed heterogeneous network.
- Score: 56.23157334014773
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The last decade has witnessed the proliferation of micro-videos on various
user-generated content platforms. According to our statistics, around 85.7% of
micro-videos lack annotation. In this paper, we focus on annotating
micro-videos with tags. Existing methods mostly focus on analyzing video
content, neglecting users' social influence and tag relation. Meanwhile,
existing tag relation construction methods suffer from either deficient
performance or low tag coverage. To jointly model social influence and tag
relation, we formulate micro-video tagging as a link prediction problem in a
constructed heterogeneous network. Specifically, the tag relation (represented
by tag ontology) is constructed in a semi-supervised manner. Then, we combine
tag relation, video-tag annotation, and user-follow relation to build the
network. Afterward, better video and tag representations are derived through
Behavior Spread modeling and visual and linguistic knowledge aggregation.
Finally, the semantic similarity between each micro-video and all candidate
tags is calculated in this video-tag network. Extensive experiments on
industrial datasets of three verticals verify the superiority of our model
compared with several state-of-the-art baselines.
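As a minimal sketch of the final scoring step described above (not the authors' actual implementation), tagging-as-link-prediction can be illustrated by ranking candidate tags by semantic similarity to a video's learned representation. The function name, embedding dimensions, and tag vocabulary below are illustrative assumptions; in the paper, the embeddings would come from the heterogeneous video-tag network.

```python
import numpy as np

def rank_candidate_tags(video_emb, tag_embs, tag_names, top_k=3):
    """Score each candidate tag by cosine similarity to the video
    embedding and return the top_k (tag, score) pairs. Inputs are
    stand-ins for representations learned from the video-tag network."""
    v = video_emb / np.linalg.norm(video_emb)
    t = tag_embs / np.linalg.norm(tag_embs, axis=1, keepdims=True)
    scores = t @ v  # cosine similarity between the video and every tag
    order = np.argsort(-scores)[:top_k]
    return [(tag_names[i], float(scores[i])) for i in order]

# Toy example with random embeddings, purely for illustration.
rng = np.random.default_rng(0)
video = rng.normal(size=64)
tags = rng.normal(size=(5, 64))
names = ["dance", "cooking", "travel", "pets", "sports"]
ranked = rank_candidate_tags(video, tags, names)
print(ranked)
```

In the full model, the similarity would be computed between node representations enriched by social-influence and tag-relation signals rather than raw embeddings, but the ranking step takes this general shape.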
Related papers
- Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding [25.4933695784155]
Understanding of video creativity and content often varies among individuals, with differences in focal points and cognitive levels across different ages, experiences, and genders.
To bridge the gap to real-world applications, we introduce a large-scale Subjective Response Indicators for Advertisement Videos dataset.
We developed tasks and protocols to analyze and evaluate the extent of cognitive understanding of video content among different users.
arXiv Detail & Related papers (2024-07-11T03:00:26Z) - Text-Video Retrieval via Variational Multi-Modal Hypergraph Networks [25.96897989272303]
The main obstacle for text-video retrieval is the semantic gap between the textual nature of queries and the visual richness of video content.
We propose chunk-level text-video matching, where the query chunks are extracted to describe a specific retrieval unit.
We formulate the chunk-level matching as n-ary correlations modeling between words of the query and frames of the video.
arXiv Detail & Related papers (2024-01-06T09:38:55Z) - Multi-Label Meta Weighting for Long-Tailed Dynamic Scene Graph
Generation [55.429541407920304]
Recognizing the predicate between subject and object pairs is imbalanced and multi-label in nature.
Recent state-of-the-art methods predominantly focus on the most frequently occurring predicate classes.
We introduce a multi-label meta-learning framework to deal with the biased predicate distribution.
arXiv Detail & Related papers (2023-06-16T18:14:23Z) - The ComMA Dataset V0.2: Annotating Aggression and Bias in Multilingual
Social Media Discourse [1.465840097113565]
We discuss the development of a multilingual dataset annotated with a hierarchical, fine-grained tagset marking different types of aggression and the "context" in which they occur.
The initial dataset consists of a total of 15,000 annotated comments in four languages.
As is usual on social media websites, a large number of these comments are multilingual, mostly code-mixed with English.
arXiv Detail & Related papers (2021-11-19T19:03:22Z) - Influencer Videos: Unboxing the Mystique [0.4143603294943439]
We study YouTube influencers and analyze their unstructured video data across text, audio and images.
Our prediction-based approach analyzes unstructured data and finds that "what is said" in words (text) is more influential than "how it is said" in imagery (images) or acoustics (audio).
We uncover novel findings that establish distinct associations for measures of shallow and deep engagement based on the dual-system framework of human thinking.
arXiv Detail & Related papers (2020-12-22T19:32:52Z) - VLG-Net: Video-Language Graph Matching Network for Video Grounding [57.6661145190528]
Grounding language queries in videos aims at identifying the time interval (or moment) semantically relevant to a language query.
We recast this challenge into an algorithmic graph matching problem.
We demonstrate superior performance over state-of-the-art grounding methods on three widely used datasets.
arXiv Detail & Related papers (2020-11-19T22:32:03Z) - Content-based Analysis of the Cultural Differences between TikTok and
Douyin [95.32409577885645]
Short-form video social media shifts away from the traditional media paradigm by telling the audience a dynamic story to attract their attention.
In particular, different combinations of everyday objects can be employed to represent a unique scene that is both interesting and understandable.
Offered by the same company, TikTok and Douyin are prominent examples of this new media form that has risen in recent years.
Our research primarily targets the hypothesis that the two platforms express cultural differences, together with media fashion and social idiosyncrasy.
arXiv Detail & Related papers (2020-11-03T01:47:49Z) - Labelling unlabelled videos from scratch with multi-modal
self-supervision [82.60652426371936]
Unsupervised labelling of a video dataset does not come for free from strong feature encoders.
We propose a novel clustering method that allows pseudo-labelling of a video dataset without any human annotations.
An extensive analysis shows that the resulting clusters have high semantic overlap to ground truth human labels.
arXiv Detail & Related papers (2020-06-24T12:28:17Z) - Comprehensive Information Integration Modeling Framework for Video
Titling [124.11296128308396]
We integrate comprehensive sources of information, including the content of consumer-generated videos, the narrative comment sentences supplied by consumers, and the product attributes, in an end-to-end modeling framework.
To integrate these heterogeneous sources, the proposed method consists of two processes, i.e., granular-level interaction modeling and abstraction-level story-line summarization.
We collect a large-scale dataset accordingly from real-world data in Taobao, a world-leading e-commerce platform.
arXiv Detail & Related papers (2020-06-24T10:38:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences.