Micro-video Tagging via Jointly Modeling Social Influence and Tag Relation
- URL: http://arxiv.org/abs/2303.08318v1
- Date: Wed, 15 Mar 2023 02:13:34 GMT
- Title: Micro-video Tagging via Jointly Modeling Social Influence and Tag Relation
- Authors: Xiao Wang, Tian Gan, Yinwei Wei, Jianlong Wu, Dai Meng, Liqiang Nie
- Abstract summary: 85.7% of micro-videos lack annotation.
Existing methods mostly focus on analyzing video content, neglecting users' social influence and tag relation.
We formulate micro-video tagging as a link prediction problem in a constructed heterogeneous network.
- Score: 56.23157334014773
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The last decade has witnessed the proliferation of micro-videos on various
user-generated content platforms. According to our statistics, around 85.7% of
micro-videos lack annotation. In this paper, we focus on annotating
micro-videos with tags. Existing methods mostly focus on analyzing video
content, neglecting users' social influence and tag relation. Meanwhile,
existing tag relation construction methods suffer from either deficient
performance or low tag coverage. To jointly model social influence and tag
relation, we formulate micro-video tagging as a link prediction problem in a
constructed heterogeneous network. Specifically, the tag relation (represented
by tag ontology) is constructed in a semi-supervised manner. Then, we combine
tag relation, video-tag annotation, and user-follow relation to build the
network. Afterward, better video and tag representations are derived through
Behavior Spread modeling and visual and linguistic knowledge aggregation.
Finally, the semantic similarity between each micro-video and all candidate
tags is calculated in this video-tag network. Extensive experiments on
industrial datasets of three verticals verify the superiority of our model
compared with several state-of-the-art baselines.
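To make the link-prediction formulation concrete, below is a minimal sketch of how the heterogeneous network from the abstract could be assembled and queried. This is an illustrative assumption, not the authors' implementation: the node names, edge types, and the plain cosine-similarity scoring are placeholders, whereas the paper derives its representations via Behavior Spread modeling and visual/linguistic knowledge aggregation.

```python
# Illustrative sketch (NOT the paper's implementation): build the
# heterogeneous network of tags, videos, and users, then rank candidate
# tags for a video by embedding similarity, i.e. tagging as link prediction.
import numpy as np
import networkx as nx

G = nx.Graph()

# Tag-tag edges from the tag ontology (assumed given after the
# semi-supervised construction step described in the abstract).
G.add_edge("tag:sports", "tag:basketball", etype="tag-tag")
# Video-tag annotation edges from the labelled subset of micro-videos.
G.add_edge("video:v1", "tag:basketball", etype="video-tag")
# User-follow edges carrying social influence.
G.add_edge("user:u1", "user:u2", etype="user-user")
# User-video posting edges connect the social and content sides.
G.add_edge("user:u1", "video:v1", etype="user-video")

# Placeholder embeddings; in the paper these would be learned jointly
# on the graph rather than randomly initialized.
rng = np.random.default_rng(0)
emb = {n: rng.normal(size=64) for n in G.nodes}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_tags(video: str, k: int = 5):
    """Score the (video, tag) link for every candidate tag; keep top-k."""
    tags = [n for n in G.nodes if n.startswith("tag:")]
    scores = {t: cosine(emb[video], emb[t]) for t in tags}
    return sorted(scores.items(), key=lambda kv: -kv[1])[:k]

print(rank_tags("video:v1"))
```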
Related papers
- Text-Video Retrieval via Variational Multi-Modal Hypergraph Networks [25.96897989272303]
The main obstacle for text-video retrieval is the semantic gap between the textual nature of queries and the visual richness of video content.
We propose chunk-level text-video matching, where the query chunks are extracted to describe a specific retrieval unit.
We formulate the chunk-level matching as n-ary correlations modeling between words of the query and frames of the video.
arXiv Detail & Related papers (2024-01-06T09:38:55Z)
- Multi-Label Meta Weighting for Long-Tailed Dynamic Scene Graph Generation [55.429541407920304]
The task of recognizing the predicate between subject and object pairs is imbalanced and multi-label in nature.
Recent state-of-the-art methods predominantly focus on the most frequently occurring predicate classes.
We introduce a multi-label meta-learning framework to deal with the biased predicate distribution.
arXiv Detail & Related papers (2023-06-16T18:14:23Z)
- Hate Speech and Offensive Language Detection using an Emotion-aware Shared Encoder [1.8734449181723825]
Existing works on hate speech and offensive language detection produce promising results based on pre-trained transformer models.
This paper proposes a multi-task joint learning approach that combines external emotional features extracted from other corpora.
Our findings demonstrate that emotional knowledge helps to more reliably identify hate speech and offensive language across datasets.
arXiv Detail & Related papers (2023-02-17T09:31:06Z)
- The ComMA Dataset V0.2: Annotating Aggression and Bias in Multilingual Social Media Discourse [1.465840097113565]
We discuss the development of a multilingual dataset annotated with a hierarchical, fine-grained tagset marking different types of aggression and the "context" in which they occur.
The initial dataset consists of a total of 15,000 annotated comments in four languages.
As is usual on social media websites, a large number of these comments are multilingual, mostly code-mixed with English.
arXiv Detail & Related papers (2021-11-19T19:03:22Z)
- VLG-Net: Video-Language Graph Matching Network for Video Grounding [57.6661145190528]
Grounding language queries in videos aims at identifying the time interval (or moment) semantically relevant to a language query.
We recast this challenge into an algorithmic graph matching problem.
We demonstrate superior performance over state-of-the-art grounding methods on three widely used datasets.
arXiv Detail & Related papers (2020-11-19T22:32:03Z)
- Content-based Analysis of the Cultural Differences between TikTok and Douyin [95.32409577885645]
Short-form video social media shifts away from the traditional media paradigm by telling the audience a dynamic story to attract their attention.
In particular, different combinations of everyday objects can be employed to represent a unique scene that is both interesting and understandable.
Offered by the same company, TikTok and Douyin are prominent examples of such new media that have risen in recent years.
Our research primarily targets the hypothesis that they express cultural differences, together with media fashion and social idiosyncrasies.
arXiv Detail & Related papers (2020-11-03T01:47:49Z)
- Understanding YouTube Communities via Subscription-based Channel Embeddings [0.0]
This paper presents new methods to discover and classify YouTube channels.
The methods use a self-supervised learning approach that leverages the public subscription pages of commenters.
We create a new dataset to analyze the amount of traffic going to different political content.
arXiv Detail & Related papers (2020-10-19T22:00:04Z)
- Labelling unlabelled videos from scratch with multi-modal self-supervision [82.60652426371936]
Unsupervised labelling of a video dataset does not come for free from strong feature encoders.
We propose a novel clustering method that allows pseudo-labelling of a video dataset without any human annotations.
An extensive analysis shows that the resulting clusters have high semantic overlap with ground-truth human labels.
arXiv Detail & Related papers (2020-06-24T12:28:17Z)
- Comprehensive Information Integration Modeling Framework for Video Titling [124.11296128308396]
We integrate comprehensive sources of information, including the content of consumer-generated videos, the narrative comment sentences supplied by consumers, and the product attributes, in an end-to-end modeling framework.
The proposed method consists of two processes: granular-level interaction modeling and abstraction-level story-line summarization.
We collect a large-scale dataset accordingly from real-world data in Taobao, a world-leading e-commerce platform.
arXiv Detail & Related papers (2020-06-24T10:38:15Z)