Content-based Analysis of the Cultural Differences between TikTok and
Douyin
- URL: http://arxiv.org/abs/2011.01414v1
- Date: Tue, 3 Nov 2020 01:47:49 GMT
- Title: Content-based Analysis of the Cultural Differences between TikTok and
Douyin
- Authors: Li Sun, Haoqi Zhang, Songyang Zhang, Jiebo Luo
- Abstract summary: Short-form video social media shifts away from the traditional media paradigm by telling the audience a dynamic story to attract their attention.
In particular, different combinations of everyday objects can be employed to represent a unique scene that is both interesting and understandable.
Offered by the same company, TikTok and Douyin are prominent examples of such new media that have become popular in recent years.
The hypothesis that they express cultural differences together with media fashion and social idiosyncrasy is the primary target of our research.
- Score: 95.32409577885645
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Short-form video social media shifts away from the traditional media paradigm
by telling the audience a dynamic story to attract their attention. In
particular, different combinations of everyday objects can be employed to
represent a unique scene that is both interesting and understandable. Offered
by the same company, TikTok and Douyin are prominent examples of such new media
that have become popular in recent years while being tailored for different
markets (e.g., the United States and China). The hypothesis that they express
cultural differences together with media fashion and social idiosyncrasy is the
primary target of our research. To that end, we first employ the Faster
Regional Convolutional Neural Network (Faster R-CNN) pre-trained with the
Microsoft Common Objects in COntext (MS-COCO) dataset to perform object
detection. Based on a suite of objects detected from videos, we perform
statistical analysis including label statistics, label similarity, and
label-person distribution. We further use the Two-Stream Inflated 3D ConvNet
(I3D) pre-trained with the Kinetics dataset to categorize and analyze human
actions. By comparing the distributional results of TikTok and Douyin, we
uncover a wealth of similarities and contrasts between the two closely related
video social media platforms along the content dimensions of object quantity,
object categories, and human action categories.
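As a concrete illustration of the detection stage, here is a minimal sketch (not the authors' code) that tallies object labels over sampled video frames with torchvision's COCO-pretrained Faster R-CNN; the frame paths and score threshold are illustrative assumptions.

```python
import torch
from collections import Counter
from torchvision.io import read_image
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn,
    FasterRCNN_ResNet50_FPN_Weights,
)

# COCO-pretrained Faster R-CNN, standing in for the paper's detector.
weights = FasterRCNN_ResNet50_FPN_Weights.COCO_V1
model = fasterrcnn_resnet50_fpn(weights=weights).eval()
preprocess = weights.transforms()
categories = weights.meta["categories"]  # MS-COCO class names

def detect_labels(frame_paths, score_threshold=0.7):
    """Count the MS-COCO labels detected across a set of sampled frames."""
    counts = Counter()
    with torch.no_grad():
        for path in frame_paths:
            img = preprocess(read_image(path))
            pred = model([img])[0]  # dict with "boxes", "labels", "scores"
            for label, score in zip(pred["labels"], pred["scores"]):
                if score >= score_threshold:
                    counts[categories[int(label)]] += 1
    return counts
```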
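The label statistics and label-similarity comparison can then be read, under one plausible interpretation (the abstract does not pin down the metric), as comparing the per-platform label histograms, e.g. by cosine similarity:

```python
import math
from collections import Counter

def cosine_similarity(hist_a: Counter, hist_b: Counter) -> float:
    """Cosine similarity between two label-count histograms."""
    labels = set(hist_a) | set(hist_b)
    dot = sum(hist_a[l] * hist_b[l] for l in labels)
    norm_a = math.sqrt(sum(v * v for v in hist_a.values()))
    norm_b = math.sqrt(sum(v * v for v in hist_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# e.g. sim = cosine_similarity(detect_labels(tiktok_frames),
#                              detect_labels(douyin_frames))
```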
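For the action-recognition stage, a Kinetics-pretrained I3D is available from the PyTorchVideo model zoo; the hub entry name below is an assumption to check against the zoo, and clips must be preprocessed per the zoo's recipe. Aggregating the predicted class indices per platform yields the human-action distributions being compared.

```python
import torch

# Kinetics-400-pretrained I3D via PyTorchVideo's hub (entry name assumed;
# verify against the model zoo).
i3d = torch.hub.load("facebookresearch/pytorchvideo", "i3d_r50", pretrained=True)
i3d.eval()

def classify_clip(clip: torch.Tensor) -> int:
    """clip: (1, 3, T, H, W) float tensor, preprocessed per the zoo recipe;
    returns the index of the predicted Kinetics action class."""
    with torch.no_grad():
        logits = i3d(clip)
    return int(logits.argmax(dim=1))
```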
Related papers
- Intertwined Biases Across Social Media Spheres: Unpacking Correlations in Media Bias Dimensions [12.588239777597847]
Media bias significantly shapes public perception by reinforcing stereotypes and exacerbating societal divisions.
We introduce a novel dataset collected from YouTube and Reddit over the past five years.
Our dataset includes automated annotations for YouTube content across a broad spectrum of bias dimensions.
arXiv Detail & Related papers (2024-08-27T21:03:42Z)
- GREENER: Graph Neural Networks for News Media Profiling [24.675574340841163]
We study the problem of profiling news media on the Web with respect to their factuality of reporting and bias.
Our main focus is on modeling the similarity between media outlets based on the overlap of their audience.
Prediction accuracy is found to improve by 2.5-27 macro-F1 points for the two tasks.
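A minimal sketch of the idea, assuming a PyTorch Geometric graph whose weighted edges encode audience overlap between outlets (the dimensions and two-layer GCN are illustrative, not GREENER's actual architecture):

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class MediaProfiler(torch.nn.Module):
    """Toy GNN over a media-outlet graph with audience-overlap edges."""
    def __init__(self, in_dim=128, hidden=64, num_classes=3):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, num_classes)

    def forward(self, x, edge_index, edge_weight=None):
        # x: per-outlet features; edge_weight: audience-overlap strength
        h = F.relu(self.conv1(x, edge_index, edge_weight))
        return self.conv2(h, edge_index, edge_weight)  # factuality/bias logits
```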
arXiv Detail & Related papers (2022-11-10T12:46:29Z)
- Temporal Saliency Query Network for Efficient Video Recognition [82.52760040577864]
Video recognition is a hot research topic, driven by the explosive growth of multimedia data on the Internet and on mobile devices.
Most existing methods select the salient frames without awareness of the class-specific saliency scores.
We propose a novel Temporal Saliency Query (TSQ) mechanism, which introduces class-specific information to provide fine-grained cues for saliency measurement.
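One way to read the TSQ mechanism is as class-conditioned attention over frames: each class owns a query vector that scores per-frame features. A toy version (dimensions illustrative, not the paper's module):

```python
import torch

def class_saliency(frame_feats: torch.Tensor, class_queries: torch.Tensor):
    """frame_feats: (T, D) per-frame features; class_queries: (C, D).
    Returns (C, T) class-specific saliency scores over the frames."""
    scores = class_queries @ frame_feats.t()  # raw class-frame affinities
    return torch.softmax(scores, dim=-1)      # normalize over time

# Salient frames for class c can then be selected by top-k over row c.
```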
arXiv Detail & Related papers (2022-07-21T09:23:34Z)
- LIGHTEN: Learning Interactions with Graph and Hierarchical TEmporal Networks for HOI in videos [13.25502885135043]
Analyzing human-object interactions in a video requires identifying the relationships between the humans and the objects present in it.
We present a hierarchical approach, LIGHTEN, to learn visual features that effectively capture spatio-temporal cues at multiple granularities in a video.
We achieve state-of-the-art results on the human-object interaction detection (88.9% and 92.6%) and anticipation tasks of CAD-120, and competitive results on image-based HOI detection in V-COCO.
arXiv Detail & Related papers (2020-12-17T05:44:07Z)
- ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection [101.56529337489417]
We consider the problem of Human-Object Interaction (HOI) Detection, which aims to locate and recognize HOI instances in the form of <human, action, object> triplets in images.
We argue that multi-level consistencies among objects, actions and interactions are strong cues for generating semantic representations of rare or previously unseen HOIs.
Our model takes visual features of candidate human-object pairs and word embeddings of HOI labels as inputs, maps them into visual-semantic joint embedding space and obtains detection results by measuring their similarities.
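A toy rendering of that joint-embedding idea, with illustrative dimensions (this is not ConsNet's actual architecture, which also models the consistency graph):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbeddingScorer(nn.Module):
    """Project visual features of a human-object pair and word embeddings
    of HOI labels into a shared space; score them by cosine similarity."""
    def __init__(self, visual_dim=2048, word_dim=300, joint_dim=512):
        super().__init__()
        self.visual_proj = nn.Linear(visual_dim, joint_dim)
        self.word_proj = nn.Linear(word_dim, joint_dim)

    def forward(self, visual_feat, label_embs):
        v = F.normalize(self.visual_proj(visual_feat), dim=-1)  # (B, D)
        w = F.normalize(self.word_proj(label_embs), dim=-1)     # (L, D)
        return v @ w.t()  # (B, L): one similarity score per HOI label
```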
arXiv Detail & Related papers (2020-08-14T09:11:18Z)
- Comprehensive Information Integration Modeling Framework for Video Titling [124.11296128308396]
We integrate comprehensive sources of information, including the content of consumer-generated videos, the narrative comment sentences supplied by consumers, and the product attributes, in an end-to-end modeling framework.
The proposed method consists of two processes: granular-level interaction modeling and abstraction-level story-line summarization.
We collect a large-scale dataset accordingly from real-world data in Taobao, a world-leading e-commerce platform.
arXiv Detail & Related papers (2020-06-24T10:38:15Z)
- Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA [96.10612095576333]
We propose a video question answering model which effectively integrates multi-modal input sources and finds the temporally relevant information to answer questions.
Our model also comprises dual-level attention (word/object and frame level), multi-head self/cross-integration for different sources (video and dense captions), and gates that pass on more relevant information.
We evaluate our model on the challenging TVQA dataset, where each of our model components provides significant gains, and our overall model outperforms the state-of-the-art by a large margin.
arXiv Detail & Related papers (2020-05-13T16:35:27Z)
- EmotiCon: Context-Aware Multimodal Emotion Recognition using Frege's Principle [71.47160118286226]
We present EmotiCon, a learning-based algorithm for context-aware perceived human emotion recognition from videos and images.
Motivated by Frege's Context Principle from psychology, our approach combines three interpretations of context for emotion recognition.
We report an Average Precision (AP) score of 35.48 across 26 classes, which is an improvement of 7-8 AP points over prior methods.
arXiv Detail & Related papers (2020-03-14T19:55:21Z)
- A multi-layer approach to disinformation detection on Twitter [4.663548775064491]
We employ a multi-layer representation of Twitter diffusion networks, and we compute for each layer a set of global network features.
Experimental results with two large-scale datasets, corresponding to diffusion cascades of news shared respectively in the United States and Italy, show that a simple Logistic Regression model is able to classify disinformation vs mainstream networks with high accuracy.
We believe that our network-based approach provides useful insights which pave the way to the future development of a system to detect misleading and harmful information spreading on social media.
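A minimal sketch of the classification step, assuming per-cascade feature vectors (global network features computed layer by layer) are already on disk; the file names are hypothetical:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# X: one row per diffusion cascade, columns = global network features
# per layer; y: 1 = disinformation, 0 = mainstream. Paths are hypothetical.
X = np.load("cascade_features.npy")
y = np.load("cascade_labels.npy")

clf = LogisticRegression(max_iter=1000)
print(cross_val_score(clf, X, y, scoring="roc_auc", cv=5).mean())
```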
arXiv Detail & Related papers (2020-02-28T09:25:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.