Multimodal Deep Learning Framework for Image Popularity Prediction on
Social Media
- URL: http://arxiv.org/abs/2105.08809v1
- Date: Tue, 18 May 2021 19:58:58 GMT
- Authors: Fatma S. Abousaleh, Wen-Huang Cheng, Neng-Hao Yu, and Yu Tsao
- Abstract summary: The popularity of an image can be affected by several factors, such as visual content, aesthetic quality, user, post metadata, and time.
We propose a deep learning model, called visual-social convolutional neural network (VSCNN), which predicts the popularity of a posted image.
- Score: 24.104443991470287
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Billions of photos are uploaded to the web daily through various types of
social networks. Some of these images receive millions of views and become
popular, whereas others remain completely unnoticed. This raises the problem of
predicting image popularity on social media. The popularity of an image can be
affected by several factors, such as visual content, aesthetic quality, user,
post metadata, and time. Thus, considering all these factors is essential for
accurately predicting image popularity. In addition, the efficiency of the
predictive model also plays a crucial role. In this study, motivated by
multimodal learning, which uses information from various modalities, and the
current success of convolutional neural networks (CNNs) in various fields, we
propose a deep learning model, called visual-social convolutional neural
network (VSCNN), which predicts the popularity of a posted image by
incorporating various types of visual and social features into a unified
network model. VSCNN first learns to extract high-level representations from
the input visual and social features by utilizing two individual CNNs. The
outputs of these two networks are then fused into a joint network to estimate
the popularity score in the output layer. We assess the performance of the
proposed method by conducting extensive experiments on a dataset of
approximately 432K images posted on Flickr. The simulation results demonstrate
that the proposed VSCNN model significantly outperforms state-of-the-art
models, with a relative improvement of greater than 2.33%, 7.59%, and 14.16% in
terms of Spearman's Rho, mean absolute error, and mean squared error,
respectively.
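The three evaluation measures named in the abstract (Spearman's Rho, mean absolute error, and mean squared error) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' evaluation code: the function names are hypothetical, the sample scores are made up, and `relative_improvement` assumes the usual definition of relative gain over a baseline, which the abstract does not spell out.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(y_true, y_pred):
    """Pearson correlation of the ranks (inputs must not be constant)."""
    rx, ry = average_ranks(y_true), average_ranks(y_pred)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_improvement(baseline, proposed, higher_is_better):
    """Percentage gain of `proposed` over `baseline` (assumed definition)."""
    if higher_is_better:
        return 100.0 * (proposed - baseline) / baseline
    return 100.0 * (baseline - proposed) / baseline


# Made-up popularity scores, for illustration only (not data from the paper).
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.5, 1.0, 3.5, 4.0, 6.0]
rho = spearman_rho(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
```

Spearman's Rho is a higher-is-better measure, whereas the two error measures are lower-is-better, which is why the sketch takes a `higher_is_better` flag when computing a relative gain.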
Related papers
- CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition [73.51329037954866]
We propose a robust global representation method with cross-image correlation awareness for visual place recognition.
Our method uses the attention mechanism to correlate multiple images within a batch.
Our method outperforms state-of-the-art methods by a large margin with significantly less training time.
(2024-02-29T15:05:11Z)
- Biased Attention: Do Vision Transformers Amplify Gender Bias More than Convolutional Neural Networks? [2.8391805742728553]
Deep neural networks used in computer vision have been shown to exhibit many social biases such as gender bias.
Vision Transformers (ViTs) have become increasingly popular in computer vision applications, outperforming Convolutional Neural Networks (CNNs) in many tasks such as image classification.
This research found that ViTs amplified gender bias to a greater extent than CNNs.
(2023-09-15T20:59:12Z)
- A domain adaptive deep learning solution for scanpath prediction of paintings [66.46953851227454]
This paper focuses on the eye-movement analysis of viewers during the visual experience of a certain number of paintings.
We introduce a new approach to predicting human visual attention, which impacts several cognitive functions for humans.
The proposed new architecture ingests images and returns scanpaths, a sequence of points featuring a high likelihood of catching viewers' attention.
(2022-09-22T22:27:08Z)
- Graph Representation Learning for Popularity Prediction Problem: A Survey [4.795530213347874]
We present a comprehensive review for existing works using graph representation learning (GRL) methods for popularity prediction problem.
Deep learning method is further classified into six small classes: convolutional neural networks, graph convolutional networks, graph attention networks, graph neural networks, recurrent neural networks, and reinforcement learning.
(2022-03-15T04:11:46Z)
- Detection and Localization of Multiple Image Splicing Using MobileNet V1 [0.0]
Two or more images are combined to generate a new image that can transmit information across social media platforms.
This research work proposes multiple image splicing forgery detection using Mask R-CNN, with a backbone as a MobileNet V1.
It also calculates the percentage score of a forged region of multiple spliced images.
(2021-08-22T09:27:22Z)
- The Mind's Eye: Visualizing Class-Agnostic Features of CNNs [92.39082696657874]
We propose an approach to visually interpret CNN features given a set of images by creating corresponding images that depict the most informative features of a specific layer.
Our method uses a dual-objective activation and distance loss, without requiring a generator network nor modifications to the original model.
(2021-01-29T07:46:39Z)
- Application of Facial Recognition using Convolutional Neural Networks for Entry Access Control [0.0]
The paper focuses on solving the supervised classification problem of taking images of people as input and classifying the person in the image as one of the authors or not.
Two approaches are proposed: (1) building and training a neural network called WoodNet from scratch and (2) leveraging transfer learning by utilizing a network pre-trained on the ImageNet database.
The results are two models classifying the individuals in the dataset with high accuracy, achieving over 99% accuracy on held-out test data.
(2020-11-23T07:55:24Z)
- Shallow Feature Based Dense Attention Network for Crowd Counting [103.67446852449551]
We propose a Shallow feature based Dense Attention Network (SDANet) for crowd counting from still images.
Our method outperforms other existing methods by a large margin, as is evident from a remarkable 11.9% Mean Absolute Error (MAE) drop of our SDANet.
(2020-06-17T13:34:42Z)
- ResNeSt: Split-Attention Networks [86.25490825631763]
We present a modularized architecture, which applies the channel-wise attention on different network branches to leverage their success in capturing cross-feature interactions and learning diverse representations.
Our model, named ResNeSt, outperforms EfficientNet in accuracy and latency trade-off on image classification.
(2020-04-19T20:40:31Z)
- CRNet: Cross-Reference Networks for Few-Shot Segmentation [59.85183776573642]
Few-shot segmentation aims to learn a segmentation model that can be generalized to novel classes with only a few training images.
With a cross-reference mechanism, our network can better find the co-occurrent objects in the two images.
Experiments on the PASCAL VOC 2012 dataset show that our network achieves state-of-the-art performance.
(2020-03-24T04:55:43Z)