Exploring Visual Engagement Signals for Representation Learning
- URL: http://arxiv.org/abs/2104.07767v1
- Date: Thu, 15 Apr 2021 20:50:40 GMT
- Title: Exploring Visual Engagement Signals for Representation Learning
- Authors: Menglin Jia, Zuxuan Wu, Austin Reiter, Claire Cardie, Serge Belongie,
Ser-Nam Lim
- Abstract summary: We present VisE, a weakly supervised learning approach, which maps social images to pseudo labels derived by clustered engagement signals.
We then study how models trained in this way benefit subjective downstream computer vision tasks such as emotion recognition or political bias detection.
- Score: 56.962033268934015
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual engagement in social media platforms comprises interactions with photo
posts including comments, shares, and likes. In this paper, we leverage such
visual engagement clues as supervisory signals for representation learning.
However, learning from engagement signals is non-trivial as it is not clear how
to bridge the gap between low-level visual information and high-level social
interactions. We present VisE, a weakly supervised learning approach, which
maps social images to pseudo labels derived by clustered engagement signals. We
then study how models trained in this way benefit subjective downstream
computer vision tasks such as emotion recognition or political bias detection.
Through extensive studies, we empirically demonstrate the effectiveness of VisE
across a diverse set of classification tasks beyond the scope of conventional
recognition.
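The abstract outlines the core recipe: engagement signals are clustered, the resulting cluster assignments serve as pseudo labels, and an image classifier is trained on them before its backbone is transferred to subjective downstream tasks. The following is a minimal sketch of that recipe, assuming k-means clustering, a ResNet-18 backbone, and synthetic stand-in data; the engagement descriptors, cluster count, and training setup are illustrative assumptions rather than the paper's actual configuration.

```python
# Minimal sketch of a VisE-style pseudo-labeling pipeline.
# Assumptions: k-means clustering and a ResNet-18 backbone; the paper's exact
# clustering method, engagement encoding, and hyperparameters are not given here.
import numpy as np
import torch
import torch.nn as nn
from sklearn.cluster import KMeans
from torchvision.models import resnet18

# --- Step 1: cluster engagement signals into pseudo labels -------------------
# Each row is a hypothetical engagement descriptor for one social post,
# e.g. counts or distributions of likes, shares, and comment types.
rng = np.random.default_rng(0)
engagement_features = rng.random((1000, 64)).astype(np.float32)

num_clusters = 128  # assumed; controls pseudo-label granularity
kmeans = KMeans(n_clusters=num_clusters, n_init=10, random_state=0)
pseudo_labels = kmeans.fit_predict(engagement_features)  # one cluster id per post

# --- Step 2: train an image classifier on the pseudo labels ------------------
model = resnet18(weights=None, num_classes=num_clusters)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Stand-in batch; in practice these would be the social images paired with
# the pseudo labels of their posts.
images = torch.randn(8, 3, 224, 224)
targets = torch.from_numpy(pseudo_labels[:8]).long()

optimizer.zero_grad()
loss = criterion(model(images), targets)
loss.backward()
optimizer.step()

# The trained backbone (minus the classification head) can then be transferred
# to subjective downstream tasks such as emotion recognition.
```

In this setup the number of clusters plays the role of the label-space size, trading off label granularity against noise in the engagement signal.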
Related papers
- Visual In-Context Learning for Large Vision-Language Models [62.5507897575317]
In Large Vision-Language Models (LVLMs), the efficacy of In-Context Learning (ICL) remains limited by challenges in cross-modal interactions and representation disparities.
We introduce a novel Visual In-Context Learning (VICL) method comprising Visual Demonstration Retrieval, Intent-Oriented Image Summarization, and Intent-Oriented Demonstration Composition.
Our approach retrieves images via a "Retrieval & Rerank" paradigm, summarizes images with task intent and task-specific visual parsing, and composes language-based demonstrations.
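The VICL summary names a "Retrieval & Rerank" step for selecting visual demonstrations. The sketch below shows one generic two-stage form of such a step, with embedding-based retrieval followed by a reranking pass; the function names, the stand-in scorer, and the pool sizes are hypothetical and not taken from the paper.

```python
# Sketch of a generic "Retrieval & Rerank" step for picking in-context visual
# demonstrations: coarse retrieval by embedding similarity, then a rerank with
# a (hypothetical) finer-grained scorer. Not the paper's implementation.
import torch
import torch.nn.functional as F

def retrieve_and_rerank(query_emb, candidate_embs, rerank_score, k_retrieve=20, k_final=4):
    """Return indices of the k_final demonstrations chosen for the prompt."""
    sims = F.cosine_similarity(query_emb.unsqueeze(0), candidate_embs, dim=-1)
    top_k = sims.topk(k_retrieve).indices                    # stage 1: coarse retrieval
    rerank_scores = torch.tensor([rerank_score(i) for i in top_k.tolist()])
    order = rerank_scores.argsort(descending=True)[:k_final] # stage 2: rerank, keep best
    return top_k[order]

# Hypothetical embeddings for the query image and a demonstration pool.
query_emb = torch.randn(512)
candidate_embs = torch.randn(1000, 512)
chosen = retrieve_and_rerank(query_emb, candidate_embs,
                             rerank_score=lambda idx: float(idx % 7))  # stand-in scorer
```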
arXiv Detail & Related papers (2024-02-18T12:43:38Z)
- StyleEDL: Style-Guided High-order Attention Network for Image Emotion Distribution Learning [69.06749934902464]
We propose a style-guided high-order attention network for image emotion distribution learning termed StyleEDL.
StyleEDL interactively learns stylistic-aware representations of images by exploring the hierarchical stylistic information of visual contents.
In addition, we introduce a stylistic graph convolutional network to dynamically generate the content-dependent emotion representations.
arXiv Detail & Related papers (2023-08-06T03:22:46Z)
- Visual resemblance and communicative context constrain the emergence of graphical conventions [21.976382800327965]
Drawing provides a versatile medium for communicating about the visual world.
Do viewers understand drawings based solely on their ability to resemble the entities they refer to (i.e., as images)?
Or do they understand drawings based on shared but arbitrary associations with these entities (i.e., as symbols)?
arXiv Detail & Related papers (2021-09-17T23:05:36Z)
- Visual Adversarial Imitation Learning using Variational Models [60.69745540036375]
Reward function specification remains a major impediment for learning behaviors through deep reinforcement learning.
Visual demonstrations of desired behaviors often present an easier and more natural way to teach agents.
We develop a variational model-based adversarial imitation learning algorithm.
arXiv Detail & Related papers (2021-07-16T00:15:18Z)
- CoCon: Cooperative-Contrastive Learning [52.342936645996765]
Self-supervised visual representation learning is key for efficient video analysis.
Recent success in learning image representations suggests contrastive learning is a promising framework to tackle this challenge.
We introduce a cooperative variant of contrastive learning to utilize complementary information across views.
arXiv Detail & Related papers (2021-04-30T05:46:02Z)
- Multimodal Contrastive Training for Visual Representation Learning [45.94662252627284]
We develop an approach to learning visual representations that embraces multimodal data.
Our method simultaneously exploits intrinsic data properties within each modality and semantic information from cross-modal correlations.
By including multimodal training in a unified framework, our method can learn more powerful and generic visual features.
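This summary describes combining a within-modality objective with a cross-modal one in a single framework. As a rough illustration (not the paper's actual loss), the sketch below combines an InfoNCE-style image-image term with an image-text term; the embedding names, temperature, and equal weighting are assumptions.

```python
# Illustrative combination of intra-modal and cross-modal contrastive terms
# (InfoNCE-style); not the paper's exact formulation.
import torch
import torch.nn.functional as F

def info_nce(queries: torch.Tensor, keys: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Contrastive loss where queries[i] should match keys[i] among all keys."""
    queries = F.normalize(queries, dim=-1)
    keys = F.normalize(keys, dim=-1)
    logits = queries @ keys.t() / temperature   # (N, N) similarity matrix
    targets = torch.arange(queries.size(0))     # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# Hypothetical embeddings: two augmented views of the same images, plus
# text embeddings of their captions, all projected to a shared dimension.
N, D = 32, 128
img_view_1, img_view_2 = torch.randn(N, D), torch.randn(N, D)
txt_emb = torch.randn(N, D)

intra_modal = info_nce(img_view_1, img_view_2)        # image <-> image
cross_modal = 0.5 * (info_nce(img_view_1, txt_emb)    # image <-> text, both directions
                     + info_nce(txt_emb, img_view_1))
loss = intra_modal + cross_modal                      # equal weighting is an assumption
```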
arXiv Detail & Related papers (2021-04-26T19:23:36Z)
- Embodied Visual Active Learning for Semantic Segmentation [33.02424587900808]
We study the task of embodied visual active learning, where an agent explores a 3D environment with the goal of acquiring visual scene understanding.
We develop a battery of agents, both learnt and pre-specified, with different levels of knowledge of the environment.
We extensively evaluate the proposed models using the Matterport3D simulator and show that a fully learnt method outperforms comparable pre-specified counterparts.
arXiv Detail & Related papers (2020-12-17T11:02:34Z)
- Heterogeneous Contrastive Learning: Encoding Spatial Information for Compact Visual Representations [183.03278932562438]
This paper presents an effective approach that adds spatial information to the encoding stage to alleviate the learning inconsistency between the contrastive objective and strong data augmentation operations.
We show that our approach learns more efficient visual representations, offering a key insight for future research on self-supervised visual representation learning.
arXiv Detail & Related papers (2020-11-19T16:26:25Z)
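The last entry above notes that spatial information is injected at the encoding stage of a contrastive pipeline. One generic way to picture this, sketched below under that assumption, is to contrast per-position (local) features of two augmented views rather than only a globally pooled vector; this is an illustration of the idea, not the paper's method.

```python
# Generic illustration of keeping spatial structure in a contrastive objective:
# local (per-region) features of two views are matched position-by-position
# instead of collapsing each view to a single pooled embedding.
# This is an assumed simplification, not the paper's actual method.
import torch
import torch.nn.functional as F

def spatial_contrastive(feat_a: torch.Tensor, feat_b: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """feat_a, feat_b: (N, C, H, W) feature maps from two augmented views."""
    n, c, h, w = feat_a.shape
    a = F.normalize(feat_a.flatten(2).transpose(1, 2), dim=-1)  # (N, H*W, C)
    b = F.normalize(feat_b.flatten(2).transpose(1, 2), dim=-1)  # (N, H*W, C)
    logits = torch.bmm(a, b.transpose(1, 2)) / temperature      # (N, H*W, H*W)
    targets = torch.arange(h * w).expand(n, -1)                 # matching positions are positives
    return F.cross_entropy(logits.reshape(-1, h * w), targets.reshape(-1))

feat_a = torch.randn(4, 256, 7, 7)   # hypothetical backbone feature maps
feat_b = torch.randn(4, 256, 7, 7)
loss = spatial_contrastive(feat_a, feat_b)
```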