Visual Interest Prediction with Attentive Multi-Task Transfer Learning
- URL: http://arxiv.org/abs/2005.12770v2
- Date: Wed, 27 May 2020 10:05:58 GMT
- Title: Visual Interest Prediction with Attentive Multi-Task Transfer Learning
- Authors: Deepanway Ghosal, Maheshkumar H. Kolekar
- Abstract summary: We propose a transfer learning and attention mechanism based neural network model to predict visual interest & affective dimensions in digital photos.
Evaluation of our model on the benchmark dataset shows a large improvement over current state-of-the-art systems.
- Score: 6.177155931162925
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Visual interest & affect prediction is an active area of research in
computer vision. In this paper, we propose a transfer learning and attention
mechanism based neural network model to predict visual interest & affective
dimensions in digital photos. Learning the multiple affective dimensions is
addressed through a multi-task learning framework. Through various experiments
we show the effectiveness of the proposed approach. Evaluation of our model on
the benchmark dataset shows a large improvement over current state-of-the-art
systems.
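The abstract names the main ingredients, a pretrained backbone reused via transfer learning, an attention mechanism over the image features, and multi-task outputs for the affective dimensions, without fixing the exact architecture. The following is a minimal PyTorch sketch of that recipe under assumed details: the ResNet-50 backbone, the soft spatial-attention pooling, and the three regression heads are placeholders, not the authors' actual configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

class AttentiveMultiTaskNet(nn.Module):
    """Illustrative sketch: pretrained backbone + soft spatial attention
    pooling + shared trunk with per-task regression heads. Backbone,
    attention form, and head sizes are assumptions, not the paper's
    exact configuration."""

    def __init__(self, num_tasks: int = 3):
        super().__init__()
        # Transfer learning: reuse an ImageNet-pretrained CNN as the feature extractor.
        backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        self.features = nn.Sequential(*list(backbone.children())[:-2])   # -> (B, 2048, H, W)
        # Soft spatial attention: one score per spatial location.
        self.attn = nn.Conv2d(2048, 1, kernel_size=1)
        # Shared trunk followed by one small head per affective dimension.
        self.shared = nn.Sequential(nn.Linear(2048, 512), nn.ReLU(), nn.Dropout(0.5))
        self.heads = nn.ModuleList([nn.Linear(512, 1) for _ in range(num_tasks)])

    def forward(self, x):
        f = self.features(x)                                  # (B, C, H, W)
        w = torch.softmax(self.attn(f).flatten(2), dim=-1)    # (B, 1, H*W)
        f = (f.flatten(2) * w).sum(dim=-1)                    # attention-weighted pooling -> (B, C)
        h = self.shared(f)
        return torch.cat([head(h) for head in self.heads], dim=-1)   # (B, num_tasks)


model = AttentiveMultiTaskNet(num_tasks=3)
scores = model(torch.randn(2, 3, 224, 224))            # e.g. interest plus two affect dimensions
loss = nn.functional.mse_loss(scores, torch.zeros_like(scores))  # joint multi-task regression loss
```

Sharing the backbone and trunk across tasks while keeping one linear head per dimension is one common way to realize multi-task learning; the actual loss weighting and head design in the paper may differ.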
Related papers
- Masked Modeling for Self-supervised Representation Learning on Vision and Beyond [69.64364187449773]
Masked modeling has emerged as a distinctive approach that involves predicting parts of the original data that are proportionally masked during training.
We elaborate on the details of techniques within masked modeling, including diverse masking strategies, recovering targets, network architectures, and more.
We conclude by discussing the limitations of current techniques and point out several potential avenues for advancing masked modeling research.
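The summary above describes the core mechanic of masked modeling: hide a proportion of the input and train the network to recover it. Below is a minimal patch-masking sketch in PyTorch; the patch size, mask ratio, tiny encoder, and reconstruction loss are illustrative assumptions, not details from the survey.

```python
import torch
import torch.nn as nn

def random_patch_mask(images, patch: int = 16, mask_ratio: float = 0.75):
    """Split images into non-overlapping patches and zero out a random
    subset; returns the masked images and the pixel-level boolean mask
    (True = hidden). Patch size and ratio are illustrative choices."""
    b, c, h, w = images.shape
    gh, gw = h // patch, w // patch
    mask = torch.rand(b, gh, gw) < mask_ratio                        # (B, gh, gw)
    mask_px = mask.repeat_interleave(patch, 1).repeat_interleave(patch, 2)
    masked = images * (~mask_px).unsqueeze(1)                        # zero out hidden patches
    return masked, mask_px

# Toy objective: a small network reconstructs the hidden pixels from the visible ones.
model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(64, 3, 3, padding=1))
images = torch.randn(4, 3, 224, 224)
masked, mask_px = random_patch_mask(images)
recon = model(masked)
# The loss is computed only on the masked (hidden) locations.
sq_err = ((recon - images) ** 2) * mask_px.unsqueeze(1)
loss = sq_err.sum() / (mask_px.sum() * images.shape[1]).clamp(min=1)
```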
arXiv Detail & Related papers (2023-12-31T12:03:21Z)
- Foveation in the Era of Deep Learning [6.602118206533142]
We introduce an end-to-end differentiable foveated active vision architecture that leverages a graph convolutional network to process foveated images.
Our model learns to iteratively attend to regions of the image relevant for classification.
We find that our model outperforms a state-of-the-art CNN and foveated vision architectures of comparable parameters under a given pixel or computation budget.
arXiv Detail & Related papers (2023-12-03T16:48:09Z)
- Efficient Large-Scale Visual Representation Learning And Evaluation [0.13192560874022083]
We describe challenges in e-commerce vision applications at scale and highlight methods to efficiently train, evaluate, and serve visual representations.
We present ablation studies evaluating visual representations in several downstream tasks.
We include online results from deployed machine learning systems in production on a large scale e-commerce platform.
arXiv Detail & Related papers (2023-05-22T18:25:03Z)
- Task Formulation Matters When Learning Continually: A Case Study in Visual Question Answering [58.82325933356066]
Continual learning aims to train a model incrementally on a sequence of tasks without forgetting previous knowledge.
We present a detailed study of how different settings affect performance for Visual Question Answering.
arXiv Detail & Related papers (2022-09-30T19:12:58Z)
- Peripheral Vision Transformer [52.55309200601883]
We take a biologically inspired approach and explore modeling peripheral vision in deep neural networks for visual recognition.
We propose to incorporate peripheral position encoding to the multi-head self-attention layers to let the network learn to partition the visual field into diverse peripheral regions given training data.
We evaluate the proposed network, dubbed PerViT, on the large-scale ImageNet dataset and systematically investigate the inner workings of the model for machine perception.
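The PerViT entry hinges on injecting position information into multi-head self-attention so that heads can specialise to different regions of the visual field. The sketch below shows only the general pattern of a learned, distance-dependent bias added to the attention logits; the bias MLP, grid-based distances, and head count are assumptions, not PerViT's actual peripheral position encoding.

```python
import torch
import torch.nn as nn

class PositionBiasedSelfAttention(nn.Module):
    """Rough sketch: add a learned bias, computed from the distance between
    token positions on a 2-D grid, to the attention logits so different
    heads can favour different rings of the visual field. This is a
    simplification, not PerViT's exact formulation."""

    def __init__(self, dim: int, heads: int, grid: int):
        super().__init__()
        self.heads, self.scale = heads, (dim // heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # Pairwise Euclidean distances between token positions on the grid.
        ys, xs = torch.meshgrid(torch.arange(grid), torch.arange(grid), indexing="ij")
        pos = torch.stack([ys, xs], dim=-1).flatten(0, 1).float()     # (N, 2)
        self.register_buffer("dist", torch.cdist(pos, pos))           # (N, N)
        # Tiny per-head MLP mapping distance -> attention bias.
        self.bias_mlp = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, heads))

    def forward(self, x):                                             # x: (B, N, dim), N = grid*grid
        b, n, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(b, n, self.heads, -1).transpose(1, 2) for t in (q, k, v))
        bias = self.bias_mlp(self.dist.reshape(-1, 1)).view(n, n, self.heads)
        attn = (q @ k.transpose(-2, -1)) * self.scale + bias.permute(2, 0, 1)
        out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(b, n, d)
        return self.proj(out)


layer = PositionBiasedSelfAttention(dim=96, heads=4, grid=14)
y = layer(torch.randn(2, 14 * 14, 96))   # tokens from an assumed 14x14 feature grid
```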
arXiv Detail & Related papers (2022-06-14T12:47:47Z)
- An Interactive Visualization Tool for Understanding Active Learning [12.345164513513671]
We present an interactive visualization tool to elucidate the training process of active learning.
The tool enables one to select a sample of interesting data points, view how their prediction values change at different querying stages, and thus better understand when and how active learning works.
arXiv Detail & Related papers (2021-11-09T03:33:26Z)
- PANet: Perspective-Aware Network with Dynamic Receptive Fields and Self-Distilling Supervision for Crowd Counting [63.84828478688975]
We propose a novel perspective-aware approach called PANet to address the perspective problem.
Based on the observation that the size of the objects varies greatly in one image due to the perspective effect, we propose the dynamic receptive fields (DRF) framework.
The framework is able to adjust the receptive field by the dilated convolution parameters according to the input image, which helps the model to extract more discriminative features for each local region.
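The dynamic receptive fields idea described above, adjusting dilated-convolution behaviour based on the input image, can be illustrated with a small gated multi-branch module. This is a simplified sketch, not PANet's actual DRF block: the branch count, dilation rates, and the global-pooling gate are assumptions.

```python
import torch
import torch.nn as nn

class DynamicReceptiveField(nn.Module):
    """Simplified sketch: several parallel dilated convolutions, blended by
    weights predicted from the input image itself, so the effective
    receptive field adapts per image."""

    def __init__(self, channels: int, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=d, dilation=d) for d in dilations]
        )
        # Image-conditioned gate: global context -> one weight per branch.
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(channels, len(dilations)), nn.Softmax(dim=-1))

    def forward(self, x):
        w = self.gate(x)                                    # (B, num_branches)
        outs = torch.stack([b(x) for b in self.branches])   # (K, B, C, H, W)
        return (w.t()[:, :, None, None, None] * outs).sum(dim=0)


drf = DynamicReceptiveField(channels=64)
y = drf(torch.randn(1, 64, 96, 128))   # same shape out; receptive field mix is input-dependent
```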
arXiv Detail & Related papers (2021-10-31T04:43:05Z)
- Factors of Influence for Transfer Learning across Diverse Appearance Domains and Task Types [50.1843146606122]
A simple form of transfer learning is common in current state-of-the-art computer vision models.
Previous systematic studies of transfer learning have been limited and the circumstances in which it is expected to work are not fully understood.
In this paper we carry out an extensive experimental exploration of transfer learning across vastly different image domains.
arXiv Detail & Related papers (2021-03-24T16:24:20Z)
- Variational Structured Attention Networks for Deep Visual Representation Learning [49.80498066480928]
We propose a unified deep framework to jointly learn both spatial attention maps and channel attention in a principled manner.
Specifically, we integrate the estimation and the interaction of the attentions within a probabilistic representation learning framework.
We implement the inference rules within the neural network, thus allowing for end-to-end learning of the probabilistic and the CNN front-end parameters.
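The entry above concerns learning spatial attention maps and channel attention jointly. The sketch below shows only a deterministic skeleton of applying the two attention types to one feature map; the probabilistic/variational estimation and the structured interaction between attentions that the paper proposes are not reproduced here, and the layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class JointSpatialChannelAttention(nn.Module):
    """Minimal deterministic sketch: a spatial attention map and a
    channel attention vector, both computed from the input feature map
    and applied multiplicatively. Not the paper's variational framework."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.spatial = nn.Conv2d(channels, 1, kernel_size=7, padding=3)
        self.channel = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                     nn.Linear(channels, channels // reduction), nn.ReLU(),
                                     nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                                   # x: (B, C, H, W)
        s = torch.sigmoid(self.spatial(x))                  # (B, 1, H, W) spatial map
        c = self.channel(x)[:, :, None, None]               # (B, C, 1, 1) channel weights
        return x * s * c


attn = JointSpatialChannelAttention(channels=64)
refined = attn(torch.randn(2, 64, 32, 32))
```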
arXiv Detail & Related papers (2021-03-05T07:37:24Z)