Image Aesthetics Assessment Using Graph Attention Network
- URL: http://arxiv.org/abs/2206.12869v2
- Date: Tue, 28 Jun 2022 08:28:01 GMT
- Title: Image Aesthetics Assessment Using Graph Attention Network
- Authors: Koustav Ghosal, Aljosa Smolic
- Abstract summary: We present a two-stage framework based on graph neural networks for image aesthetics assessment.
First, we propose a feature-graph representation in which the input image is modelled as a graph, maintaining its original aspect ratio and resolution.
Second, we propose a graph neural network architecture that takes this feature-graph and captures the semantic relationship between the different regions of the input image using visual attention.
- Score: 17.277954886018353
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Aspect ratio and spatial layout are two of the principal factors determining
the aesthetic value of a photograph. However, incorporating these into the
traditional convolution-based frameworks for the task of image aesthetics
assessment is problematic. The aspect ratio of the photographs gets distorted
while they are resized/cropped to a fixed dimension to facilitate training
batch sampling. On the other hand, the convolutional filters process
information locally and are limited in their ability to model the global
spatial layout of a photograph. In this work, we present a two-stage framework
based on graph neural networks and address both these problems jointly. First,
we propose a feature-graph representation in which the input image is modelled
as a graph, maintaining its original aspect ratio and resolution. Second, we
propose a graph neural network architecture that takes this feature-graph and
captures the semantic relationship between the different regions of the input
image using visual attention. Our experiments show that the proposed framework
advances the state-of-the-art results in aesthetic score regression on the
Aesthetic Visual Analysis (AVA) benchmark.
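The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the feature-graph is a 4-neighbour grid over a backbone feature map (the abstract does not specify the connectivity), uses a generic single-head GAT-style attention layer, and reads out a toy score by averaging. The names `feature_graph` and `gat_layer`, the layer sizes, and the random weights are all hypothetical.

```python
import numpy as np

def feature_graph(fmap):
    """Turn an (H, W, C) feature map into node features and an adjacency matrix.

    Assumption: nodes are spatial positions, edges join 4-neighbours plus
    self-loops. H and W are left as they are, so the aspect ratio is kept.
    """
    h, w, c = fmap.shape
    nodes = fmap.reshape(h * w, c)
    adj = np.eye(h * w, dtype=bool)  # self-loops
    for r in range(h):
        for col in range(w):
            i = r * w + col
            if col + 1 < w:  # right neighbour
                adj[i, i + 1] = adj[i + 1, i] = True
            if r + 1 < h:    # bottom neighbour
                adj[i, i + w] = adj[i + w, i] = True
    return nodes, adj

def gat_layer(x, adj, weight, a_src, a_dst, slope=0.2):
    """Single-head GAT-style layer: attention weights over each node's neighbours."""
    z = x @ weight                                    # (N, F) projected features
    e = (z @ a_src)[:, None] + (z @ a_dst)[None, :]   # (N, N) raw scores e_ij
    e = np.where(e > 0, e, slope * e)                 # LeakyReLU
    e = np.where(adj, e, -np.inf)                     # mask non-neighbours
    e = e - e.max(axis=1, keepdims=True)              # stabilise the softmax
    alpha = np.exp(e)
    alpha /= alpha.sum(axis=1, keepdims=True)         # rows sum to 1
    return alpha @ z, alpha

rng = np.random.default_rng(0)
fmap = rng.normal(size=(3, 5, 8))   # hypothetical 3x5 feature map, 8 channels
nodes, adj = feature_graph(fmap)
weight = rng.normal(size=(8, 4))    # hypothetical (untrained) parameters
a_src = rng.normal(size=4)
a_dst = rng.normal(size=4)
out, alpha = gat_layer(nodes, adj, weight, a_src, a_dst)
score = float(out.mean())           # toy readout, not a trained aesthetic score
```

Because the graph is built from whatever grid the feature map has, a 3x5 input and a 5x3 input produce different graphs, which is the property the abstract relies on to avoid resizing or cropping.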
Related papers
- Cross-Image Attention for Zero-Shot Appearance Transfer [68.43651329067393]
We introduce a cross-image attention mechanism that implicitly establishes semantic correspondences across images.
We harness three mechanisms that either manipulate the noisy latent codes or the model's internal representations throughout the denoising process.
Experiments show that our method is effective across a wide range of object categories and is robust to variations in shape, size, and viewpoint.
arXiv Detail & Related papers (2023-11-06T18:33:24Z)
- Cross-view Self-localization from Synthesized Scene-graphs [1.9580473532948401]
Cross-view self-localization is a challenging scenario of visual place recognition in which database images are provided from sparse viewpoints.
We propose a new hybrid scene model that combines the advantages of view-invariant appearance features computed from raw images and view-dependent spatial-semantic features computed from synthesized images.
arXiv Detail & Related papers (2023-10-24T04:16:27Z)
- HandNeRF: Neural Radiance Fields for Animatable Interacting Hands [122.32855646927013]
We propose a novel framework to reconstruct accurate appearance and geometry with neural radiance fields (NeRF) for interacting hands.
We conduct extensive experiments to verify the merits of our proposed HandNeRF and report a series of state-of-the-art results.
arXiv Detail & Related papers (2023-03-24T06:19:19Z)
- A domain adaptive deep learning solution for scanpath prediction of paintings [66.46953851227454]
This paper focuses on the eye-movement analysis of viewers during the visual experience of a certain number of paintings.
We introduce a new approach to predicting human visual attention, which impacts several cognitive functions for humans.
The proposed new architecture ingests images and returns scanpaths, a sequence of points featuring a high likelihood of catching viewers' attention.
arXiv Detail & Related papers (2022-09-22T22:27:08Z)
- Image Keypoint Matching using Graph Neural Networks [22.33342295278866]
We propose a graph neural network for the problem of image matching.
The proposed method first generates initial soft correspondences between keypoints using localized node embeddings.
We evaluate our method on natural image datasets with keypoint annotations and show that, in comparison to a state-of-the-art model, our method speeds up inference times without sacrificing prediction accuracy.
arXiv Detail & Related papers (2022-05-27T23:38:44Z)
- Graph Representation Learning for Spatial Image Steganalysis [11.358487655918678]
We introduce a graph representation learning architecture for spatial image steganalysis.
In the detailed architecture, we translate each image to a graph, where nodes represent the patches of the image and edges indicate the local associations between the patches.
By feeding the graph to an attention network, the discriminative features can be learned for efficient steganalysis.
arXiv Detail & Related papers (2021-10-03T09:09:08Z)
- VisGraphNet: a complex network interpretation of convolutional neural features [6.50413414010073]
We propose and investigate the use of visibility graphs to model the feature map of a neural network.
The work is motivated by an alternative viewpoint provided by these graphs over the original data.
arXiv Detail & Related papers (2021-08-27T20:21:04Z)
- Gigapixel Histopathological Image Analysis using Attention-based Neural Networks [7.1715252990097325]
We propose a CNN structure consisting of a compressing path and a learning path.
Our method integrates both global and local information, is flexible with regard to the size of the input images and only requires weak image-level labels.
arXiv Detail & Related papers (2021-01-25T10:18:52Z)
- Towards Unsupervised Deep Image Enhancement with Generative Adversarial Network [92.01145655155374]
We present an unsupervised image enhancement generative network (UEGAN).
It learns the corresponding image-to-image mapping from a set of images with desired characteristics in an unsupervised manner.
Results show that the proposed model effectively improves the aesthetic quality of images.
arXiv Detail & Related papers (2020-12-30T03:22:46Z)
- Geometrically Mappable Image Features [85.81073893916414]
Vision-based localization of an agent in a map is an important problem in robotics and computer vision.
We propose a method that learns image features targeted for image-retrieval-based localization.
arXiv Detail & Related papers (2020-03-21T15:36:38Z)
- High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification [84.43394420267794]
We propose a novel framework by learning high-order relation and topology information for discriminative features and robust alignment.
Our framework significantly outperforms the state of the art by 6.5% mAP on the Occluded-Duke dataset.
arXiv Detail & Related papers (2020-03-18T12:18:35Z)