Related papers: Image Aesthetics Assessment Using Graph Attention Network

Image Aesthetics Assessment Using Graph Attention Network

URL: http://arxiv.org/abs/2206.12869v2
Date: Tue, 28 Jun 2022 08:28:01 GMT
Title: Image Aesthetics Assessment Using Graph Attention Network
Authors: Koustav Ghosal, Aljosa Smolic
Abstract summary: We present a two-stage framework based on graph neural networks for image aesthetics assessment. First, we propose a feature-graph representation in which the input image is modelled as a graph, maintaining its original aspect ratio and resolution. Second, we propose a graph neural network architecture that takes this feature-graph and captures the semantic relationship between the different regions of the input image using visual attention.
Score: 17.277954886018353
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Aspect ratio and spatial layout are two of the principal factors determining the aesthetic value of a photograph. But, incorporating these into the traditional convolution-based frameworks for the task of image aesthetics assessment is problematic. The aspect ratio of the photographs gets distorted while they are resized/cropped to a fixed dimension to facilitate training batch sampling. On the other hand, the convolutional filters process information locally and are limited in their ability to model the global spatial layout of a photograph. In this work, we present a two-stage framework based on graph neural networks and address both these problems jointly. First, we propose a feature-graph representation in which the input image is modelled as a graph, maintaining its original aspect ratio and resolution. Second, we propose a graph neural network architecture that takes this feature-graph and captures the semantic relationship between the different regions of the input image using visual attention. Our experiments show that the proposed framework advances the state-of-the-art results in aesthetic score regression on the Aesthetic Visual Analysis (AVA) benchmark.

Related papers

Neural Scene Designer: Self-Styled Semantic Image Manipulation [67.43125248646653]
We introduce the Neural Scene Designer (NSD), a novel framework that enables photo-realistic manipulation of user-specified scene regions.<n>NSD ensures both semantic alignment with user intent and stylistic consistency with the surrounding environment.<n>To capture fine-grained style representations, we propose the Progressive Self-style Representational Learning (PSRL) module.
arXiv Detail & Related papers (2025-09-01T11:59:03Z)
Cross-Image Attention for Zero-Shot Appearance Transfer [68.43651329067393]
We introduce a cross-image attention mechanism that implicitly establishes semantic correspondences across images. We harness three mechanisms that either manipulate the noisy latent codes or the model's internal representations throughout the denoising process. Experiments show that our method is effective across a wide range of object categories and is robust to variations in shape, size, and viewpoint.
arXiv Detail & Related papers (2023-11-06T18:33:24Z)
Cross-view Self-localization from Synthesized Scene-graphs [1.9580473532948401]
Cross-view self-localization is a challenging scenario of visual place recognition in which database images are provided from sparse viewpoints. We propose a new hybrid scene model that combines the advantages of view-invariant appearance features computed from raw images and view-dependent spatial-semantic features computed from synthesized images.
arXiv Detail & Related papers (2023-10-24T04:16:27Z)
HandNeRF: Neural Radiance Fields for Animatable Interacting Hands [122.32855646927013]
We propose a novel framework to reconstruct accurate appearance and geometry with neural radiance fields (NeRF) for interacting hands. We conduct extensive experiments to verify the merits of our proposed HandNeRF and report a series of state-of-the-art results.
arXiv Detail & Related papers (2023-03-24T06:19:19Z)
A domain adaptive deep learning solution for scanpath prediction of paintings [66.46953851227454]
This paper focuses on the eye-movement analysis of viewers during the visual experience of a certain number of paintings. We introduce a new approach to predicting human visual attention, which impacts several cognitive functions for humans. The proposed new architecture ingests images and returns scanpaths, a sequence of points featuring a high likelihood of catching viewers' attention.
arXiv Detail & Related papers (2022-09-22T22:27:08Z)
Image Keypoint Matching using Graph Neural Networks [22.33342295278866]
We propose a graph neural network for the problem of image matching. The proposed method first generates initial soft correspondences between keypoints using localized node embeddings. We evaluate our method on natural image datasets with keypoint annotations and show that, in comparison to a state-of-the-art model, our method speeds up inference times without sacrificing prediction accuracy.
arXiv Detail & Related papers (2022-05-27T23:38:44Z)
Graph Representation Learning for Spatial Image Steganalysis [11.358487655918678]
We introduce a graph representation learning architecture for spatial image steganalysis. In the detailed architecture, we translate each image to a graph, where nodes represent the patches of the image and edges indicate the local associations between the patches. By feeding the graph to an attention network, the discriminative features can be learned for efficient steganalysis.
arXiv Detail & Related papers (2021-10-03T09:09:08Z)
VisGraphNet: a complex network interpretation of convolutional neural features [6.50413414010073]
We propose and investigate the use of visibility graphs to model the feature map of a neural network. The work is motivated by an alternative viewpoint provided by these graphs over the original data.
arXiv Detail & Related papers (2021-08-27T20:21:04Z)
Gigapixel Histopathological Image Analysis using Attention-based Neural Networks [7.1715252990097325]
We propose a CNN structure consisting of a compressing path and a learning path. Our method integrates both global and local information, is flexible with regard to the size of the input images and only requires weak image-level labels.
arXiv Detail & Related papers (2021-01-25T10:18:52Z)
Towards Unsupervised Deep Image Enhancement with Generative Adversarial Network [92.01145655155374]
We present an unsupervised image enhancement generative network (UEGAN) It learns the corresponding image-to-image mapping from a set of images with desired characteristics in an unsupervised manner. Results show that the proposed model effectively improves the aesthetic quality of images.
arXiv Detail & Related papers (2020-12-30T03:22:46Z)
Geometrically Mappable Image Features [85.81073893916414]
Vision-based localization of an agent in a map is an important problem in robotics and computer vision. We propose a method that learns image features targeted for image-retrieval-based localization.
arXiv Detail & Related papers (2020-03-21T15:36:38Z)
High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification [84.43394420267794]
We propose a novel framework by learning high-order relation and topology information for discriminative features and robust alignment. Our framework significantly outperforms state-of-the-art by6.5%mAP scores on Occluded-Duke dataset.
arXiv Detail & Related papers (2020-03-18T12:18:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.