Series Photo Selection via Multi-view Graph Learning
- URL: http://arxiv.org/abs/2203.09736v1
- Date: Fri, 18 Mar 2022 04:23:25 GMT
- Title: Series Photo Selection via Multi-view Graph Learning
- Authors: Jin Huang, Lu Zhang, Yongshun Gong, Jian Zhang, Xiushan Nie, Yilong Yin
- Abstract summary: Series photo selection (SPS) is an important branch of image aesthetic quality assessment.
We leverage a graph neural network to construct the relationships between multi-view features.
A siamese network is proposed to select the best one from a series of nearly identical photos.
- Score: 52.33318426088579
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Series photo selection (SPS) is an important branch of image aesthetic
quality assessment, which focuses on finding the best photo from a series of
nearly identical ones. While great progress has been observed, most existing
SPS approaches concentrate solely on extracting features from the original
image, neglecting that multiple views, e.g., the saturation level, color
histogram, and depth of field of the image, can help capture subtle aesthetic
changes. Taking multiple views into consideration, we leverage a graph neural
network to construct the relationships between multi-view features. Besides,
multiple views are aggregated with an adaptive-weight self-attention module to
weigh the significance of each view. Finally, a siamese network is proposed to
select the best photo from a series of nearly identical ones. Experimental
results demonstrate that our model achieves the highest success rates compared
with competitive methods.
Related papers
- Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling [58.50618448027103]
Contrastive Language-Image Pretraining (CLIP) stands out as a prominent method for image representation learning.
This paper explores the differences across various CLIP-trained vision backbones.
Method achieves a remarkable increase in accuracy of up to 39.1% over the best single backbone.
arXiv Detail & Related papers (2024-05-27T12:59:35Z)
- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training [103.72844619581811]
We build performant Multimodal Large Language Models (MLLMs).
In particular, we study the importance of various architecture components and data choices.
We demonstrate that a careful mix of image-caption, interleaved image-text, and text-only data is crucial for large-scale multimodal pre-training.
arXiv Detail & Related papers (2024-03-14T17:51:32Z) - Multi-Spectral Image Stitching via Spatial Graph Reasoning [52.27796682972484]
We propose a spatial graph reasoning based multi-spectral image stitching method.
We embed multi-scale complementary features from the same view position into a set of nodes.
By introducing long-range coherence along spatial and channel dimensions, the complementarity of pixel relations and channel interdependencies aids in the reconstruction of aligned multi-view features.
arXiv Detail & Related papers (2023-07-31T15:04:52Z) - Learning Contrastive Representation for Semantic Correspondence [150.29135856909477]
We propose a multi-level contrastive learning approach for semantic matching.
We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects.
arXiv Detail & Related papers (2021-09-22T18:34:14Z) - Focus on the Positives: Self-Supervised Learning for Biodiversity
Monitoring [9.086207853136054]
We address the problem of learning self-supervised representations from unlabeled image collections.
We exploit readily available context data that encodes information such as the spatial and temporal relationships between the input images.
For the critical task of global biodiversity monitoring, this results in image features that can be adapted to challenging visual species classification tasks with limited human supervision.
arXiv Detail & Related papers (2021-08-14T01:12:41Z) - Multi-Label Image Classification with Contrastive Learning [57.47567461616912]
We show that a direct application of contrastive learning can hardly improve in multi-label cases.
We propose a novel framework for multi-label classification with contrastive learning in a fully supervised setting.
arXiv Detail & Related papers (2021-07-24T15:00:47Z) - Multi-view Contrastive Coding of Remote Sensing Images at Pixel-level [5.64497799927668]
A pixel-wise contrastive approach based on an unlabeled multi-view setting is proposed.
A pseudo-Siamese ResUnet is trained to learn a representation that aims to align features from the shifted positive pairs.
Results demonstrate both improvements in efficiency and accuracy over the state-of-the-art multi-view contrastive methods.
arXiv Detail & Related papers (2021-05-18T13:28:46Z) - Efficient and Accurate Multi-scale Topological Network for Single Image
Dehazing [31.543771270803056]
In this paper, we pay attention to the feature extraction and utilization of the input image itself.
We propose a Multi-scale Topological Network (MSTN) to fully explore the features at different scales.
Meanwhile, we design a Multi-scale Feature Fusion Module (MFFM) and an Adaptive Feature Selection Module (AFSM) to achieve the selection and fusion of features at different scales.
arXiv Detail & Related papers (2021-02-24T08:53:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.