Disentangle Perceptual Learning through Online Contrastive Learning
- URL: http://arxiv.org/abs/2006.13511v1
- Date: Wed, 24 Jun 2020 06:48:38 GMT
- Title: Disentangle Perceptual Learning through Online Contrastive Learning
- Authors: Kangfu Mei, Yao Lu, Qiaosi Yi, Haoyu Wu, Juncheng Li, Rui Huang
- Abstract summary: Pursuing realistic results according to human visual perception is the central concern in image transformation tasks.
In this paper, we argue that, among the feature representations from the pre-trained classification network, only a limited number of dimensions are related to human visual perception.
Under this assumption, we try to disentangle the perception-relevant dimensions from the representation through our proposed online contrastive learning.
- Score: 16.534353501066203
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pursuing realistic results according to human visual perception is the
central concern in image transformation tasks. Perceptual learning
approaches such as the perceptual loss are empirically powerful for these tasks, but
they usually rely on a pre-trained classification network to provide
features, which are not necessarily optimal for the visual perception of
transformed images. In this paper, we argue that, among the feature
representations from the pre-trained classification network, only a limited number of
dimensions are related to human visual perception, while the others are irrelevant,
although both affect the final image transformation results. Under this
assumption, we try to disentangle the perception-relevant dimensions from the
representation through our proposed online contrastive learning. The resulting
network includes the pre-training part and a feature selection layer, followed
by the contrastive learning module, which utilizes the transformed results,
target images, and task-oriented distorted images as the positive, negative,
and anchor samples, respectively. The contrastive learning aims to activate
the perception-relevant dimensions and suppress the irrelevant ones using
the triplet loss, so that the original representation can be disentangled for
better perceptual quality. Experiments on various image transformation tasks
demonstrate the superiority of our framework, in terms of human visual
perception, over existing approaches that use pre-trained networks and
empirically designed losses.
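Based only on the abstract above, the following is a minimal PyTorch-style sketch of how such a loss could be wired up. The VGG-19 backbone, the layer cut-off, the per-channel gate standing in for the "feature selection layer", and the plain triplet margin loss are all illustrative assumptions, not the authors' released implementation.

```python
# Sketch: frozen pre-trained features -> learnable channel selection -> triplet loss.
# Inputs are assumed to be ImageNet-normalized RGB tensors of shape (N, 3, H, W).
import torch
import torch.nn as nn
from torchvision import models


class DisentangledPerceptualLoss(nn.Module):
    def __init__(self, margin: float = 1.0):
        super().__init__()
        # Frozen pre-trained classification features (up to relu4_1 of VGG-19,
        # 512 channels). The weights argument may differ across torchvision versions.
        vgg = models.vgg19(pretrained=True).features[:21].eval()
        for p in vgg.parameters():
            p.requires_grad = False
        self.backbone = vgg
        # Feature-selection layer: per-channel weights trained online so the
        # contrastive objective can activate perception-relevant dimensions
        # and suppress irrelevant ones.
        self.selection = nn.Parameter(torch.ones(1, 512, 1, 1))
        self.triplet = nn.TripletMarginLoss(margin=margin)

    def embed(self, x: torch.Tensor) -> torch.Tensor:
        # Gate the pre-trained representation channel-wise, then flatten so the
        # triplet loss compares one vector per image.
        return (self.backbone(x) * self.selection).flatten(1)

    def forward(self, anchor, positive, negative):
        return self.triplet(self.embed(anchor),
                            self.embed(positive),
                            self.embed(negative))
```

Per the abstract, the transformed result, the target image, and a task-oriented distorted image would be passed as the positive, negative, and anchor samples, respectively, with the selection weights updated online alongside the transformation network.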
Related papers
- When Does Perceptual Alignment Benefit Vision Representations? [76.32336818860965]
We investigate how aligning vision model representations to human perceptual judgments impacts their usability.
We find that aligning models to perceptual judgments yields representations that improve upon the original backbones across many downstream tasks.
Our results suggest that injecting an inductive bias about human perceptual knowledge into vision models can contribute to better representations.
arXiv Detail & Related papers (2024-10-14T17:59:58Z)
- What Makes Pre-Trained Visual Representations Successful for Robust Manipulation? [57.92924256181857]
We find that visual representations designed for manipulation and control tasks do not necessarily generalize under subtle changes in lighting and scene texture.
We find that emergent segmentation ability is a strong predictor of out-of-distribution generalization among ViT models.
arXiv Detail & Related papers (2023-11-03T18:09:08Z)
- Semiotics Networks Representing Perceptual Inference [0.0]
We present a computational model designed to track and simulate the perception of objects.
Our model is not limited to humans and can be applied to any system featuring a processing loop from "internal" to "external" representations.
arXiv Detail & Related papers (2023-10-08T16:05:17Z)
- A domain adaptive deep learning solution for scanpath prediction of paintings [66.46953851227454]
This paper focuses on the eye-movement analysis of viewers during the visual experience of a certain number of paintings.
We introduce a new approach to predicting human visual attention, a process that impacts several cognitive functions in humans.
The proposed new architecture ingests images and returns scanpaths, a sequence of points featuring a high likelihood of catching viewers' attention.
arXiv Detail & Related papers (2022-09-22T22:27:08Z)
- Exploring CLIP for Assessing the Look and Feel of Images [87.97623543523858]
We introduce Contrastive Language-Image Pre-training (CLIP) models for assessing both the quality perception (look) and abstract perception (feel) of images in a zero-shot manner.
Our results show that CLIP captures meaningful priors that generalize well to different perceptual assessments.
arXiv Detail & Related papers (2022-07-25T17:58:16Z)
- Adversarially robust segmentation models learn perceptually-aligned gradients [0.0]
We show that adversarially-trained semantic segmentation networks can be used to perform image inpainting and generation.
We argue that perceptually-aligned gradients promote a better understanding of a neural network's learned representations and aid in making neural networks more interpretable.
arXiv Detail & Related papers (2022-04-03T16:04:52Z)
- Object-aware Contrastive Learning for Debiased Scene Representation [74.30741492814327]
We develop a novel object-aware contrastive learning framework that localizes objects in a self-supervised manner.
We also introduce two data augmentations based on ContraCAM, object-aware random crop and background mixup, which reduce contextual and background biases during contrastive self-supervised learning.
arXiv Detail & Related papers (2021-07-30T19:24:07Z)
- Unsupervised Deep Metric Learning with Transformed Attention Consistency and Contrastive Clustering Loss [28.17607283348278]
Existing approaches for unsupervised metric learning focus on exploring self-supervision information within the input image itself.
We observe that, when analyzing images, human eyes often compare images against each other instead of examining images individually.
We develop a new approach to unsupervised deep metric learning where the network is learned based on self-supervision information across images.
arXiv Detail & Related papers (2020-08-10T19:33:47Z)
- Seeing eye-to-eye? A comparison of object recognition performance in humans and deep convolutional neural networks under image manipulation [0.0]
This study aims towards a behavioral comparison of visual core object recognition performance between humans and feedforward neural networks.
Analyses of accuracy revealed that humans not only outperform DCNNs on all conditions, but also display significantly greater robustness towards shape and most notably color alterations.
arXiv Detail & Related papers (2020-07-13T10:26:30Z)
- Disentangling Image Distortions in Deep Feature Space [20.220653544354285]
We take a step in the direction of a broader understanding of perceptual similarity by analyzing the capability of deep visual representations to intrinsically characterize different types of image distortions.
A dimension-reduced representation of the features extracted from a given layer makes it possible to efficiently separate different types of distortions in the feature space.
Each network layer exhibits a different ability to separate between different types of distortions, and this ability varies according to the network architecture.
arXiv Detail & Related papers (2020-02-26T11:02:13Z)
- Self-Supervised Linear Motion Deblurring [112.75317069916579]
Deep convolutional neural networks are state-of-the-art for image deblurring.
We present a differentiable reblur model for self-supervised motion deblurring.
Our experiments demonstrate that self-supervised single-image deblurring is indeed feasible.
arXiv Detail & Related papers (2020-02-10T20:15:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.