Distinguishing Natural and Computer-Generated Images using
Multi-Colorspace fused EfficientNet
- URL: http://arxiv.org/abs/2110.09428v1
- Date: Mon, 18 Oct 2021 15:55:45 GMT
- Title: Distinguishing Natural and Computer-Generated Images using
Multi-Colorspace fused EfficientNet
- Authors: Manjary P Gangan, Anoop K, and Lajish V L
- Abstract summary: In a real-world image forensic scenario, it is essential to consider all categories of image generation.
We propose a Multi-Colorspace fused EfficientNet model that fuses three EfficientNet networks in parallel.
Our model outperforms the baselines in accuracy, robustness to post-processing, and generalizability to other datasets.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The problem of distinguishing natural images from photo-realistic
computer-generated ones has so far been addressed as either natural images
versus computer graphics or natural images versus GAN images, one pair at a
time. But in a real-world image forensic scenario it is essential to consider
all categories of image generation, since in most cases the generation source
is unknown. We, to the best of our knowledge for the first time, approach the
problem of distinguishing natural images from photo-realistic
computer-generated images as a three-class classification task over natural,
computer graphics, and GAN images. For this task, we propose a
Multi-Colorspace fused EfficientNet model that fuses, in parallel, three
EfficientNet networks trained via transfer learning, where each network
operates in a different colorspace, RGB, LCH, or HSV, chosen after analyzing
the efficacy of various colorspace transformations for this image forensics
problem. Our model outperforms the baselines in accuracy, robustness to
post-processing, and generalizability to other datasets. We conduct
psychophysics experiments to understand how accurately humans can distinguish
natural, computer graphics, and GAN images, and observe that humans find it
difficult to classify these images, particularly the computer-generated ones,
indicating the necessity of computational algorithms for the task. We also
analyze the behavior of our model through visual explanations to understand
the salient regions that contribute to its decisions, and compare them with
manual explanations provided by human participants in the form of region
markings; the similarity between the two explanations indicates that our
model takes its decisions meaningfully.
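The abstract specifies the fusion topology but not the implementation details, so the following is a minimal PyTorch sketch of the parallel-fusion idea, assuming EfficientNet-B0 backbones, feature concatenation as the fusion step, and colorspace conversion performed outside the network. The variant choice, fusion mechanism, and layer sizes are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of the parallel multi-colorspace fusion idea (assumptions:
# EfficientNet-B0 backbones, concatenation fusion; not the authors' code).
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0

class MultiColorspaceFusedEfficientNet(nn.Module):
    """Three EfficientNet branches (for RGB / LCH / HSV inputs) fused by
    concatenating pooled features, followed by a 3-way classifier head
    (natural vs. computer graphics vs. GAN)."""
    def __init__(self, num_classes: int = 3):
        super().__init__()
        # One backbone per colorspace; each expects a 3-channel tensor.
        # weights=None keeps the sketch download-free; the transfer learning
        # step in the abstract would load ImageNet-pretrained weights here.
        self.branches = nn.ModuleList(
            [efficientnet_b0(weights=None).features for _ in range(3)]
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        # EfficientNet-B0 emits 1280-dim features; three branches -> 3840.
        self.classifier = nn.Linear(3 * 1280, num_classes)

    def forward(self, rgb, lch, hsv):
        feats = []
        for branch, x in zip(self.branches, (rgb, lch, hsv)):
            feats.append(self.pool(branch(x)).flatten(1))  # (B, 1280)
        return self.classifier(torch.cat(feats, dim=1))    # (B, 3)

model = MultiColorspaceFusedEfficientNet()
x = torch.randn(2, 3, 224, 224)  # stand-in batch; real use converts colorspaces
logits = model(x, x, x)
print(logits.shape)  # torch.Size([2, 3])
```

In practice each branch would receive the same image converted to its assigned colorspace (e.g., via skimage.color) rather than the shared stand-in tensor used above.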
Related papers
- A Robust Approach Towards Distinguishing Natural and Computer Generated Images using Multi-Colorspace fused and Enriched Vision Transformer
This work proposes a robust approach towards distinguishing natural and computer generated images.
The proposed approach achieves a substantial performance gain over a set of baselines.
arXiv Detail & Related papers (2023-08-14T17:11:17Z)
- Multi-Domain Norm-referenced Encoding Enables Data Efficient Transfer Learning of Facial Expression Recognition
We propose a biologically-inspired mechanism for transfer learning in facial expression recognition.
Our proposed architecture provides an explanation for how the human brain might innately recognize facial expressions on varying head shapes.
Our model achieves a classification accuracy of 92.15% on the FERG dataset with extreme data efficiency.
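As a rough illustration of what norm-referenced encoding could look like, the sketch below represents a feature vector by its direction and magnitude of deviation from a learned per-domain reference ("norm") vector; the class name, dimensions, and norm parameterization are hypothetical, not taken from the paper.

```python
import torch
import torch.nn as nn

class NormReferencedEncoder(nn.Module):
    """Illustrative norm-referenced encoding: represent features by their
    deviation from a learned per-domain reference ("norm") vector."""
    def __init__(self, feat_dim: int = 128, num_domains: int = 3):
        super().__init__()
        # One learned norm vector per domain (e.g., per head shape).
        self.norms = nn.Parameter(torch.zeros(num_domains, feat_dim))

    def forward(self, feats: torch.Tensor, domain: torch.Tensor):
        # feats: (B, feat_dim); domain: (B,) integer domain indices.
        ref = self.norms[domain]           # (B, feat_dim) per-sample norms
        dev = feats - ref                  # deviation from the reference
        direction = nn.functional.normalize(dev, dim=1)
        magnitude = dev.norm(dim=1, keepdim=True)
        return direction, magnitude

enc = NormReferencedEncoder()
d, m = enc(torch.randn(4, 128), torch.tensor([0, 1, 2, 0]))
```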
arXiv Detail & Related papers (2023-04-05T09:06:30Z)
- A domain adaptive deep learning solution for scanpath prediction of paintings
This paper focuses on the eye-movement analysis of viewers during the visual experience of a certain number of paintings.
We introduce a new approach to predicting human visual attention, which underlies several human cognitive functions.
The proposed new architecture ingests images and returns scanpaths, a sequence of points featuring a high likelihood of catching viewers' attention.
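A minimal sketch of an ingest-image, emit-scanpath interface, assuming a small CNN encoder and a GRU that unrolls a fixed-length sequence of normalized (x, y) fixation points; the paper's actual architecture and its domain-adaptation component are not reproduced here.

```python
import torch
import torch.nn as nn

class ScanpathPredictor(nn.Module):
    """Toy encoder-decoder: a small CNN summarizes the painting, a GRU
    unrolls a fixed-length sequence of (x, y) fixation points in [0, 1]."""
    def __init__(self, hidden: int = 256, steps: int = 8):
        super().__init__()
        self.steps = steps
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, hidden),
        )
        self.rnn = nn.GRUCell(2, hidden)   # previous fixation -> next state
        self.head = nn.Linear(hidden, 2)   # state -> next (x, y)

    def forward(self, img: torch.Tensor):
        h = self.encoder(img)                    # (B, hidden)
        pt = torch.full((img.size(0), 2), 0.5)   # start at the image center
        path = []
        for _ in range(self.steps):
            h = self.rnn(pt, h)
            pt = torch.sigmoid(self.head(h))     # keep coordinates in [0, 1]
            path.append(pt)
        return torch.stack(path, dim=1)          # (B, steps, 2)

paths = ScanpathPredictor()(torch.randn(2, 3, 128, 128))
```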
arXiv Detail & Related papers (2022-09-22T22:27:08Z)
- Joint Learning of Deep Texture and High-Frequency Features for Computer-Generated Image Detection
We propose a joint learning strategy with deep texture and high-frequency features for CG image detection.
A semantic segmentation map is generated to guide the affine transformation operation.
The combination of the original image and the high-frequency components of the original and rendered images is fed into a multi-branch neural network equipped with attention mechanisms.
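A stripped-down sketch of the multi-stream idea, assuming the high-frequency component is obtained as a blur residual and the branches are fused by concatenation; the segmentation-guided affine transformation and the attention mechanisms from the summary are omitted, and all layer sizes are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def high_frequency(img: torch.Tensor, k: int = 5) -> torch.Tensor:
    """High-pass residual: image minus a box-blurred copy (a stand-in for
    whatever filter the paper actually uses)."""
    c = img.size(1)
    kernel = torch.ones(c, 1, k, k, device=img.device) / (k * k)
    blurred = F.conv2d(img, kernel, padding=k // 2, groups=c)
    return img - blurred

class TwoBranchDetector(nn.Module):
    """Illustrative multi-branch CG-image detector: one branch sees the RGB
    image, the other its high-frequency residual; features are concatenated."""
    def __init__(self):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
        self.rgb_branch, self.hf_branch = branch(), branch()
        self.classifier = nn.Linear(32, 2)  # natural vs. computer-generated

    def forward(self, img):
        f = torch.cat([self.rgb_branch(img),
                       self.hf_branch(high_frequency(img))], dim=1)
        return self.classifier(f)

logits = TwoBranchDetector()(torch.randn(2, 3, 64, 64))
```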
arXiv Detail & Related papers (2022-09-07T17:30:40Z)
- Bridging Composite and Real: Towards End-to-end Deep Image Matting
We study the roles of semantics and details for image matting.
We propose a novel Glance and Focus Matting network (GFM), which employs a shared encoder and two separate decoders.
Comprehensive empirical studies have demonstrated that GFM outperforms state-of-the-art methods.
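The shared-encoder / two-decoder layout can be sketched as below, with a "glance" decoder producing a coarse foreground / transition / background map and a "focus" decoder producing detailed alpha that is trusted only in the transition region; layer sizes and the merge rule are illustrative, not GFM's actual design.

```python
import torch
import torch.nn as nn

class SharedEncoderTwoDecoders(nn.Module):
    """Minimal shape of the GFM idea: one shared encoder, a 'glance' decoder
    for coarse semantics and a 'focus' decoder for fine alpha details, merged
    into a final matte. Layer sizes are placeholders, not the paper's."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        def decoder(out_ch):
            return nn.Sequential(
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(32, out_ch, 4, stride=2, padding=1),
            )
        self.glance = decoder(3)  # coarse fg / bg / transition logits
        self.focus = decoder(1)   # detailed alpha in the transition region

    def forward(self, img):
        feat = self.encoder(img)
        seg = self.glance(feat).softmax(dim=1)     # (B, 3, H, W)
        alpha_detail = self.focus(feat).sigmoid()  # (B, 1, H, W)
        # Collaborative merge: trust 'focus' only in the transition region.
        fg, transition = seg[:, 0:1], seg[:, 2:3]
        return fg + transition * alpha_detail

matte = SharedEncoderTwoDecoders()(torch.randn(1, 3, 64, 64))
```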
arXiv Detail & Related papers (2020-10-30T10:57:13Z)
- Cross-View Image Synthesis with Deformable Convolution and Attention Mechanism
We propose to use Generative Adversarial Networks (GANs) with deformable convolution and an attention mechanism to solve the problem of cross-view image synthesis.
Because it is difficult to understand and transform a scene's appearance and semantic information from another view, we use deformable convolutions in the U-Net to improve the network's ability to extract features of objects at different scales.
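A deformable-convolution block of the kind that could replace a plain convolution inside a U-Net encoder can be written with torchvision's DeformConv2d, where a small ordinary convolution predicts the per-location sampling offsets; the block below is a generic sketch, not the paper's network.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableBlock(nn.Module):
    """Generic deformable-conv block: an ordinary conv predicts per-location
    sampling offsets, which the deformable conv then uses to sample its
    input at irregular (learned) positions."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        # Two offsets (dx, dy) per kernel tap: 2 * k * k offset channels.
        self.offset = nn.Conv2d(in_ch, 2 * k * k, k, padding=k // 2)
        self.deform = DeformConv2d(in_ch, out_ch, k, padding=k // 2)

    def forward(self, x):
        return self.deform(x, self.offset(x))

y = DeformableBlock(16, 32)(torch.randn(1, 16, 32, 32))  # (1, 32, 32, 32)
```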
arXiv Detail & Related papers (2020-07-20T03:08:36Z)
- Seeing eye-to-eye? A comparison of object recognition performance in humans and deep convolutional neural networks under image manipulation
This study aims towards a behavioral comparison of visual core object recognition performance between humans and feedforward neural networks.
Analyses of accuracy revealed that humans not only outperform DCNNs in all conditions, but also display significantly greater robustness to shape and, most notably, color alterations.
arXiv Detail & Related papers (2020-07-13T10:26:30Z)
- Intrinsic Autoencoders for Joint Neural Rendering and Intrinsic Image Decomposition
We propose an autoencoder for joint generation of realistic images from synthetic 3D models while simultaneously decomposing real images into their intrinsic shape and appearance properties.
Our experiments confirm that a joint treatment of rendering and decomposition is indeed beneficial and that our approach outperforms state-of-the-art image-to-image translation baselines both qualitatively and quantitatively.
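Schematically, such a joint treatment pairs an encoder that splits an image into shape and appearance codes (the decomposition direction) with a decoder that renders the pair back into an image (the rendering direction); the toy autoencoder below only shows this code-splitting shape, with entirely illustrative dimensions.

```python
import torch
import torch.nn as nn

class IntrinsicAutoencoder(nn.Module):
    """Sketch of the joint idea: encode an image into separate 'shape' and
    'appearance' codes, then decode the pair back into an image. Purely
    illustrative sizes, not the paper's architecture."""
    def __init__(self, code: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),   # -> (B, 32 * 16)
        )
        self.to_shape = nn.Linear(32 * 16, code)
        self.to_appearance = nn.Linear(32 * 16, code)
        self.decoder = nn.Sequential(nn.Linear(2 * code, 3 * 32 * 32),
                                     nn.Sigmoid())

    def forward(self, img):
        h = self.backbone(img)
        shape, appearance = self.to_shape(h), self.to_appearance(h)
        recon = self.decoder(torch.cat([shape, appearance], dim=1))
        return recon.view(-1, 3, 32, 32), shape, appearance

recon, s, a = IntrinsicAutoencoder()(torch.randn(2, 3, 32, 32))
```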
arXiv Detail & Related papers (2020-06-29T12:53:58Z)
- Steering Self-Supervised Feature Learning Beyond Local Pixel Statistics
We introduce a novel principle for self-supervised feature learning based on the discrimination of specific transformations of an image.
We demonstrate experimentally that learning to discriminate transformations such as LCI, image warping, and rotations yields features with state-of-the-art generalization capabilities.
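The pretext task reduces to classifying which transformation was applied to an image; the sketch below shows it for rotations only, in the spirit of RotNet-style self-supervision (LCI and warping are paper-specific and not reproduced).

```python
import torch
import torch.nn as nn

def rotate_batch(imgs: torch.Tensor):
    """Rotate each image by a random multiple of 90 degrees; return the
    rotated batch and the rotation index as the self-supervised label."""
    labels = torch.randint(0, 4, (imgs.size(0),))
    rotated = torch.stack([torch.rot90(im, int(k), dims=(1, 2))
                           for im, k in zip(imgs, labels)])
    return rotated, labels

# A tiny classifier trained to tell which transformation was applied;
# the learned features, not the classifier, are the point of the pretext task.
net = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 4),
)
x, y = rotate_batch(torch.randn(8, 3, 32, 32))
loss = nn.functional.cross_entropy(net(x), y)
loss.backward()
```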
arXiv Detail & Related papers (2020-04-05T22:09:08Z)
- Fine-grained Image-to-Image Transformation towards Visual Recognition
We aim at transforming an image with a fine-grained category to synthesize new images that preserve the identity of the input image.
We adopt a model based on generative adversarial networks to disentangle the identity related and unrelated factors of an image.
Experiments on the CompCars and Multi-PIE datasets demonstrate that our model preserves the identity of the generated images much better than the state-of-the-art image-to-image transformation models.
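The disentangling step can be sketched as an encoder that splits its latent code into an identity-related part and an identity-unrelated part; the GAN generator and discriminators that recombine and supervise these codes are omitted, and all names and sizes below are hypothetical.

```python
import torch
import torch.nn as nn

class DisentanglingEncoder(nn.Module):
    """Sketch of the disentanglement idea: split the latent code into an
    identity-related part and an identity-unrelated part; a generator (not
    shown) would recombine them to synthesize identity-preserving variants."""
    def __init__(self, id_dim: int = 64, other_dim: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.to_identity = nn.Linear(32, id_dim)  # preserved across outputs
        self.to_other = nn.Linear(32, other_dim)  # pose, lighting, etc.

    def forward(self, img):
        h = self.backbone(img)
        return self.to_identity(h), self.to_other(h)

z_id, z_other = DisentanglingEncoder()(torch.randn(2, 3, 64, 64))
```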
arXiv Detail & Related papers (2020-01-12T05:26:47Z)