Lost in Translation: Modern Neural Networks Still Struggle With Small Realistic Image Transformations
- URL: http://arxiv.org/abs/2404.07153v1
- Date: Wed, 10 Apr 2024 16:39:50 GMT
- Title: Lost in Translation: Modern Neural Networks Still Struggle With Small Realistic Image Transformations
- Authors: Ofir Shifman, Yair Weiss
- Abstract summary: Deep neural networks that achieve remarkable performance in image classification can be easily fooled by tiny transformations.
We show that these approaches still fall short in robustly handling 'natural' image translations that simulate a subtle change in camera orientation.
We present Robust Inference by Crop Selection: a simple method that can be proven to achieve any desired level of consistency.
- Score: 8.248839892711478
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Deep neural networks that achieve remarkable performance in image classification have previously been shown to be easily fooled by tiny transformations such as a one-pixel translation of the input image. In order to address this problem, two approaches have been proposed in recent years. The first approach suggests using huge datasets together with data augmentation in the hope that a highly varied training set will teach the network to learn to be invariant. The second approach suggests using architectural modifications based on sampling theory to deal explicitly with image translations. In this paper, we show that these approaches still fall short in robustly handling 'natural' image translations that simulate a subtle change in camera orientation. Our findings reveal that a mere one-pixel translation can result in a significant change in the predicted image representation for approximately 40% of the test images in state-of-the-art models (e.g. open-CLIP trained on LAION-2B or DINO-v2), while models that are explicitly constructed to be robust to cyclic translations can still be fooled by one-pixel realistic (non-cyclic) translations 11% of the time. We present Robust Inference by Crop Selection: a simple method that can be proven to achieve any desired level of consistency, although with a modest tradeoff with the model's accuracy. Importantly, we demonstrate how employing this method reduces the ability to fool state-of-the-art models with a one-pixel translation to less than 5% while suffering only a 1% drop in classification accuracy. Additionally, we show that our method can be easily adjusted to deal with circular shifts as well. In that case we achieve 100% robustness to integer shifts with state-of-the-art accuracy, and with no need for any further training.
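The abstract does not spell out the crop-selection procedure itself, but the circular-shift variant it mentions (100% robustness to integer shifts with no retraining) can be illustrated by a standard canonicalization trick: pick a crop/shift origin from the image content rather than the image frame, so every cyclically shifted copy maps to the same input before classification. The sketch below is an illustrative assumption, not the paper's actual algorithm; `canonical_roll` and its anchor choice (argmax of row/column sums) are hypothetical.

```python
import numpy as np

def canonical_roll(img: np.ndarray) -> np.ndarray:
    """Roll an image to a content-defined origin (illustrative sketch,
    not the paper's actual method).

    The anchor is the argmax of the row and column sums. Under any
    integer circular shift the anchor moves with the content, so every
    shifted copy maps to the same canonical image (assuming the argmax
    is unique, which holds almost surely for continuous-valued images).
    """
    r = int(np.argmax(img.sum(axis=1)))  # anchor row
    c = int(np.argmax(img.sum(axis=0)))  # anchor column
    return np.roll(img, (-r, -c), axis=(0, 1))

# A classifier applied to the canonical form is exactly invariant to
# integer circular shifts: all shifted copies canonicalize identically.
rng = np.random.default_rng(0)
img = rng.random((8, 8))
shifted = np.roll(img, (3, 5), axis=(0, 1))
assert np.array_equal(canonical_roll(img), canonical_roll(shifted))
```

Because the anchor is shift-equivariant, invariance here is exact rather than approximate; note that this trick only addresses cyclic shifts, whereas the paper's crop-selection method also targets realistic (non-cyclic) translations.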
Related papers
- CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation [57.836686457542385]
  Diffusion models (DMs) have enabled breakthroughs in image synthesis tasks but lack an intuitive interface for consistent image-to-image (I2I) translation.
  This paper introduces CycleNet, a novel but simple method that incorporates cycle consistency into DMs to regularize image manipulation.
  arXiv Detail & Related papers (2023-10-19T21:32:21Z)
- Expanding Language-Image Pretrained Models for General Video Recognition [136.0948049010682]
  Contrastive language-image pretraining has shown great success in learning visual-textual joint representation from web-scale data.
  We present a simple yet effective approach that adapts the pretrained language-image models to video recognition directly.
  Our approach surpasses the current state-of-the-art methods by +7.6% and +14.9% in terms of top-1 accuracy under two popular protocols.
  arXiv Detail & Related papers (2022-08-04T17:59:54Z)
- Neural Style Transfer and Unpaired Image-to-Image Translation to deal with the Domain Shift Problem on Spheroid Segmentation [0.0]
  Domain shift is a generalisation problem of machine learning models that occurs when the data distribution of the training set differs from the data distribution encountered by the model when it is deployed.
  This is common in the context of biomedical image segmentation due to the variance of experimental conditions, equipment, and capturing settings.
  We illustrate the domain shift problem in the context of spheroid segmentation with 4 deep learning segmentation models that achieved an IoU over 97% when tested with images following the training distribution, but whose performance decreased by up to 84% when applied to images captured under different conditions.
  arXiv Detail & Related papers (2021-12-16T17:34:45Z)
- With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations [87.72779294717267]
  Using the nearest-neighbor as positive in contrastive losses improves performance significantly on ImageNet classification.
  We demonstrate empirically that our method is less reliant on complex data augmentations.
  arXiv Detail & Related papers (2021-04-29T17:56:08Z)
- An Empirical Study of the Collapsing Problem in Semi-Supervised 2D Human Pose Estimation [80.02124918255059]
  Semi-supervised learning aims to boost the accuracy of a model by exploiting unlabeled images.
  We train two networks to mutually teach each other.
  The more reliable predictions on easy images in each network are used to teach the other network to learn about the corresponding hard images.
  arXiv Detail & Related papers (2020-11-25T03:29:52Z)
- DeepI2I: Enabling Deep Hierarchical Image-to-Image Translation by Transferring from GANs [43.33066765114446]
  Image-to-image translation suffers from inferior performance when translations between classes require large shape changes.
  We propose a novel deep hierarchical image-to-image translation method, called DeepI2I.
  We demonstrate that transfer learning significantly improves the performance of I2I systems, especially for small datasets.
  arXiv Detail & Related papers (2020-11-11T16:03:03Z)
- PREGAN: Pose Randomization and Estimation for Weakly Paired Image Style Translation [11.623477199795037]
  We propose a weakly-paired setting for style translation, where the content in the two images is aligned up to errors in pose.
  PREGAN is validated on both simulated and real-world collected data to show its effectiveness.
  arXiv Detail & Related papers (2020-10-31T16:11:11Z)
- Unsupervised Image-to-Image Translation via Pre-trained StyleGAN2 Network [73.5062435623908]
  We propose a new I2I translation method that generates a new model in the target domain via a series of model transformations.
  By feeding the latent vector into the generated model, we can perform I2I translation between the source domain and target domain.
  arXiv Detail & Related papers (2020-10-12T13:51:40Z)
- Radon cumulative distribution transform subspace modeling for image classification [18.709734704950804]
  We present a new supervised image classification method applicable to a broad class of image deformation models.
  The method makes use of the previously described Radon Cumulative Distribution Transform (R-CDT) for image data.
  In addition to test accuracy, we show improvements in terms of computational efficiency.
  arXiv Detail & Related papers (2020-04-07T19:47:26Z)
- Semi-supervised Learning for Few-shot Image-to-Image Translation [89.48165936436183]
  We propose a semi-supervised method for few-shot image translation, called SEMIT.
  Our method achieves excellent results on four different datasets using as little as 10% of the source labels.
  arXiv Detail & Related papers (2020-03-30T22:46:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.