Feature Lenses: Plug-and-play Neural Modules for
Transformation-Invariant Visual Representations
- URL: http://arxiv.org/abs/2004.05554v1
- Date: Sun, 12 Apr 2020 06:36:15 GMT
- Authors: Shaohua Li, Xiuchao Sui, Jie Fu, Yong Liu, Rick Siow Mong Goh
- Abstract summary: Convolutional Neural Networks (CNNs) are known to be brittle under various image transformations.
We propose "Feature Lenses", a set of ad-hoc modules that can be easily plugged into a trained model.
Each individual lens reconstructs the original features given the features of a transformed image under a particular transformation.
- Score: 33.02732996829386
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional Neural Networks (CNNs) are known to be brittle under various
image transformations, including rotations, scalings, and changes of lighting
conditions. We observe that the features of a transformed image are drastically
different from the ones of the original image. To make CNNs more invariant to
transformations, we propose "Feature Lenses", a set of ad-hoc modules that can
be easily plugged into a trained model (referred to as the "host model"). Each
individual lens reconstructs the original features given the features of a
transformed image under a particular transformation. These lenses jointly
counteract feature distortions caused by various transformations, thus making
the host model more robust without retraining. By only updating lenses, the
host model is freed from iterative updating when facing new transformations
absent in the training data; as feature semantics are preserved, downstream
applications, such as classifiers and detectors, automatically gain robustness
without retraining. Lenses are trained in a self-supervised fashion with no
annotations, by minimizing a novel "Top-K Activation Contrast Loss" between
lens-transformed features and original features. Evaluated on ImageNet,
MNIST-rot, and CIFAR-10, Feature Lenses show clear advantages over baseline
methods.
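The abstract does not spell out the exact form of the "Top-K Activation Contrast Loss". As a rough, plain-Python illustration of the stated idea (matching lens-restored features to the original features on the most strongly activated channels), a minimal sketch might look like the following; the function name, signature, and toy numbers are hypothetical, not taken from the paper.

```python
def topk_activation_contrast_loss(original, restored, k=3):
    """Hypothetical sketch: penalize the mismatch between lens-restored
    features and the original features, but only on the k channels where
    the original activation is strongest (its "top-k" activations)."""
    # Indices of the k largest original activations.
    topk_idx = sorted(range(len(original)),
                      key=lambda i: original[i], reverse=True)[:k]
    # Mean squared difference restricted to those channels.
    return sum((original[i] - restored[i]) ** 2 for i in topk_idx) / k

# Toy example: the lens restores the three strongest activations exactly,
# so the loss is zero even though one weak channel still differs.
orig_feat = [5.0, 0.1, 4.0, 3.0]
lens_feat = [5.0, 0.9, 4.0, 3.0]
print(topk_activation_contrast_loss(orig_feat, lens_feat, k=3))  # 0.0
```

The intuition behind restricting the loss to the top activations would be that strongly firing channels carry most of the feature's semantics, so preserving them matters more than matching near-zero channels.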
Related papers
- Stretching Beyond the Obvious: A Gradient-Free Framework to Unveil the Hidden Landscape of Visual Invariance [9.346027495459039]
Stretch-and-Squeeze (SnS) is an unbiased, model-agnostic, and gradient-free framework to characterize a unit's invariance landscape. SnS seeks perturbations that maximally alter the representation of a reference stimulus in a given processing stage while preserving unit activation. Applied to convolutional neural networks (CNNs), SnS revealed image variations that were further from a reference image in pixel space than those produced by affine transformations.
arXiv Detail & Related papers (2025-06-20T14:49:35Z)
- Self-Supervised Learning based on Transformed Image Reconstruction for Equivariance-Coherent Feature Representation [3.7622885602373626]
We propose a self-supervised learning approach to learning computer vision features.
The system learns transformations independently by reconstructing images that have undergone previously unseen transformations.
Our approach performs strongly on a rich set of realistic computer vision downstream tasks, almost always improving over all baselines.
arXiv Detail & Related papers (2025-03-24T15:01:50Z)
- Adaptive Camera Sensor for Vision Models [4.566795168995489]
Lens is a novel camera sensor control method that enhances model performance by capturing high-quality images from the model's perspective.
At its core, Lens utilizes VisiT, a training-free, model-specific quality indicator that evaluates individual unlabeled samples at test time.
To validate Lens, we introduce ImageNet-ES Diverse, a new benchmark dataset capturing natural perturbations from varying sensor and lighting conditions.
arXiv Detail & Related papers (2025-03-04T01:20:23Z)
- Self-supervised Transformation Learning for Equivariant Representations [26.207358743969277]
Unsupervised representation learning has significantly advanced various machine learning tasks.
We propose Self-supervised Transformation Learning (STL), replacing transformation labels with transformation representations derived from image pairs.
We demonstrate the approach's effectiveness across diverse classification and detection tasks, outperforming existing methods in 7 out of 11 benchmarks.
arXiv Detail & Related papers (2025-01-15T10:54:21Z)
- Stable Flow: Vital Layers for Training-Free Image Editing [74.52248787189302]
Diffusion models have revolutionized the field of content synthesis and editing.
Recent models have replaced the traditional UNet architecture with the Diffusion Transformer (DiT).
We propose an automatic method to identify "vital layers" within DiT, crucial for image formation.
Next, to enable real-image editing, we introduce an improved image inversion method for flow models.
arXiv Detail & Related papers (2024-11-21T18:59:51Z)
- Invariant Shape Representation Learning For Image Classification [41.610264291150706]
In this paper, we introduce a novel framework that, for the first time, develops invariant shape representation learning (ISRL).
Our model ISRL is designed to jointly capture invariant features in latent shape spaces parameterized by deformable transformations.
By embedding the features that are invariant with regard to target variables in different environments, our model consistently offers more accurate predictions.
arXiv Detail & Related papers (2024-11-19T03:39:43Z)
- PseudoNeg-MAE: Self-Supervised Point Cloud Learning using Conditional Pseudo-Negative Embeddings [55.55445978692678]
PseudoNeg-MAE enhances the global feature representations of point cloud masked autoencoders by making them both discriminative and sensitive to transformations. We propose a novel loss that explicitly penalizes invariant collapse, enabling the network to capture richer transformation cues while preserving discriminative representations.
arXiv Detail & Related papers (2024-09-24T07:57:21Z)
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- B-cos Alignment for Inherently Interpretable CNNs and Vision Transformers [97.75725574963197]
We present a new direction for increasing the interpretability of deep neural networks (DNNs) by promoting weight-input alignment during training.
We show that a sequence of such transformations induces a single linear transformation that faithfully summarises the full model computations.
We show that the resulting explanations are of high visual quality and perform well under quantitative interpretability metrics.
arXiv Detail & Related papers (2023-06-19T12:54:28Z)
- Diffusion Visual Counterfactual Explanations [51.077318228247925]
Visual Counterfactual Explanations (VCEs) are an important tool to understand the decisions of an image classifier.
Current approaches for the generation of VCEs are restricted to adversarially robust models and often contain non-realistic artefacts.
In this paper, we overcome this by generating Diffusion Visual Counterfactual Explanations (DVCEs) for arbitrary ImageNet classifiers.
arXiv Detail & Related papers (2022-10-21T09:35:47Z)
- Quantised Transforming Auto-Encoders: Achieving Equivariance to Arbitrary Transformations in Deep Networks [23.673155102696338]
Convolutional Neural Networks (CNNs) are equivariant to image translation.
We propose an auto-encoder architecture whose embedding obeys an arbitrary set of equivariance relations simultaneously.
We demonstrate results of successful re-rendering of transformed versions of input images on several datasets.
arXiv Detail & Related papers (2021-11-25T02:26:38Z)
- Robust Training Using Natural Transformation [19.455666609149567]
We present NaTra, an adversarial training scheme that improves the robustness of image classification algorithms.
We target attributes of the input images that are independent of the class identification, and manipulate those attributes to mimic real-world natural transformations.
We demonstrate the efficacy of our scheme by utilizing the disentangled latent representations derived from well-trained GANs.
arXiv Detail & Related papers (2021-05-10T01:56:03Z)
- Self-Supervised Representation Learning from Flow Equivariance [97.13056332559526]
We present a new self-supervised representation learning framework that can be directly deployed on a video stream of complex scenes.
Our representations, learned from high-resolution raw video, can be readily used for downstream tasks on static images.
arXiv Detail & Related papers (2021-01-16T23:44:09Z)
- Encoding Robustness to Image Style via Adversarial Feature Perturbations [72.81911076841408]
We adapt adversarial training by directly perturbing feature statistics, rather than image pixels, to produce robust models.
Our proposed method, Adversarial Batch Normalization (AdvBN), is a single network layer that generates worst-case feature perturbations during training.
arXiv Detail & Related papers (2020-09-18T17:52:34Z)
- Robustness to Transformations Across Categories: Is Robustness To Transformations Driven by Invariant Neural Representations? [1.7251667223970861]
Deep Convolutional Neural Networks (DCNNs) have demonstrated impressive robustness to recognize objects under transformations.
A hypothesis to explain such robustness is that DCNNs develop invariant neural representations that remain unaltered when the image is transformed.
This paper investigates the conditions under which invariant neural representations emerge by leveraging that they facilitate robustness to transformations.
arXiv Detail & Related papers (2020-06-30T21:18:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.