Plug-In Inversion: Model-Agnostic Inversion for Vision with Data Augmentations
- URL: http://arxiv.org/abs/2201.12961v1
- Date: Mon, 31 Jan 2022 02:12:45 GMT
- Title: Plug-In Inversion: Model-Agnostic Inversion for Vision with Data Augmentations
- Authors: Amin Ghiasi, Hamid Kazemi, Steven Reich, Chen Zhu, Micah Goldblum, Tom
Goldstein
- Abstract summary: We introduce Plug-In Inversion, which relies on a simple set of augmentations and does not require excessive hyper-parameter tuning.
We illustrate the practicality of our approach by inverting Vision Transformers (ViTs) and Multi-Layer Perceptrons (MLPs) trained on the ImageNet dataset.
- Score: 61.95114821573875
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing techniques for model inversion typically rely on hard-to-tune
regularizers, such as total variation or feature regularization, which must be
individually calibrated for each network in order to produce adequate images.
In this work, we introduce Plug-In Inversion, which relies on a simple set of
augmentations and does not require excessive hyper-parameter tuning. Under our
proposed augmentation-based scheme, the same set of augmentation
hyper-parameters can be used for inverting a wide range of image classification
models, regardless of input dimensions or the architecture. We illustrate the
practicality of our approach by inverting Vision Transformers (ViTs) and
Multi-Layer Perceptrons (MLPs) trained on the ImageNet dataset, tasks which to
the best of our knowledge have not been successfully accomplished by any
previous works.
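The augmentation-based scheme lends itself to a compact optimization loop: freeze the classifier, treat the input image as the only trainable parameter, and maximize the target-class score through randomly augmented views instead of hand-tuned image priors. The sketch below illustrates this idea in PyTorch; the specific transforms, the sigmoid pixel parameterization, the ensemble size, and the optimizer settings are illustrative assumptions rather than the paper's exact recipe, which is described in the full text.

```python
# A minimal sketch of augmentation-based class inversion, assuming a PyTorch
# classifier and torchvision transforms. The augmentations, ensemble size, and
# optimizer settings below are illustrative, not the paper's exact recipe.
import torch
import torch.nn.functional as F
import torchvision.transforms as T

def invert_class(model, target_class, steps=2000, lr=0.1, views=8,
                 image_size=224, device="cpu"):
    """Optimize an image so the frozen `model` predicts `target_class`,
    relying on random augmentations rather than hand-tuned regularizers."""
    model = model.to(device).eval()
    for p in model.parameters():
        p.requires_grad_(False)

    # The image (in logit space) is the only trainable parameter.
    z = torch.randn(1, 3, image_size, image_size, device=device, requires_grad=True)
    optimizer = torch.optim.Adam([z], lr=lr)

    augment = T.Compose([              # stand-ins for the paper's augmentations
        T.RandomResizedCrop(image_size, scale=(0.8, 1.0)),
        T.RandomHorizontalFlip(),
        T.ColorJitter(brightness=0.4, contrast=0.4),
    ])
    normalize = T.Normalize(mean=[0.485, 0.456, 0.406],
                            std=[0.229, 0.224, 0.225])
    target = torch.full((views,), target_class, device=device, dtype=torch.long)

    for _ in range(steps):
        optimizer.zero_grad()
        img = torch.sigmoid(z)                          # keep pixels in [0, 1]
        batch = torch.cat([augment(img) for _ in range(views)])  # ensemble of views
        loss = F.cross_entropy(model(normalize(batch)), target)
        loss.backward()
        optimizer.step()
    return torch.sigmoid(z).detach()

# Example (hypothetical model choice): invert ImageNet class 207 from any
# pretrained classifier, e.g. torchvision.models.resnet50(weights="DEFAULT").
# image = invert_class(torchvision.models.resnet50(weights="DEFAULT"), 207)
```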
Related papers
- Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks [53.67497327319569]
We introduce a novel neural rendering technique to solve image-to-3D from a single view.
Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks.
Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
arXiv Detail & Related papers (2023-12-24T08:42:37Z)
- TiC: Exploring Vision Transformer in Convolution [37.50285921899263]
We propose the Multi-Head Self-Attention Convolution (MSA-Conv).
MSA-Conv incorporates Self-Attention within generalized convolutions, including standard, dilated, and depthwise ones.
We present the Vision Transformer in Convolution (TiC) as a proof of concept for image classification with MSA-Conv.
arXiv Detail & Related papers (2023-10-06T10:16:26Z)
- A Contrastive Learning Scheme with Transformer Innate Patches [4.588028371034407]
We present Contrastive Transformer, a contrastive learning scheme that uses the Transformer's innate patches.
The scheme performs supervised patch-level contrastive learning, selecting the patches based on the ground truth mask.
The scheme applies to all vision-transformer architectures, is easy to implement, and introduces minimal additional memory footprint.
arXiv Detail & Related papers (2023-03-26T20:19:28Z)
- A Simple Plugin for Transforming Images to Arbitrary Scales [47.36233857830832]
We develop a general plugin that can be inserted into existing super-resolution models, conveniently extending them to arbitrary-resolution image scaling.
We show that the resulting models not only maintain their original performance at fixed scale factors but also extrapolate to unseen scales, substantially outperforming existing any-scale super-resolution models on standard benchmarks.
arXiv Detail & Related papers (2022-10-07T09:24:38Z)
- Patch Similarity Aware Data-Free Quantization for Vision Transformers [2.954890575035673]
We propose PSAQ-ViT, a Patch Similarity Aware data-free Quantization framework for Vision Transformers.
We analyze the self-attention module's properties and reveal a general difference (patch similarity) in its processing of Gaussian noise and real images.
Experiments and ablation studies are conducted on various benchmarks to validate the effectiveness of PSAQ-ViT.
arXiv Detail & Related papers (2022-03-04T11:47:20Z)
- ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond [76.35955924137986]
We propose a Vision Transformer Advanced by Exploring intrinsic inductive bias (IB) from convolutions, i.e., ViTAE.
ViTAE has several spatial pyramid reduction modules to downsample and embed the input image into tokens with rich multi-scale context.
We obtain state-of-the-art classification performance, i.e., 88.5% Top-1 accuracy on the ImageNet validation set and the best 91.2% Top-1 accuracy on the ImageNet real validation set.
arXiv Detail & Related papers (2022-02-21T10:40:05Z)
- Less is More: Pay Less Attention in Vision Transformers [61.05787583247392]
The Less attention vIsion Transformer (LIT) builds upon the fact that convolutions, fully-connected layers, and self-attention have almost equivalent mathematical expressions for processing image patch sequences.
The proposed LIT achieves promising performance on image recognition tasks, including image classification, object detection and instance segmentation.
arXiv Detail & Related papers (2021-05-29T05:26:07Z)
- Visual Saliency Transformer [127.33678448761599]
We develop a novel unified model based on a pure transformer, the Visual Saliency Transformer (VST), for both RGB and RGB-D salient object detection (SOD).
It takes image patches as inputs and leverages the transformer to propagate global contexts among image patches.
Experimental results show that our model outperforms existing state-of-the-art results on both RGB and RGB-D SOD benchmark datasets.
arXiv Detail & Related papers (2021-04-25T08:24:06Z)
- Pre-Trained Image Processing Transformer [95.93031793337613]
We develop a new pre-trained model, namely the image processing transformer (IPT).
We propose to utilize the well-known ImageNet benchmark to generate a large number of corrupted image pairs.
The IPT model is trained on these images with multi-heads and multi-tails.
arXiv Detail & Related papers (2020-12-01T09:42:46Z)
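The IPT entry above describes a shared backbone trained with multiple heads and tails, i.e., per-task input and output modules around a single shared transformer body. The sketch below illustrates only that structural pattern; the task names, layer sizes, per-pixel tokenization, and module choices are assumptions for illustration and do not reproduce the IPT architecture.

```python
# Minimal sketch of a shared-body model with task-specific heads and tails,
# in the spirit of the multi-head/multi-tail training described for IPT above.
# Task list, layer sizes, and module choices are illustrative assumptions.
import torch
import torch.nn as nn

class MultiTaskRestorer(nn.Module):
    def __init__(self, tasks=("denoise", "derain", "sr"), dim=64, depth=4, heads=4):
        super().__init__()
        # One lightweight convolutional head per task maps images to features.
        self.heads = nn.ModuleDict({t: nn.Conv2d(3, dim, 3, padding=1) for t in tasks})
        # A single transformer encoder is shared across all restoration tasks.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.body = nn.TransformerEncoder(layer, num_layers=depth)
        # One tail per task maps features back to an RGB image.
        self.tails = nn.ModuleDict({t: nn.Conv2d(dim, 3, 3, padding=1) for t in tasks})

    def forward(self, x, task):
        feats = self.heads[task](x)                # (B, C, H, W)
        b, c, h, w = feats.shape
        tokens = feats.flatten(2).transpose(1, 2)  # (B, H*W, C) token sequence
        tokens = self.body(tokens)                 # shared transformer body
        feats = tokens.transpose(1, 2).reshape(b, c, h, w)
        return self.tails[task](feats)

# Usage: route each corrupted/clean pair through the head and tail of its task.
# model = MultiTaskRestorer()
# restored = model(torch.randn(2, 3, 48, 48), task="denoise")
```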
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.