Iterative Patch Selection for High-Resolution Image Recognition
- URL: http://arxiv.org/abs/2210.13007v1
- Date: Mon, 24 Oct 2022 07:55:57 GMT
- Title: Iterative Patch Selection for High-Resolution Image Recognition
- Authors: Benjamin Bergner, Christoph Lippert, Aravindh Mahendran
- Abstract summary: We propose a simple method, Iterative Patch Selection (IPS), which decouples the memory usage from the input size.
IPS achieves this by selecting only the most salient patches, which are then aggregated into a global representation for image recognition.
Our method demonstrates strong performance and has wide applicability across different domains, training regimes and image sizes while using minimal accelerator memory.
- Score: 10.847032625429717
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-resolution images are prevalent in various applications, such as
autonomous driving and computer-aided diagnosis. However, training neural
networks on such images is computationally challenging and easily leads to
out-of-memory errors even on modern GPUs. We propose a simple method, Iterative
Patch Selection (IPS), which decouples the memory usage from the input size and
thus enables the processing of arbitrarily large images under tight hardware
constraints. IPS achieves this by selecting only the most salient patches,
which are then aggregated into a global representation for image recognition.
For both patch selection and aggregation, a cross-attention based transformer
is introduced, which exhibits a close connection to Multiple Instance Learning.
Our method demonstrates strong performance and has wide applicability across
different domains, training regimes and image sizes while using minimal
accelerator memory. For example, we are able to finetune our model on
whole-slide images consisting of up to 250k patches (>16 gigapixels) with only
5 GB of GPU VRAM at a batch size of 16.
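The selection-then-aggregation idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the chunked memory buffer, the single-query dot-product attention, and the names `encode`, `query`, `memory_size` and `chunk_size` are all assumptions made for illustration. The point is only that at most `memory_size + chunk_size` patch embeddings are held at once, regardless of how many patches the image contains.

```python
import numpy as np

def attention_scores(embeddings, query):
    """Softmax-normalized dot-product attention of one query over patch embeddings."""
    logits = embeddings @ query / np.sqrt(embeddings.shape[1])
    weights = np.exp(logits - logits.max())  # subtract max for numerical stability
    return weights / weights.sum()

def iterative_patch_selection(patches, encode, query, memory_size, chunk_size):
    """Scan patches chunk by chunk, keeping only the highest-scoring ones.

    Hypothetical sketch: a buffer of `memory_size` embeddings is merged with
    each new chunk, re-scored, and truncated back to `memory_size` entries.
    """
    kept_idx = np.empty(0, dtype=np.int64)
    kept_emb = np.empty((0, query.shape[0]))
    for start in range(0, len(patches), chunk_size):
        idx = np.arange(start, min(start + chunk_size, len(patches)))
        emb = encode(patches[idx])                    # embed only this chunk
        cand_idx = np.concatenate([kept_idx, idx])
        cand_emb = np.concatenate([kept_emb, emb])
        scores = attention_scores(cand_emb, query)
        top = np.argsort(scores)[::-1][:memory_size]  # most salient candidates
        kept_idx, kept_emb = cand_idx[top], cand_emb[top]
    order = np.argsort(kept_idx)                      # restore original patch order
    return kept_idx[order], kept_emb[order]

def aggregate(embeddings, query):
    """Attention-weighted pooling of the selected patches into one vector."""
    return attention_scores(embeddings, query) @ embeddings

# Toy usage with random data standing in for image patches.
rng = np.random.default_rng(0)
patches = rng.normal(size=(1000, 48))   # 1000 flattened toy patches
proj = rng.normal(size=(48, 16))        # stand-in for a learned patch encoder
encode = lambda x: x @ proj
query = rng.normal(size=16)             # stand-in for a learned query token

idx, emb = iterative_patch_selection(patches, encode, query,
                                     memory_size=8, chunk_size=64)
image_repr = aggregate(emb, query)      # global representation, shape (16,)
```

In this toy setting the chunked scan returns the same patches as scoring all 1000 at once, because the softmax preserves the ordering of the logits; only the peak number of embeddings held in memory differs.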
Related papers
- Parameter-Inverted Image Pyramid Networks [49.35689698870247]
We propose a novel network architecture known as Parameter-Inverted Image Pyramid Networks (PIIP).
Our core idea is to use models with different parameter sizes to process different resolution levels of the image pyramid.
PIIP achieves superior performance in tasks such as object detection, segmentation, and image classification.
arXiv Detail & Related papers (2024-06-06T17:59:10Z)
- CoordFill: Efficient High-Resolution Image Inpainting via Parameterized Coordinate Querying [52.91778151771145]
In this paper, we try to break the limitations for the first time thanks to the recent development of continuous implicit representation.
Experiments show that the proposed method achieves real-time performance on 2048$\times$2048 images using a single GTX 2080 Ti GPU.
arXiv Detail & Related papers (2023-03-15T11:13:51Z)
- PatchDropout: Economizing Vision Transformers Using Patch Dropout [9.243684409949436]
We show that standard ViT models can be efficiently trained at high resolution by randomly dropping input image patches.
We observe a 5 times savings in computation and memory using PatchDropout, along with a boost in performance.
arXiv Detail & Related papers (2022-08-10T14:08:55Z)
- ImageSig: A signature transform for ultra-lightweight image recognition [0.0]
ImageSig is based on computing signatures and does not require a convolutional structure or an attention-based encoder.
ImageSig shows unprecedented performance on hardware such as Raspberry Pi and Jetson-nano.
arXiv Detail & Related papers (2022-05-13T23:48:32Z)
- HIPA: Hierarchical Patch Transformer for Single Image Super Resolution [62.7081074931892]
This paper presents HIPA, a novel Transformer architecture that progressively recovers the high resolution image using a hierarchical patch partition.
We build a cascaded model that processes an input image in multiple stages, where we start with tokens with small patch sizes and gradually merge to the full resolution.
Such a hierarchical patch mechanism not only explicitly enables feature aggregation at multiple resolutions but also adaptively learns patch-aware features for different image regions.
arXiv Detail & Related papers (2022-03-19T05:09:34Z)
- Patch-Based Stochastic Attention for Image Editing [4.8201607588546]
We propose an efficient attention layer based on the algorithm PatchMatch, which is used for determining approximate nearest neighbors.
We demonstrate the usefulness of PSAL on several image editing tasks, such as image inpainting, guided image colorization, and single-image super-resolution.
arXiv Detail & Related papers (2022-02-07T13:42:00Z)
- Parallel Discrete Convolutions on Adaptive Particle Representations of Images [2.362412515574206]
We present data structures and algorithms for native implementations of discrete convolution operators over Adaptive Particle Representations.
The APR is a content-adaptive image representation that locally adapts the sampling resolution to the image signal.
We show that APR convolution naturally leads to scale-adaptive algorithms that efficiently parallelize on multi-core CPU and GPU architectures.
arXiv Detail & Related papers (2021-12-07T09:40:05Z)
- Generating Superpixels for High-resolution Images with Decoupled Patch Calibration [82.21559299694555]
Patch Calibration Networks (PCNet) is designed to efficiently and accurately implement high-resolution superpixel segmentation.
Decoupled Patch Calibration (DPC) takes a local patch from the high-resolution image and dynamically generates a binary mask that forces the network to focus on region boundaries.
arXiv Detail & Related papers (2021-08-19T10:33:05Z)
- Memory Efficient Meta-Learning with Large Images [62.70515410249566]
Meta-learning approaches to few-shot classification are computationally efficient at test time, requiring just a few optimization steps or a single forward pass to learn a new task, but they are memory-intensive to train.
This limitation arises because a task's entire support set, which can contain up to 1000 images, must be processed before an optimization step can be taken.
We propose LITE, a general and memory efficient episodic training scheme that enables meta-training on large tasks composed of large images on a single GPU.
arXiv Detail & Related papers (2021-07-02T14:37:13Z)
- CooGAN: A Memory-Efficient Framework for High-Resolution Facial Attribute Editing [84.92009553462384]
We propose a novel pixel translation framework called Cooperative GAN (CooGAN) for HR facial image editing.
This framework features a local path for fine-grained local facial patch generation (i.e., patch-level HR, low memory) and a global path for global low-resolution (LR) facial structure monitoring (i.e., image-level LR, low memory).
In addition, we propose a lighter selective transfer unit for more efficient multi-scale feature fusion, yielding higher-fidelity facial attribute manipulation.
arXiv Detail & Related papers (2020-11-03T08:40:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.