Patch Gradient Descent: Training Neural Networks on Very Large Images
- URL: http://arxiv.org/abs/2301.13817v1
- Date: Tue, 31 Jan 2023 18:04:35 GMT
- Title: Patch Gradient Descent: Training Neural Networks on Very Large Images
- Authors: Deepak K. Gupta, Gowreesh Mago, Arnav Chavan, Dilip K. Prasad
- Abstract summary: We propose Patch Gradient Descent (PatchGD) to train existing CNN architectures on large-scale images.
PatchGD is based on the hypothesis that instead of performing gradient-based updates on an entire image at once, it should be possible to achieve a good solution by performing model updates on only small parts of the image.
Our evaluation shows that PatchGD is much more stable and efficient than the standard gradient-descent method in handling large images.
- Score: 13.969180905165533
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Traditional CNN models are trained and tested on relatively low-resolution
images (<300 px) and cannot be applied directly to large-scale images because of
compute and memory constraints. We propose Patch Gradient Descent (PatchGD), an
effective learning strategy that allows existing CNN architectures to be trained
on large-scale images in an end-to-end manner. PatchGD is based on the
hypothesis that instead of performing gradient-based updates on an entire image
at once, it should be possible to achieve a good solution by performing model
updates on only small parts of the image at a time, ensuring that the majority
of it is covered over the course of iterations. PatchGD thus offers substantially
better memory and compute efficiency when training models on large-scale
images. PatchGD is thoroughly evaluated on two datasets, PANDA and UltraMNIST,
with ResNet50 and MobileNetV2 models under different memory constraints. Our
evaluation clearly shows that PatchGD is much more stable and efficient than
the standard gradient-descent method in handling large images, especially
when the compute memory is limited.
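The hypothesis in the abstract (update the model from a few small parts of the image at a time, while covering most of the image over the iterations) can be illustrated with a toy sketch. This is not the authors' algorithm: the abstract gives no implementation details, so the one-parameter model, the squared-error loss, the non-overlapping patch grid, and every name below (`patch_grid`, `patch_mean`, `train`, `per_step`) are assumptions made purely for illustration.

```python
import random


def patch_grid(height, width, patch):
    """Top-left corners of non-overlapping patches tiling the image (assumed layout)."""
    return [(r, c)
            for r in range(0, height - patch + 1, patch)
            for c in range(0, width - patch + 1, patch)]


def patch_mean(image, corner, patch):
    """Mean pixel value of one patch; only this patch's pixels are read."""
    top, left = corner
    total = sum(image[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch))
    return total / (patch * patch)


def train(image, target, patch=4, per_step=2, steps=50, lr=0.1, seed=0):
    """Toy patch-wise gradient descent on a single weight w (hypothetical model).

    Each step sees only `per_step` patches; a shuffled pool is drained before
    it is refilled, so every patch is visited once per pass over the image.
    """
    rng = random.Random(seed)
    corners = patch_grid(len(image), len(image[0]), patch)
    pool, seen, w = [], set(), 0.0
    for _ in range(steps):
        batch = []
        for _ in range(per_step):
            if not pool:                      # start a new pass over the image
                pool = corners[:]
                rng.shuffle(pool)
            batch.append(pool.pop())
        seen.update(batch)
        # Feature computed from the sampled patches only, never the full image.
        feature = sum(patch_mean(image, c, patch) for c in batch) / len(batch)
        error = w * feature - target          # loss = error ** 2
        w -= lr * 2.0 * error * feature       # gradient step on w
    return w, len(seen) / len(corners)


image = [[1.0] * 16 for _ in range(16)]       # stand-in for a "large" image
w, coverage = train(image, target=3.0)
print(round(w, 3), coverage)
```

In this sketch the memory needed per update depends on `patch` and `per_step` rather than on the image size, which is the efficiency argument the abstract makes; `coverage` reports the fraction of patches visited at least once during training.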
Related papers
- Teddy: Efficient Large-Scale Dataset Distillation via Taylor-Approximated Matching [74.75248610868685]
Teddy is a Taylor-approximated dataset distillation framework designed to handle large-scale datasets.
Teddy attains state-of-the-art efficiency and performance on the Tiny-ImageNet and original-sized ImageNet-1K datasets.
arXiv Detail & Related papers (2024-10-10T03:28:46Z)
- Adaptive Patching for High-resolution Image Segmentation with Transformers [9.525013089622183]
Attention-based models are proliferating in the space of image analytics, including segmentation.
The standard method of feeding images to transformer encoders is to divide the images into patches and then feed the patches to the model as a linear sequence of tokens.
For high-resolution images, e.g. microscopic pathology images, the quadratic compute and memory cost prohibits the use of an attention-based model, if we are to use smaller patch sizes that are favorable in segmentation.
We take inspiration from Adaptive Mesh Refinement (AMR) methods in HPC by adaptively patching the images, as a pre-processing step, based
arXiv Detail & Related papers (2024-04-15T12:06:00Z)
- Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [89.8764435351222]
We propose a new family of unbiased estimators called WTA-CRS, for matrix production with reduced variance.
Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones.
arXiv Detail & Related papers (2023-05-24T15:52:08Z)
- FlexiViT: One Model for All Patch Sizes [100.52574011880571]
Vision Transformers convert images to sequences by slicing them into patches.
The size of these patches controls a speed/accuracy tradeoff, with smaller patches leading to higher accuracy at greater computational cost.
We show that simply randomizing the patch size at training time leads to a single set of weights that performs well across a wide range of patch sizes.
arXiv Detail & Related papers (2022-12-15T18:18:38Z)
- SDM: Spatial Diffusion Model for Large Hole Image Inpainting [106.90795513361498]
We present a novel spatial diffusion model (SDM) that uses a few iterations to gradually deliver informative pixels to the entire image.
Also, thanks to the proposed decoupled probabilistic modeling and spatial diffusion scheme, our method achieves high-quality large-hole completion.
arXiv Detail & Related papers (2022-12-06T13:30:18Z)
- BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers [117.79456335844439]
We propose to use a semantic-rich visual tokenizer as the reconstruction target for masked prediction.
We then pretrain vision Transformers by predicting the original visual tokens for the masked image patches.
Experiments on image classification and semantic segmentation show that our approach outperforms all compared MIM methods.
arXiv Detail & Related papers (2022-08-12T16:48:10Z)
- PatchDropout: Economizing Vision Transformers Using Patch Dropout [9.243684409949436]
We show that standard ViT models can be efficiently trained at high resolution by randomly dropping input image patches.
We observe a 5 times savings in computation and memory using PatchDropout, along with a boost in performance.
arXiv Detail & Related papers (2022-08-10T14:08:55Z)
- Patch-Based Stochastic Attention for Image Editing [4.8201607588546]
We propose an efficient attention layer based on the algorithm PatchMatch, which is used for determining approximate nearest neighbors.
We demonstrate the usefulness of PSAL on several image editing tasks, such as image inpainting, guided image colorization, and single-image super-resolution.
arXiv Detail & Related papers (2022-02-07T13:42:00Z)
- CNNs for JPEGs: A Study in Computational Cost [49.97673761305336]
Convolutional neural networks (CNNs) have achieved astonishing advances over the past decade.
CNNs are capable of learning robust representations of the data directly from the RGB pixels.
Deep learning methods capable of learning directly from the compressed domain have been gaining attention in recent years.
arXiv Detail & Related papers (2020-12-26T15:00:10Z)
- Memory-efficient GAN-based Domain Translation of High Resolution 3D Medical Images [0.15092198588928965]
Generative adversarial networks (GANs) are rarely applied on 3D medical images of large size.
The present work proposes a multi-scale patch-based GAN approach for establishing unpaired domain translation.
The evaluation of the domain translation scenarios is performed on brain MRIs of size 155x240x240 and thorax CTs of size up to 512x512x512.
arXiv Detail & Related papers (2020-10-06T08:43:27Z)
- Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting [12.839962012888199]
We propose a Contextual Residual Aggregation (CRA) mechanism that can produce high-frequency residuals for missing contents.
The CRA mechanism produces high-frequency residuals for missing contents by weighted aggregation of residuals from contextual patches.
We train the proposed model on small images with resolutions 512x512 and perform inference on high-resolution images, achieving compelling inpainting quality.
arXiv Detail & Related papers (2020-05-19T18:55:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.