Related papers: Patch Gradient Descent: Training Neural Networks on Very Large Images

URL: http://arxiv.org/abs/2301.13817v1
Date: Tue, 31 Jan 2023 18:04:35 GMT
Title: Patch Gradient Descent: Training Neural Networks on Very Large Images
Authors: Deepak K. Gupta, Gowreesh Mago, Arnav Chavan, Dilip K. Prasad
Abstract summary: We propose Patch Gradient Descent (PatchGD) to train existing CNN architectures on large-scale images. PatchGD is based on the hypothesis that instead of performing gradient-based updates on an entire image at once, it should be possible to achieve a good solution by performing model updates on only small parts of the image. Our evaluation shows that PatchGD is much more stable and efficient than the standard gradient-descent method in handling large images.
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Traditional CNN models are trained and tested on relatively low-resolution images (<300 px) and cannot be applied directly to large-scale images because of compute and memory constraints. We propose Patch Gradient Descent (PatchGD), an effective learning strategy that allows existing CNN architectures to be trained on large-scale images in an end-to-end manner. PatchGD is based on the hypothesis that, instead of performing gradient-based updates on an entire image at once, it should be possible to achieve a good solution by performing model updates on only small parts of the image at a time, ensuring that the majority of the image is covered over the course of the iterations. PatchGD therefore enjoys better memory and compute efficiency when training models on large-scale images. PatchGD is thoroughly evaluated on two datasets, PANDA and UltraMNIST, with ResNet50 and MobileNetV2 models under different memory constraints. Our evaluation shows that PatchGD is much more stable and efficient than the standard gradient-descent method in handling large images, especially when compute memory is limited.
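The core idea in the abstract, updating the model from only a few image patches at a time while the rest of the image is represented by values computed earlier, can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' implementation: the linear "backbone" `encode`, the mean-pooling head, the per-patch feature cache `z`, the shuffled patch schedule, the function names, and all hyperparameters are hypothetical choices made for illustration. Gradients are derived by hand for the loss L = 0.5 (ŷ - y)², so the example needs only the Python standard library.

```python
import random


def split_into_patches(image, p):
    """Split an H x W image (list of rows) into flattened p x p patches, row-major."""
    h, w = len(image), len(image[0])
    assert h % p == 0 and w % p == 0, "sketch assumes H and W are multiples of p"
    patches = []
    for r in range(0, h, p):
        for c in range(0, w, p):
            patches.append([image[r + i][c + j] for i in range(p) for j in range(p)])
    return patches


def encode(weights, patch):
    """Hypothetical toy 'backbone': one linear scalar feature per patch."""
    return sum(wj * xj for wj, xj in zip(weights, patch))


def patch_gd(image, target, p=4, k=2, lr=0.1, epochs=20, seed=0):
    """Toy patch-wise gradient descent on a single image with a scalar target.

    Returns the learned weights and the full-image loss after each epoch.
    """
    rng = random.Random(seed)
    patches = split_into_patches(image, p)
    n = len(patches)
    weights = [0.0] * (p * p)
    # Cache of per-patch features for the whole image, filled once up front.
    # In an autograd framework this pass would run without gradient tracking.
    z = [encode(weights, x) for x in patches]
    history = []
    for _ in range(epochs):
        order = list(range(n))
        rng.shuffle(order)  # every patch is visited exactly once per epoch
        for start in range(0, n, k):
            selected = order[start:start + k]
            # Re-encode only the k selected patches with the current weights.
            for i in selected:
                z[i] = encode(weights, patches[i])
            pred = sum(z) / n  # toy head: mean of the cached features
            err = pred - target
            # dL/dw_j = err * (1/n) * sum_{i in selected} x_ij: the gradient flows
            # only through the selected patches; other cache entries are constants.
            grad = [err / n * sum(patches[i][j] for i in selected) for j in range(p * p)]
            weights = [wj - lr * gj for wj, gj in zip(weights, grad)]
        # Evaluate on the full image with freshly computed features.
        full_pred = sum(encode(weights, x) for x in patches) / n
        history.append(0.5 * (full_pred - target) ** 2)
    return weights, history


if __name__ == "__main__":
    img_rng = random.Random(1)
    image = [[img_rng.random() for _ in range(16)] for _ in range(16)]
    weights, losses = patch_gd(image, target=3.0)
    print(f"loss after epoch 1: {losses[0]:.4f}, after last epoch: {losses[-1]:.6f}")
```

In a framework with automatic differentiation, the memory saving in this kind of scheme would come from building the computation graph for only the `k` selected patches per step, while the cached features of all other patches are held as plain values. The shuffled schedule is one simple way to make sure the whole image is covered over the iterations; the stale cache entries are the price paid for never processing the full image with gradients at once.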

Related papers

Next Patch Prediction for Autoregressive Visual Generation
We extend the Next Token Prediction (NTP) paradigm to a novel Next Patch Prediction (NPP) paradigm. Our key idea is to group and aggregate image tokens into patch tokens with higher information density. We show that NPP can reduce the training cost to around 0.6 times while improving image generation quality by up to 1.0 FID on the ImageNet 256x256 generation benchmark.
arXiv (2024-12-19T18:59:36Z)
Teddy: Efficient Large-Scale Dataset Distillation via Taylor-Approximated Matching
Teddy is a Taylor-approximated dataset distillation framework designed to handle large-scale datasets. Teddy attains state-of-the-art efficiency and performance on the Tiny-ImageNet and original-sized ImageNet-1K datasets.
arXiv (2024-10-10T03:28:46Z)
Adaptive Patching for High-resolution Image Segmentation with Transformers
Attention-based models are proliferating in image analytics, including segmentation. The standard method of feeding images to transformer encoders is to divide the images into patches and then feed the patches to the model as a linear sequence of tokens. For high-resolution images, e.g. microscopic pathology images, the quadratic compute and memory cost prohibits the use of an attention-based model if we are to use the smaller patch sizes that are favorable in segmentation. We take inspiration from Adaptive Mesh Refinement (AMR) methods in HPC by adaptively patching the images, as a pre-processing step, based …
arXiv (2024-04-15T12:06:00Z)
Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model
We propose a new family of unbiased estimators, called WTA-CRS, for matrix multiplication with reduced variance. Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance than existing ones.
arXiv (2023-05-24T15:52:08Z)
FlexiViT: One Model for All Patch Sizes
Vision Transformers convert images to sequences by slicing them into patches. The size of these patches controls a speed/accuracy tradeoff, with smaller patches leading to higher accuracy at greater computational cost. We show that simply randomizing the patch size at training time leads to a single set of weights that performs well across a wide range of patch sizes.
arXiv (2022-12-15T18:18:38Z)
SDM: Spatial Diffusion Model for Large Hole Image Inpainting
We present a novel spatial diffusion model (SDM) that uses a few iterations to gradually deliver informative pixels to the entire image. Also, thanks to the proposed decoupled probabilistic modeling and spatial diffusion scheme, our method achieves high-quality large-hole completion.
arXiv (2022-12-06T13:30:18Z)
BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers
We propose to use a semantic-rich visual tokenizer as the reconstruction target for masked prediction. We then pretrain vision Transformers by predicting the original visual tokens for the masked image patches. Experiments on image classification and semantic segmentation show that our approach outperforms all compared MIM methods.
arXiv (2022-08-12T16:48:10Z)
PatchDropout: Economizing Vision Transformers Using Patch Dropout
We show that standard ViT models can be efficiently trained at high resolution by randomly dropping input image patches. We observe a 5x saving in computation and memory using PatchDropout, along with a boost in performance.
arXiv (2022-08-10T14:08:55Z)
Patch-Based Stochastic Attention for Image Editing
We propose an efficient attention layer, the Patch-based Stochastic Attention Layer (PSAL), based on the PatchMatch algorithm, which is used for determining approximate nearest neighbors. We demonstrate the usefulness of PSAL on several image editing tasks, such as image inpainting, guided image colorization, and single-image super-resolution.
arXiv (2022-02-07T13:42:00Z)
CNNs for JPEGs: A Study in Computational Cost
Convolutional neural networks (CNNs) have achieved astonishing advances over the past decade. CNNs are capable of learning robust representations of the data directly from the RGB pixels. Deep learning methods capable of learning directly from the compressed domain have been gaining attention in recent years.
arXiv (2020-12-26T15:00:10Z)
Memory-efficient GAN-based Domain Translation of High Resolution 3D Medical Images
Generative adversarial networks (GANs) are rarely applied to large 3D medical images. The present work proposes a multi-scale patch-based GAN approach for establishing unpaired domain translation. The evaluation of the domain translation scenarios is performed on brain MRIs of size 155x240x240 and thorax CTs of size up to 512x512x512.
arXiv (2020-10-06T08:43:27Z)
Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting
We propose a Contextual Residual Aggregation (CRA) mechanism that produces high-frequency residuals for missing contents by a weighted aggregation of residuals from contextual patches. We train the proposed model on small 512x512 images and perform inference on high-resolution images, achieving compelling inpainting quality.
arXiv (2020-05-19T18:55:32Z)

This list is automatically generated from the titles and abstracts of the papers on this site.