On the Generalizability of Iterative Patch Selection for Memory-Efficient High-Resolution Image Classification
- URL: http://arxiv.org/abs/2412.11237v1
- Date: Sun, 15 Dec 2024 16:25:30 GMT
- Title: On the Generalizability of Iterative Patch Selection for Memory-Efficient High-Resolution Image Classification
- Authors: Max Riffi-Aslett, Christina Fell
- Abstract summary: Classifying large images with small or tiny regions of interest is challenging due to computational and memory constraints. We explore these issues using a novel testbed on a memory-efficient cross-attention transformer with Iterative Patch Selection (IPS) as the patch selection module.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Classifying large images with small or tiny regions of interest (ROI) is challenging due to computational and memory constraints. Weakly supervised memory-efficient patch selectors have achieved results comparable to strongly supervised methods. However, low signal-to-noise ratios and low-entropy attention still cause overfitting. We explore these issues using a novel testbed on a memory-efficient cross-attention transformer with Iterative Patch Selection (IPS) as the patch selection module. Our testbed extends the megapixel MNIST benchmark to four smaller O2I (object-to-image) ratios ranging from 0.01% to 0.14% while keeping the canvas size fixed and introducing a noise generation component based on Bézier curves. Experimental results generalize the observations made on CNNs to IPS, whereby the O2I threshold below which the classifier fails to generalize is affected by the training dataset size. We further observe that the magnitude of this interaction differs for each task of the megapixel MNIST. For tasks "Maj" and "Top", the rate is at its highest, followed by tasks "Max" and "Multi", where in the latter this rate is almost 0. Moreover, results show that in a low-data setting, tuning the patch size to be smaller relative to the ROI improves generalization, resulting in an improvement of +15% for the megapixel MNIST and +5% for the Swedish traffic signs dataset compared to the original object-to-patch ratios in IPS. Further outcomes indicate that as the thickness of the noise component approaches that of the digits in the megapixel MNIST, IPS gradually fails to generalize, supporting previous suspicions.
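As a rough illustration of the testbed's noise component, the sketch below samples a quadratic Bézier curve onto a fixed canvas and computes the resulting object-to-image ratio. The canvas size, control points, and single-pixel stroke thickness are illustrative assumptions, not the paper's actual settings:

```python
import numpy as np

def quadratic_bezier(p0, p1, p2, n=200):
    """Sample n points along a quadratic Bezier curve with control points p0, p1, p2."""
    t = np.linspace(0.0, 1.0, n)[:, None]
    return (1 - t) ** 2 * p0 + 2 * (1 - t) * t * p1 + t ** 2 * p2

def o2i_ratio(canvas):
    """Object-to-image ratio: fraction of non-zero pixels on the canvas."""
    return np.count_nonzero(canvas) / canvas.size

# Illustrative fixed canvas; the paper keeps the canvas size constant
# while shrinking the O2I ratio down to the 0.01%-0.14% range.
canvas = np.zeros((1500, 1500), dtype=np.uint8)
pts = quadratic_bezier(np.array([100.0, 100.0]),
                       np.array([750.0, 1400.0]),
                       np.array([1400.0, 100.0]))
rows, cols = pts[:, 1].astype(int), pts[:, 0].astype(int)
canvas[rows, cols] = 255  # 1-pixel-thick noise stroke
print(f"O2I ratio: {o2i_ratio(canvas):.4%}")
```

In the paper, the thickness of such noise strokes is varied; as it approaches the digit stroke thickness, generalization degrades.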
Related papers
- How Learnable Grids Recover Fine Detail in Low Dimensions: A Neural Tangent Kernel Analysis of Multigrid Parametric Encodings [106.3726679697804]
We compare the two most common techniques for mitigating this spectral bias: Fourier feature encodings (FFE) and multigrid parametric encodings (MPE)
FFEs are seen as the standard for low-dimensional mappings, but MPEs often outperform them and learn representations with higher resolution and finer detail.
We prove that MPEs improve a network's performance through the structure of their grid and not their learnable embedding.
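For context, a Fourier feature encoding maps low-dimensional coordinates through random sinusoids before they reach the network. A minimal sketch, where the frequency matrix `B` and its scale are illustrative assumptions:

```python
import numpy as np

def fourier_feature_encoding(x, B):
    """Encode coordinates x (N x d) as [cos(2*pi*x@B), sin(2*pi*x@B)] (N x 2m)."""
    proj = 2.0 * np.pi * x @ B
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)

rng = np.random.default_rng(0)
B = rng.normal(scale=10.0, size=(2, 64))   # d=2 coordinates -> m=64 random frequencies
coords = rng.uniform(size=(5, 2))          # five 2-D coordinates
features = fourier_feature_encoding(coords, B)
print(features.shape)                      # (5, 128)
```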
arXiv Detail & Related papers (2025-04-18T02:18:08Z) - Exploring Kernel Transformations for Implicit Neural Representations [57.2225355625268]
Implicit neural representations (INRs) leverage neural networks to represent signals by mapping coordinates to their corresponding attributes.
This work pioneers the exploration of the effect of kernel transformation of input/output while keeping the model itself unchanged.
A byproduct of our findings is a simple yet effective method that combines scale and shift to significantly boost INR with negligible overhead.
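The scale-and-shift byproduct can be sketched as affine transforms applied around an otherwise unchanged INR. The wrapper and its constants below are illustrative assumptions, not the paper's method:

```python
import numpy as np

def scale_shift_wrap(model, in_scale, in_shift, out_scale, out_shift):
    """Apply affine transforms to an INR's input and output,
    leaving the model itself untouched (constants are illustrative)."""
    def wrapped(x):
        return out_scale * model(in_scale * x + in_shift) + out_shift
    return wrapped

identity = lambda x: x
f = scale_shift_wrap(identity, in_scale=2.0, in_shift=0.5, out_scale=0.1, out_shift=0.0)
print(f(np.array([1.0])))   # 0.1 * (2*1 + 0.5) = [0.25]
```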
arXiv Detail & Related papers (2025-04-07T04:43:50Z) - SparseFormer: Detecting Objects in HRW Shots via Sparse Vision Transformer [62.11796778482088]
We present a novel model-agnostic sparse vision transformer, dubbed SparseFormer, to bridge the gap of object detection between close-up and HRW shots.
The proposed SparseFormer selectively uses attentive tokens to scrutinize the sparsely distributed windows that may contain objects.
Experiments on two HRW benchmarks, PANDA and DOTA-v1.0, demonstrate that the proposed SparseFormer significantly improves detection accuracy (up to 5.8%) and speed (up to 3x) over state-of-the-art approaches.
arXiv Detail & Related papers (2025-02-11T03:21:25Z) - Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss [59.835032408496545]
We propose a tile-based strategy that partitions the contrastive loss calculation into arbitrarily small blocks.
We also introduce a multi-level tiling strategy to leverage the hierarchical structure of distributed systems.
Compared to SOTA memory-efficient solutions, it achieves a two-order-of-magnitude reduction in memory while maintaining comparable speed.
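The tiling idea can be sketched as an online logsumexp over column blocks of the logit matrix, so the full N x N matrix is never materialized. This is a simplified single-process NumPy sketch, not the paper's multi-level distributed implementation:

```python
import numpy as np

def full_contrastive_loss(q, k, tau=0.07):
    """InfoNCE loss with the full N x N logit matrix in memory."""
    logits = q @ k.T / tau
    m = logits.max(axis=1)
    lse = m + np.log(np.exp(logits - m[:, None]).sum(axis=1))
    return np.mean(lse - np.diag(logits))

def tiled_contrastive_loss(q, k, tau=0.07, block=4):
    """Same loss via an online logsumexp over column tiles, so only an
    N x block slice of the logits exists at any time."""
    n = q.shape[0]
    run_max = np.full(n, -np.inf)
    run_sum = np.zeros(n)
    pos = np.einsum('ij,ij->i', q, k) / tau          # diagonal (positive) logits
    for s in range(0, n, block):
        tile = q @ k[s:s + block].T / tau            # N x block slice
        new_max = np.maximum(run_max, tile.max(axis=1))
        run_sum = run_sum * np.exp(run_max - new_max) \
                  + np.exp(tile - new_max[:, None]).sum(axis=1)
        run_max = new_max
    return np.mean(run_max + np.log(run_sum) - pos)
```

Because the streaming logsumexp is exact, the tiled version matches the full-matrix loss to floating-point precision while using O(N x block) instead of O(N^2) memory for the logits.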
arXiv Detail & Related papers (2024-10-22T17:59:30Z) - Parameter-Inverted Image Pyramid Networks [49.35689698870247]
We propose a novel network architecture known as Parameter-Inverted Image Pyramid Networks (PIIP)
Our core idea is to use models with different parameter sizes to process different resolution levels of the image pyramid.
PIIP achieves superior performance in tasks such as object detection, segmentation, and image classification.
arXiv Detail & Related papers (2024-06-06T17:59:10Z) - Towards Efficient and Accurate CT Segmentation via Edge-Preserving Probabilistic Downsampling [2.1465347972460367]
Downsampling images and labels, often necessitated by limited resources or to expedite network training, leads to the loss of small objects and thin boundaries.
This undermines the segmentation network's capacity to interpret images accurately and predict detailed labels, resulting in diminished performance compared to processing at original resolutions.
We introduce a novel method named Edge-preserving Probabilistic Downsampling (EPD)
It utilizes class uncertainty within a local window to produce soft labels, with the window size dictating the downsampling factor.
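The soft-label idea can be sketched as per-window class frequencies. This is a simplified sketch based on class counts; the actual EPD method derives its soft labels from class uncertainty within each window:

```python
import numpy as np

def epd_soft_labels(labels, factor, num_classes):
    """Downsample an integer label map by `factor`, emitting per-window
    class-frequency soft labels (frequency-based sketch of the EPD idea)."""
    h, w = labels.shape
    hh, ww = h // factor, w // factor
    # Split the map into non-overlapping factor x factor windows.
    windows = labels[:hh * factor, :ww * factor].reshape(hh, factor, ww, factor)
    windows = windows.transpose(0, 2, 1, 3).reshape(hh, ww, factor * factor)
    one_hot = np.eye(num_classes)[windows]   # hh x ww x factor^2 x C
    return one_hot.mean(axis=2)              # hh x ww x C soft labels

labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1],
                   [2, 2, 2, 2],
                   [2, 2, 2, 2]])
soft = epd_soft_labels(labels, factor=2, num_classes=3)
print(soft.shape)   # (2, 2, 3)
```

Unlike hard nearest-neighbour downsampling, a window straddling a thin boundary keeps non-zero mass for both classes instead of dropping one.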
arXiv Detail & Related papers (2024-04-05T10:01:31Z) - Learning county from pixels: Corn yield prediction with attention-weighted multiple instance learning [8.573309028586168]
This research examines each county at the pixel level and applies multiple instance learning to leverage detailed information within a county.
In addition, our method addresses the "mixed pixel" issue caused by the inconsistent resolution between feature datasets and crop mask.
The developed model outperforms four other machine learning models over the past five years in the U.S. corn belt.
arXiv Detail & Related papers (2023-12-02T02:09:31Z) - CoFiI2P: Coarse-to-Fine Correspondences for Image-to-Point Cloud Registration [9.57539651520755]
CoFiI2P is a novel I2P registration network that extracts correspondences in a coarse-to-fine manner.
In the coarse matching phase, a novel I2P transformer module is employed to capture both homogeneous and heterogeneous global information.
In the fine matching module, point/pixel pairs are established with the guidance of super-point/super-pixel correspondences.
arXiv Detail & Related papers (2023-09-26T04:32:38Z) - Hierarchical Transformer for Survival Prediction Using Multimodality Whole Slide Images and Genomics [63.76637479503006]
Learning good representation of giga-pixel level whole slide pathology images (WSI) for downstream tasks is critical.
This paper proposes a hierarchical-based multimodal transformer framework that learns a hierarchical mapping between pathology images and corresponding genes.
Our architecture requires fewer GPU resources compared with benchmark methods while maintaining better WSI representation ability.
arXiv Detail & Related papers (2022-11-29T23:47:56Z) - μSplit: efficient image decomposition for microscopy data [50.794670705085835]
μSplit is a dedicated approach for trained image decomposition in the context of fluorescence microscopy images.
We introduce lateral contextualization (LC), a novel meta-architecture that enables the memory efficient incorporation of large image-context.
We apply muSplit to five decomposition tasks, one on a synthetic dataset, four others derived from real microscopy data.
arXiv Detail & Related papers (2022-11-23T11:26:24Z) - New wrapper method based on normalized mutual information for dimension reduction and classification of hyperspectral images [0.0]
We propose a new wrapper method based on normalized mutual information (NMI) and error probability (PE)
Experiments have been performed on two challenging hyperspectral benchmark datasets captured by NASA's Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor.
arXiv Detail & Related papers (2022-10-25T21:17:11Z) - Accurate Image Restoration with Attention Retractable Transformer [50.05204240159985]
We propose Attention Retractable Transformer (ART) for image restoration.
ART presents both dense and sparse attention modules in the network.
We conduct extensive experiments on image super-resolution, denoising, and JPEG compression artifact reduction tasks.
arXiv Detail & Related papers (2022-10-04T07:35:01Z) - Decision-based Black-box Attack Against Vision Transformers via Patch-wise Adversarial Removal [42.032749850729246]
We propose a new decision-based black-box attack against ViTs termed Patch-wise Adversarial Removal (PAR)
PAR records the noise magnitude and noise sensitivity of each patch and selects the patch with the highest query value for noise compression.
Experiments on ImageNet-21k, ILSVRC-2012, and Tiny-Imagenet datasets demonstrate that PAR achieves a much lower magnitude of perturbation on average with the same number of queries.
arXiv Detail & Related papers (2021-12-07T04:46:13Z) - Efficient Classification of Very Large Images with Tiny Objects [15.822654320750054]
We present an end-to-end CNN model termed Zoom-In network for classification of large images with tiny objects.
We evaluate our method on two large-image datasets and one gigapixel dataset.
arXiv Detail & Related papers (2021-06-04T20:13:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.