Hard-Attention for Scalable Image Classification
- URL: http://arxiv.org/abs/2102.10212v1
- Date: Sat, 20 Feb 2021 00:21:28 GMT
- Title: Hard-Attention for Scalable Image Classification
- Authors: Athanasios Papadopoulos, Paweł Korus, Nasir Memon
- Abstract summary: We show that multi-scale hard-attention can be an effective solution to this problem.
We propose a novel architecture, TNet, which traverses an image pyramid in a top-down fashion.
We show that our model attends only to a fraction of the highest resolution content, while using only image-level labels without bounding boxes.
- Score: 16.8359205877213
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Deep neural networks (DNNs) are typically optimized for a specific input
resolution (e.g., $224 \times 224$ px), and adapting them to inputs of higher
resolution (e.g., satellite or medical images) remains challenging, as it leads
to excessive computation and memory overhead, and may require substantial
engineering effort (e.g., streaming). We show that multi-scale hard-attention
can be an effective solution to this problem. We propose a novel architecture,
TNet, which traverses an image pyramid in a top-down fashion, visiting only the
most informative regions along the way. We compare our model against strong
hard-attention baselines, achieving a better trade-off between resources and
accuracy on ImageNet. We further verify the efficacy of our model on satellite
images (fMoW dataset) of size up to $896 \times 896$ px. In addition, our
hard-attention mechanism guarantees predictions with a degree of
interpretability, without extra cost beyond inference. We also show that we can
reduce data acquisition and annotation cost, since our model attends only to a
fraction of the highest resolution content, while using only image-level labels
without bounding boxes.
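The coarse-to-fine traversal described in the abstract can be sketched as follows. This is a toy illustration only: the variance-based informativeness score, the grid split, and the function names are assumptions made for the sketch, not TNet's actual learned attention module.

```python
import numpy as np

def score(patch):
    """Toy informativeness: high-variance patches look 'interesting'.
    (Assumption for this sketch; the paper learns its attention scores.)"""
    return float(np.var(patch))

def hard_attend(image, depth=2, grid=2, top_k=1):
    """Coarse-to-fine traversal: split each kept region into a grid of
    cells, score the cells, and recurse only into the top_k best ones,
    so only a fraction of the full-resolution content is ever visited."""
    h, w = image.shape[:2]
    regions = [(0, 0, h, w)]  # (row, col, height, width)
    for _ in range(depth):
        next_regions = []
        for (r, c, rh, rw) in regions:
            ch, cw = rh // grid, rw // grid
            cells = []
            for i in range(grid):
                for j in range(grid):
                    rr, cc = r + i * ch, c + j * cw
                    cell = (rr, cc, ch, cw)
                    cells.append((score(image[rr:rr + ch, cc:cc + cw]), cell))
            # Hard attention: keep only the top_k cells, discard the rest.
            cells.sort(key=lambda sc: sc[0], reverse=True)
            next_regions.extend(cell for _, cell in cells[:top_k])
        regions = next_regions
    return regions  # the few high-resolution regions actually attended
```

With `grid=2` and `top_k=1`, each level of the pyramid discards three quarters of the remaining area, which is what makes this kind of traversal scale to large inputs: the cost grows with the number of visited regions, not with the full image resolution.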
Related papers
- $\infty$-Brush: Controllable Large Image Synthesis with Diffusion Models in Infinite Dimensions [58.42011190989414]
We introduce a novel conditional diffusion model in infinite dimensions, $\infty$-Brush, for controllable large image synthesis.
To the best of our knowledge, $\infty$-Brush is the first conditional diffusion model in function space that can controllably synthesize images at arbitrary resolutions of up to $4096 \times 4096$ pixels.
arXiv Detail & Related papers (2024-07-20T00:04:49Z)
- DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis [56.849285913695184]
Diffusion Mamba (DiM) is a sequence model for efficient high-resolution image synthesis.
DiM architecture achieves inference-time efficiency for high-resolution images.
Experiments demonstrate the effectiveness and efficiency of our DiM.
arXiv Detail & Related papers (2024-05-23T06:53:18Z)
- xT: Nested Tokenization for Larger Context in Large Images [79.37673340393475]
xT is a framework for vision transformers which aggregates global context with local details.
We are able to increase accuracy by up to 8.6% on challenging classification tasks.
arXiv Detail & Related papers (2024-03-04T10:29:58Z) - Large-scale Weakly Supervised Learning for Road Extraction from
Satellite Imagery [9.28701721082481]
This paper proposes to leverage OpenStreetMap road data as weak labels and large scale satellite imagery to pre-train semantic segmentation models.
Using as much as 100 times more data than the widely used DeepGlobe road dataset, our model exceeds the top performer of the current DeepGlobe leaderboard.
arXiv Detail & Related papers (2023-09-14T16:16:57Z) - ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of
Zoom and Spatial Biases in Image Classification [9.779748872936912]
We show that proper framing of the input image can lead to the correct classification of 98.91% of ImageNet images.
We propose a test-time augmentation (TTA) technique that improves classification accuracy by forcing models to explicitly perform zoom-in operations.
arXiv Detail & Related papers (2023-04-11T23:55:50Z) - SDM: Spatial Diffusion Model for Large Hole Image Inpainting [106.90795513361498]
We present a novel spatial diffusion model (SDM) that uses a few iterations to gradually deliver informative pixels to the entire image.
Also, thanks to the proposed decoupled probabilistic modeling and spatial diffusion scheme, our method achieves high-quality large-hole completion.
arXiv Detail & Related papers (2022-12-06T13:30:18Z) - Unsupervised Super-Resolution of Satellite Imagery for High Fidelity
Material Label Transfer [78.24493844353258]
We propose an unsupervised domain adaptation based approach using adversarial learning.
We aim to harvest information from smaller quantities of high-resolution data (source domain) and use it to super-resolve low-resolution imagery (target domain).
arXiv Detail & Related papers (2021-05-16T00:57:43Z) - Efficient Poverty Mapping using Deep Reinforcement Learning [75.6332944247741]
High-resolution satellite imagery and machine learning have proven useful in many sustainability-related tasks.
The accuracy afforded by high-resolution imagery comes at a cost, as such imagery is extremely expensive to purchase at scale.
We propose a reinforcement learning approach in which free low-resolution imagery is used to dynamically identify where to acquire costly high-resolution images.
arXiv Detail & Related papers (2020-06-07T18:30:57Z) - Contextual Residual Aggregation for Ultra High-Resolution Image
Inpainting [12.839962012888199]
We propose a Contextual Residual Aggregation (CRA) mechanism that can produce high-frequency residuals for missing contents.
CRA mechanism produces high-frequency residuals for missing contents by weighted aggregating residuals from contextual patches.
We train the proposed model on small images at $512 \times 512$ resolution and perform inference on high-resolution images, achieving compelling inpainting quality.
arXiv Detail & Related papers (2020-05-19T18:55:32Z) - Learning When and Where to Zoom with Deep Reinforcement Learning [101.79271767464947]
We propose a reinforcement learning approach to identify when and where to use/acquire high resolution data conditioned on paired, cheap, low resolution images.
We conduct experiments on CIFAR10, CIFAR100, ImageNet and fMoW datasets where we use significantly less high resolution data while maintaining similar accuracy to models which use full high resolution images.
arXiv Detail & Related papers (2020-03-01T07:16:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.