Efficient Classification of Very Large Images with Tiny Objects
- URL: http://arxiv.org/abs/2106.02694v1
- Date: Fri, 4 Jun 2021 20:13:04 GMT
- Title: Efficient Classification of Very Large Images with Tiny Objects
- Authors: Fanjie Kong, Ricardo Henao
- Abstract summary: We present an end-to-end CNN model termed Zoom-In network for classification of large images with tiny objects.
We evaluate our method on two large-image datasets and one gigapixel dataset.
- Score: 15.822654320750054
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An increasing number of applications in the computer vision domain,
especially in medical imaging and remote sensing, are challenging when the goal
is to classify very large images with tiny objects. More specifically, these
types of classification tasks face two key challenges: $i$) the size of the
input image in the target dataset is usually on the order of megapixels;
however, existing deep architectures do not easily operate on such large images
due to memory constraints, so we seek a memory-efficient method to
process these images; and $ii$) only a small fraction of the input image is
informative of the label of interest, resulting in a low region of interest (ROI)
to image ratio. However, most current convolutional neural networks
(CNNs) are designed for image classification datasets that have relatively
large ROIs and small image size (sub-megapixel). Existing approaches have
addressed these two challenges in isolation. We present an end-to-end CNN model
termed Zoom-In network that leverages hierarchical attention sampling for
classification of large images with tiny objects using a single GPU. We
evaluate our method on two large-image datasets and one gigapixel dataset.
Experimental results show that our model achieves higher accuracy than existing
methods while requiring fewer computing resources.
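The abstract describes the approach only at a high level. As a rough illustration of the attention-sampling idea it builds on (score a cheap low-resolution view, sample a few locations, and run the expensive network only on the corresponding full-resolution crops), here is a minimal single-level sketch in PyTorch. It is not the authors' code: the module names, patch size, number of sampled patches, and mean aggregation are assumptions, and the paper's hierarchical variant repeats the attention step across multiple scales rather than once.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ZoomInSketch(nn.Module):
    """Illustrative attention-based patch sampling for large-image
    classification. Not the authors' implementation; all shapes and
    hyperparameters below are assumptions."""

    def __init__(self, num_classes, patch=256, num_patches=8):
        super().__init__()
        self.patch, self.num_patches = patch, num_patches
        # Small CNN that scores coarse locations on a downsampled view.
        self.attention_net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1),
        )
        # Feature extractor applied only to the sampled high-resolution crops.
        self.patch_net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, image):
        # image: (1, 3, H, W); H and W may be in the megapixel range.
        _, _, H, W = image.shape
        # 1) Attention on a cheap low-resolution view.
        low = F.interpolate(image, size=(H // 32, W // 32),
                            mode="bilinear", align_corners=False)
        probs = F.softmax(self.attention_net(low).flatten(1), dim=1)
        # 2) Sample a few coarse cells and crop them at full resolution.
        idx = torch.multinomial(probs[0], self.num_patches, replacement=False)
        h, w = low.shape[-2:]
        feats = []
        for i in idx:
            cy, cx = (i // w).item(), (i % w).item()
            y0 = max(0, min(int(cy / h * H), H - self.patch))
            x0 = max(0, min(int(cx / w * W), W - self.patch))
            crop = image[:, :, y0:y0 + self.patch, x0:x0 + self.patch]
            feats.append(self.patch_net(crop))
        # 3) Aggregate patch features (here: a simple mean) and classify.
        return self.classifier(torch.stack(feats).mean(dim=0))
```

Because only the downsampled view and a handful of fixed-size crops ever pass through a network, peak memory stays roughly constant as the input image grows, which is the memory-efficiency property the abstract emphasizes.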
Related papers
- Parameter-Inverted Image Pyramid Networks [49.35689698870247]
We propose a novel network architecture known as Parameter-Inverted Image Pyramid Networks (PIIP).
Our core idea is to use models with different parameter sizes to process different resolution levels of the image pyramid.
PIIP achieves superior performance in tasks such as object detection, segmentation, and image classification.
arXiv Detail & Related papers (2024-06-06T17:59:10Z) - xT: Nested Tokenization for Larger Context in Large Images [79.37673340393475]
xT is a framework for vision transformers which aggregates global context with local details.
We are able to increase accuracy by up to 8.6% on challenging classification tasks.
arXiv Detail & Related papers (2024-03-04T10:29:58Z) - T-former: An Efficient Transformer for Image Inpainting [50.43302925662507]
A class of attention-based network architectures, called transformers, has shown significant performance in natural language processing.
In this paper, we design a novel attention mechanism whose complexity is linear in the resolution, derived from a Taylor expansion, and based on this attention, a network called $T$-former is designed for image inpainting.
Experiments on several benchmark datasets demonstrate that our proposed method achieves state-of-the-art accuracy while maintaining a relatively low number of parameters and computational complexity.
arXiv Detail & Related papers (2023-05-12T04:10:42Z) - AF$_2$: Adaptive Focus Framework for Aerial Imagery Segmentation [86.44683367028914]
Aerial imagery segmentation has some unique challenges, the most critical of which is foreground-background imbalance.
We propose the Adaptive Focus Framework (AF$_2$), which adopts a hierarchical segmentation procedure and focuses on adaptively utilizing multi-scale representations.
AF$_2$ significantly improves accuracy on three widely used aerial benchmarks while running as fast as mainstream methods.
arXiv Detail & Related papers (2022-02-18T10:14:45Z) - Leveraging Image Complexity in Macro-Level Neural Network Design for
Medical Image Segmentation [3.974175960216864]
We show that image complexity can be used as a guideline in choosing what is best for a given dataset.
For high-complexity datasets, a shallow network running on the original images may yield better segmentation results than a deep network running on downsampled images.
arXiv Detail & Related papers (2021-12-21T09:49:47Z) - You Better Look Twice: a new perspective for designing accurate
detectors with reduced computations [56.34005280792013]
BLT-net is a new low-computation two-stage object detection architecture.
It reduces computations by separating objects from background using a very lightweight first stage.
The resulting image proposals are then processed in the second stage by a highly accurate model.
arXiv Detail & Related papers (2021-07-21T12:39:51Z) - Gigapixel Histopathological Image Analysis using Attention-based Neural
Networks [7.1715252990097325]
We propose a CNN structure consisting of a compressing path and a learning path.
Our method integrates both global and local information, is flexible with regard to the size of the input images and only requires weak image-level labels.
arXiv Detail & Related papers (2021-01-25T10:18:52Z) - An Evolution of CNN Object Classifiers on Low-Resolution Images [0.4129225533930965]
Object classification from low-quality images is difficult due to the variance of object colors, aspect ratios, and cluttered backgrounds.
Deep convolutional neural networks (DCNNs) have been demonstrated to be very powerful for object classification from high-resolution images.
In this paper, we investigate an optimal architecture that accurately classifies low-quality images using DCNN architectures.
arXiv Detail & Related papers (2021-01-03T18:44:23Z) - CRNet: Cross-Reference Networks for Few-Shot Segmentation [59.85183776573642]
Few-shot segmentation aims to learn a segmentation model that can be generalized to novel classes with only a few training images.
With a cross-reference mechanism, our network can better find the co-occurrent objects in the support and query images.
Experiments on the PASCAL VOC 2012 dataset show that our network achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-03-24T04:55:43Z)