SaccadeDet: A Novel Dual-Stage Architecture for Rapid and Accurate Detection in Gigapixel Images
- URL: http://arxiv.org/abs/2407.17956v1
- Date: Thu, 25 Jul 2024 11:22:54 GMT
- Title: SaccadeDet: A Novel Dual-Stage Architecture for Rapid and Accurate Detection in Gigapixel Images
- Authors: Wenxi Li, Ruxin Zhang, Haozhe Lin, Yuchen Guo, Chao Ma, Xiaokang Yang
- Abstract summary: 'SaccadeDet' is an innovative architecture for gigapixel-level object detection, inspired by the saccadic movement of the human eye.
Our approach, evaluated on the PANDA dataset, achieves an 8x speed increase over the state-of-the-art methods.
It also demonstrates significant potential in gigapixel-level pathology analysis through its application to Whole Slide Imaging.
- Score: 50.742420049839474
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The advancement of deep learning in object detection has predominantly focused on megapixel images, leaving a critical gap in the efficient processing of gigapixel images. These super high-resolution images present unique challenges due to their immense size and computational demands. To address this, we introduce 'SaccadeDet', an innovative architecture for gigapixel-level object detection, inspired by the human eye saccadic movement. The cornerstone of SaccadeDet is its ability to strategically select and process image regions, dramatically reducing computational load. This is achieved through a two-stage process: the 'saccade' stage, which identifies regions of probable interest, and the 'gaze' stage, which refines detection in these targeted areas. Our approach, evaluated on the PANDA dataset, not only achieves an 8x speed increase over the state-of-the-art methods but also demonstrates significant potential in gigapixel-level pathology analysis through its application to Whole Slide Imaging.
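The abstract does not include code, but the two-stage pipeline it describes maps naturally onto a short sketch: a cheap detector scans a downsampled copy of the gigapixel image (the 'saccade' stage), and an accurate detector then runs only on full-resolution crops of the selected regions (the 'gaze' stage). The Python below is a minimal illustration of that pattern under assumed interfaces; the function names, coordinate handling, and toy detectors are placeholders for exposition, not the authors' implementation.

```python
import numpy as np

def saccade_stage(image, coarse_detector, downscale=16, score_thresh=0.5):
    """Hypothetical 'saccade' pass: run a cheap detector on a heavily
    downsampled copy of the image and keep only promising regions."""
    small = image[::downscale, ::downscale]      # naive downsampling stand-in
    proposals = coarse_detector(small)           # (x, y, w, h, score) in small coords
    regions = []
    for x, y, w, h, score in proposals:
        if score >= score_thresh:
            # map the proposal back to full-resolution coordinates
            regions.append((x * downscale, y * downscale,
                            w * downscale, h * downscale))
    return regions

def gaze_stage(image, regions, fine_detector):
    """Hypothetical 'gaze' pass: crop each selected region at full resolution
    and refine detections with an accurate (and expensive) detector."""
    detections = []
    for x, y, w, h in regions:
        crop = image[y:y + h, x:x + w]
        for bx, by, bw, bh, score in fine_detector(crop):
            # shift crop-local boxes back into global image coordinates
            detections.append((x + bx, y + by, bw, bh, score))
    return detections

if __name__ == "__main__":
    # Toy stand-ins for the two detectors; real models would go here.
    rng = np.random.default_rng(0)
    image = rng.integers(0, 255, size=(4096, 4096), dtype=np.uint8)
    coarse = lambda img: [(10, 20, 8, 8, 0.9), (100, 120, 6, 6, 0.2)]
    fine = lambda crop: [(4, 4, 32, 32, 0.95)]
    boxes = gaze_stage(image, saccade_stage(image, coarse), fine)
    print(boxes)
```

The point of this structure is that the expensive detector never sees the full gigapixel frame; it only processes the handful of crops the cheap pass selects, which is where the claimed speedup would come from.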
Related papers
- Ground-based image deconvolution with Swin Transformer UNet [2.41675832913699]
We introduce a two-step deconvolution framework using a Swin Transformer architecture.
Our study reveals that the deep learning-based solution introduces a bias, constraining the scope of scientific analysis.
We propose a novel third step relying on the active coefficients in the sparsity wavelet framework.
arXiv Detail & Related papers (2024-05-13T15:30:41Z) - Resource Efficient Perception for Vision Systems [0.0]
Our study introduces a framework aimed at mitigating these challenges by leveraging memory-efficient patch-based processing for high-resolution images.
It incorporates a global context representation alongside local patch information, enabling a comprehensive understanding of the image content.
We demonstrate the effectiveness of our method through superior performance on 7 different benchmarks across classification, object detection, and segmentation.
arXiv Detail & Related papers (2024-05-12T05:33:00Z) - Neural Network-Based Processing and Reconstruction of Compromised Biophotonic Image Data [0.12427543342032196]
The integration of deep learning techniques with biophotonic setups has opened new horizons in bioimaging.
This article provides an in-depth review of the diverse measurement aspects that researchers intentionally impair in their biophotonic setups.
We discuss various biophotonic methods that have successfully employed this strategic approach.
arXiv Detail & Related papers (2024-03-21T11:44:25Z) - Memory-Constrained Semantic Segmentation for Ultra-High Resolution UAV Imagery [35.96063342025938]
This paper explores the intricate problem of achieving efficient and effective segmentation of ultra-high resolution UAV imagery.
We propose a GPU memory-efficient and effective framework for local inference without accessing the context beyond local patches.
We present an efficient memory-based interaction scheme to correct potential semantic bias of the underlying high-resolution information.
arXiv Detail & Related papers (2023-10-07T07:44:59Z) - Accurate Gigapixel Crowd Counting by Iterative Zooming and Refinement [90.76576712433595]
GigaZoom iteratively zooms into the densest areas of the image and refines coarser density maps with finer details.
We show that GigaZoom obtains the state-of-the-art for gigapixel crowd counting and improves the accuracy of the next best method by 42%.
arXiv Detail & Related papers (2023-05-16T08:25:27Z) - High Dynamic Range and Super-Resolution from Raw Image Bursts [52.341483902624006]
This paper introduces the first approach to reconstruct high-resolution, high-dynamic range color images from raw photographic bursts captured by a handheld camera with exposure bracketing.
The proposed algorithm is fast, with low memory requirements compared to state-of-the-art learning-based approaches to image restoration.
Experiments demonstrate its excellent performance with super-resolution factors of up to $\times 4$ on real photographs taken in the wild with hand-held cameras.
arXiv Detail & Related papers (2022-07-29T13:31:28Z) - Generating Superpixels for High-resolution Images with Decoupled Patch Calibration [82.21559299694555]
Patch Calibration Networks (PCNet) are designed to efficiently and accurately implement high-resolution superpixel segmentation.
DPC takes a local patch from the high-resolution image and dynamically generates a binary mask that forces the network to focus on region boundaries.
arXiv Detail & Related papers (2021-08-19T10:33:05Z) - You Better Look Twice: a new perspective for designing accurate detectors with reduced computations [56.34005280792013]
BLT-net is a new low-computation two-stage object detection architecture.
It reduces computations by separating objects from the background using a very lightweight first stage.
The resulting image proposals are then processed in the second stage by a highly accurate model.
arXiv Detail & Related papers (2021-07-21T12:39:51Z) - Exploiting Raw Images for Real-Scene Super-Resolution [105.18021110372133]
We study the problem of real-scene single image super-resolution to bridge the gap between synthetic data and real captured images.
We propose a method to generate more realistic training data by mimicking the imaging process of digital cameras.
We also develop a two-branch convolutional neural network to exploit the radiance information originally recorded in raw images (a sketch of one possible two-branch design follows this list).
arXiv Detail & Related papers (2021-02-02T16:10:15Z)
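The two-branch idea in the last entry (one branch working on the raw sensor data, another on a low-resolution color reference) also lends itself to a compact sketch. The PyTorch module below is a generic illustration under assumed input shapes (4-channel packed raw at half the RGB resolution); it is not the architecture from the paper, and the layer choices and names are placeholders.

```python
import torch
import torch.nn as nn

class TwoBranchSR(nn.Module):
    """Hypothetical two-branch super-resolution network: one branch restores
    detail from the packed raw input, the other estimates color from a
    low-resolution RGB reference; their features are fused at the end."""

    def __init__(self, scale=2, feats=32):
        super().__init__()
        # Raw branch: packed 4-channel raw -> features at the target resolution.
        self.raw_branch = nn.Sequential(
            nn.Conv2d(4, feats, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feats, feats * (2 * scale) ** 2, 3, padding=1),
            nn.PixelShuffle(2 * scale),  # packed raw is half the RGB resolution
        )
        # Color branch: low-res RGB -> color features at the target resolution.
        self.color_branch = nn.Sequential(
            nn.Upsample(scale_factor=scale, mode="bilinear", align_corners=False),
            nn.Conv2d(3, feats, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.fuse = nn.Conv2d(2 * feats, 3, 3, padding=1)

    def forward(self, raw, rgb):
        r = self.raw_branch(raw)    # (B, feats, H*scale, W*scale)
        c = self.color_branch(rgb)  # (B, feats, H*scale, W*scale)
        return self.fuse(torch.cat([r, c], dim=1))

net = TwoBranchSR(scale=2)
raw = torch.randn(1, 4, 64, 64)    # packed raw, half the RGB resolution
rgb = torch.randn(1, 3, 128, 128)  # low-resolution RGB reference
print(net(raw, rgb).shape)         # torch.Size([1, 3, 256, 256])
```

The design choice being illustrated is the split of responsibilities: the raw branch supplies high-frequency detail that the camera ISP would otherwise discard, while the color branch supplies the color appearance, and a final fusion layer combines the two.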