Learning to Downsample for Segmentation of Ultra-High Resolution Images
- URL: http://arxiv.org/abs/2109.11071v1
- Date: Wed, 22 Sep 2021 23:04:59 GMT
- Title: Learning to Downsample for Segmentation of Ultra-High Resolution Images
- Authors: Chen Jin, Ryutaro Tanno, Thomy Mertzanidou, Eleftheria Panagiotaki,
Daniel C. Alexander
- Abstract summary: We show that learning the spatially varying downsampling strategy jointly with segmentation offers advantages in segmenting large images with limited computational budget.
Our method adapts the sampling density over different locations so that more samples are collected from the small important regions and fewer from the others.
We show on two public datasets and one local high-resolution dataset that our method consistently learns sampling locations that preserve more information and boost segmentation accuracy over baseline methods.
- Score: 6.432524678252553
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Segmentation of ultra-high resolution images with deep learning is
challenging because of their enormous size, often millions or even billions of
pixels. Typical solutions drastically downsample the image uniformly to meet
memory constraints, implicitly assuming that all pixels are equally important
by sampling at the same density at all spatial locations. However, this
assumption does not hold, and it compromises the performance of deep learning
techniques that have proved powerful on standard-sized images. For example,
with uniform downsampling (see the green boxed region in Fig. 1), the rider
and bike receive too few samples while the trees and buildings are
oversampled, which degrades the segmentation prediction made from the
low-resolution downsampled image. In this work we show that learning a
spatially varying downsampling strategy jointly with segmentation offers
advantages in segmenting large images under a limited computational budget.
Fig. 1 shows that our method adapts the sampling density over different
locations so that more samples are collected from the small important regions
and fewer from the others, which in turn leads to better segmentation
accuracy. We show on two public datasets and one local high-resolution dataset
that our method consistently learns sampling locations that preserve more
information and boost segmentation accuracy over baseline methods.
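The general idea, a small offset network that predicts where to sample the full-resolution image so that the sampler can be trained jointly with the segmenter, can be sketched as follows. This is a hypothetical illustration using PyTorch's differentiable `grid_sample`, not the authors' released code; the module name `LearnedDownsampler` and all network sizes are our assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnedDownsampler(nn.Module):
    """Predicts per-location offsets that warp a uniform sampling grid,
    so sampling density can concentrate on small, important regions."""
    def __init__(self, out_size=(64, 64)):
        super().__init__()
        self.out_size = out_size
        # Tiny CNN operating on a cheap uniform thumbnail of the image.
        self.offset_net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 2, 3, padding=1), nn.Tanh(),  # offsets in [-1, 1]
        )

    def forward(self, image):
        b = image.shape[0]
        h, w = self.out_size
        # Uniform low-res view used only to predict the sampling offsets.
        thumb = F.interpolate(image, size=(h, w), mode="bilinear",
                              align_corners=False)
        # Small offsets deform the uniform grid; the scale keeps them local.
        offsets = 0.05 * self.offset_net(thumb)            # (B, 2, h, w)
        ys = torch.linspace(-1, 1, h, device=image.device)
        xs = torch.linspace(-1, 1, w, device=image.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        grid = torch.stack((gx, gy), dim=-1)               # (h, w, 2)
        grid = grid.unsqueeze(0).expand(b, -1, -1, -1)
        grid = grid + offsets.permute(0, 2, 3, 1)
        # Differentiable non-uniform sampling of the full-res image.
        return F.grid_sample(image, grid, align_corners=False), grid

# The sampled low-res image feeds a standard segmentation network; the
# segmentation loss backpropagates through grid_sample into offset_net.
if __name__ == "__main__":
    sampler = LearnedDownsampler(out_size=(64, 64))
    hi_res = torch.randn(1, 3, 2048, 2048)
    low_res, grid = sampler(hi_res)
    print(low_res.shape)  # torch.Size([1, 3, 64, 64])
```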
Related papers
- Towards Efficient and Accurate CT Segmentation via Edge-Preserving Probabilistic Downsampling [2.1465347972460367]
Downsampling images and labels, often necessitated by limited resources or to expedite network training, leads to the loss of small objects and thin boundaries.
This undermines the segmentation network's capacity to interpret images accurately and predict detailed labels, resulting in diminished performance compared to processing at original resolutions.
We introduce a novel method named Edge-preserving Probabilistic Downsampling (EPD).
It utilizes class uncertainty within a local window to produce soft labels, with the window size dictating the downsampling factor.
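A hedged sketch of the soft-label idea described above: downsampled soft labels can be produced from local class frequencies by average-pooling one-hot labels, with the window size equal to the downsampling factor. EPD's exact uncertainty weighting may differ, and the function name is ours.

```python
import torch
import torch.nn.functional as F

def soft_downsample_labels(labels, num_classes, factor):
    """Downsample an integer label map by `factor`, returning soft labels:
    each output pixel holds the class distribution of its local window."""
    one_hot = F.one_hot(labels, num_classes).permute(0, 3, 1, 2).float()
    # Window size equals the downsampling factor, as in the summary above.
    return F.avg_pool2d(one_hot, kernel_size=factor, stride=factor)

labels = torch.randint(0, 4, (1, 512, 512))
soft = soft_downsample_labels(labels, num_classes=4, factor=8)
print(soft.shape)  # torch.Size([1, 4, 64, 64]); channels sum to 1 per pixel
```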
arXiv Detail & Related papers (2024-04-05T10:01:31Z)
- On the Effect of Image Resolution on Semantic Segmentation [27.115235051091663]
We show that a model capable of directly producing high-resolution segmentations can match the performance of more complex systems.
Our approach leverages a bottom-up information propagation technique across various scales.
We have rigorously tested our method using leading-edge semantic segmentation datasets.
arXiv Detail & Related papers (2024-02-08T04:21:30Z)
- Pixel-Inconsistency Modeling for Image Manipulation Localization [63.54342601757723]
Digital image forensics plays a crucial role in image authentication and manipulation localization.
This paper presents a generalized and robust manipulation localization model through the analysis of pixel inconsistency artifacts.
Experiments show that our method successfully extracts inherent pixel-inconsistency forgery fingerprints.
arXiv Detail & Related papers (2023-09-30T02:54:51Z)
- AutoFocusFormer: Image Segmentation off the Grid [11.257993284839621]
AutoFocusFormer (AFF) is a local-attention transformer image recognition backbone.
We develop a novel point-based local attention block, facilitated by a balanced clustering module.
Experiments show that our AutoFocusFormer (AFF) improves significantly over baseline models of similar sizes.
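A loose sketch of the "balanced clustering + local attention" pattern: tokens are grouped into equal-size clusters and attention runs within each cluster. The sort-and-chunk grouping below is a crude stand-in for AFF's actual balanced clustering module; all names and sizes are our assumptions.

```python
import torch
import torch.nn as nn

class ClusteredLocalAttention(nn.Module):
    """Tokens are split into equal-size clusters (here by sorting on a
    simple 1-D spatial key) and attend only within their cluster."""
    def __init__(self, dim, heads=4, cluster_size=64):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cluster_size = cluster_size

    def forward(self, tokens, positions):
        b, n, d = tokens.shape
        assert n % self.cluster_size == 0, "pad tokens to a multiple"
        # Crude spatial key so nearby tokens tend to share a chunk.
        key = positions[..., 0] * 4096 + positions[..., 1]
        order = key.argsort(dim=1)
        sorted_tokens = torch.gather(
            tokens, 1, order.unsqueeze(-1).expand(-1, -1, d))
        # Equal-size clusters -> dense batched attention, no ragged sets.
        c = sorted_tokens.view(b * (n // self.cluster_size),
                               self.cluster_size, d)
        out, _ = self.attn(c, c, c)
        out = out.view(b, n, d)
        # Scatter results back to the original token order.
        unsorted = torch.empty_like(out)
        unsorted.scatter_(1, order.unsqueeze(-1).expand(-1, -1, d), out)
        return unsorted

tokens = torch.randn(2, 256, 32)
pos = torch.randint(0, 64, (2, 256, 2)).float()
print(ClusteredLocalAttention(32)(tokens, pos).shape)  # (2, 256, 32)
```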
arXiv Detail & Related papers (2023-04-24T19:37:23Z)
- Towards Effective Image Manipulation Detection with Proposal Contrastive Learning [61.5469708038966]
We propose Proposal Contrastive Learning (PCL) for effective image manipulation detection.
Our PCL uses a two-stream architecture that extracts two types of global features from RGB and noise views, respectively.
Our PCL can be easily adapted to unlabeled data in practice, which can reduce manual labeling costs and promote more generalizable features.
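A minimal sketch of the two-stream idea above: a "noise" view from a fixed high-pass filter (a stand-in for the SRM-style filters common in forensics; PCL's exact filters may differ) plus an InfoNCE loss pairing corresponding proposal features. All names are ours.

```python
import torch
import torch.nn.functional as F

# Fixed high-pass kernel applied per channel to expose noise residuals.
high_pass = torch.tensor([[[-1., 2., -1.],
                           [ 2., -4., 2.],
                           [-1., 2., -1.]]]).repeat(3, 1, 1, 1) / 4.0

def noise_view(rgb):
    return F.conv2d(rgb, high_pass, padding=1, groups=3)

def proposal_contrastive_loss(f_rgb, f_noise, temperature=0.1):
    """InfoNCE over proposal features: the i-th RGB proposal should match
    the i-th noise-view proposal and repel all others in the batch."""
    f_rgb = F.normalize(f_rgb, dim=1)
    f_noise = F.normalize(f_noise, dim=1)
    logits = f_rgb @ f_noise.t() / temperature       # (P, P) similarities
    targets = torch.arange(f_rgb.size(0))
    return F.cross_entropy(logits, targets)

rgb = torch.randn(1, 3, 64, 64)
print(noise_view(rgb).shape)                          # (1, 3, 64, 64)
print(proposal_contrastive_loss(torch.randn(8, 128),
                                torch.randn(8, 128)))
```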
arXiv Detail & Related papers (2022-10-16T13:30:13Z)
- Any-resolution Training for High-resolution Image Synthesis [55.19874755679901]
Generative models operate at fixed resolution, even though natural images come in a variety of sizes.
We argue that every pixel matters and create datasets with variable-size images, collected at their native resolutions.
We introduce continuous-scale training, a process that samples patches at random scales to train a new generator with variable output resolutions.
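A minimal version of the continuous-scale sampling step described above (our own sketch, not the authors' pipeline): pick a random scale, resize the native-resolution image, and take a fixed-size random crop, so training sees real pixels at many scales.

```python
import random
from PIL import Image

def sample_patch(path, patch=256, min_scale=0.25, max_scale=1.0):
    """Sample one fixed-size patch at a continuously random scale."""
    img = Image.open(path)
    scale = random.uniform(min_scale, max_scale)
    w, h = int(img.width * scale), int(img.height * scale)
    img = img.resize((w, h), Image.LANCZOS)
    # Random crop location; clamp so the crop starts inside the image.
    x = random.randint(0, max(0, w - patch))
    y = random.randint(0, max(0, h - patch))
    return img.crop((x, y, x + patch, y + patch))
```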
arXiv Detail & Related papers (2022-04-14T17:59:31Z)
- Image-free single-pixel segmentation [3.3808025405314086]
In this letter, we report an image-free single-pixel segmentation technique.
The technique combines structured illumination and single-pixel detection to efficiently sample and multiplex the scene's segmentation information.
We envision that this image-free segmentation technique can be widely applied on various resource-limited platforms such as UAVs and unmanned vehicles.
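A toy simulation of the acquisition model implied above: each structured illumination pattern yields one scalar measurement, and a small network maps the measurement vector straight to segmentation logits without reconstructing the image. Shapes, sizes, and the decoder are illustrative assumptions only.

```python
import torch

n, m, classes = 32, 256, 2                  # 32x32 scene, 256 patterns
patterns = torch.randn(m, n * n)            # structured illumination
scene = torch.rand(n * n)                   # flattened scene reflectivity
measurements = patterns @ scene             # single-pixel detector output

# Image-free decoding: measurements -> per-pixel class logits directly.
decoder = torch.nn.Sequential(
    torch.nn.Linear(m, 512), torch.nn.ReLU(),
    torch.nn.Linear(512, n * n * classes),
)
logits = decoder(measurements).view(classes, n, n)
print(logits.argmax(dim=0).shape)           # (32, 32) segmentation map
```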
arXiv Detail & Related papers (2021-08-24T10:06:53Z)
- Semi-supervised Semantic Segmentation with Directional Context-aware Consistency [66.49995436833667]
We focus on the semi-supervised segmentation problem where only a small set of labeled data is provided with a much larger collection of totally unlabeled images.
A preferred high-level representation should capture the contextual information while not losing self-awareness.
We present the Directional Contrastive Loss (DC Loss) to accomplish the consistency in a pixel-to-pixel manner.
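A hedged sketch of the directional, pixel-to-pixel consistency idea: at each pixel the lower-confidence feature is pulled toward the higher-confidence one, with the gradient stopped on the confident side. The published DC Loss also includes a contrastive term over negative pixels, which this simplified version omits; all names are ours.

```python
import torch
import torch.nn.functional as F

def directional_consistency(feat1, feat2, conf1, conf2):
    """Pixel-wise cosine consistency between two views, applied
    directionally: only the less confident side receives gradient."""
    f1 = F.normalize(feat1, dim=1)
    f2 = F.normalize(feat2, dim=1)
    # Per-pixel direction mask: 1 where view 2 is more confident.
    toward_2 = (conf2 > conf1).float()
    loss = toward_2 * (1 - (f1 * f2.detach()).sum(dim=1)) \
         + (1 - toward_2) * (1 - (f1.detach() * f2).sum(dim=1))
    return loss.mean()

f1, f2 = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
c1, c2 = torch.rand(2, 32, 32), torch.rand(2, 32, 32)
print(directional_consistency(f1, f2, c1, c2))
```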
arXiv Detail & Related papers (2021-06-27T03:42:40Z)
- Superpixel Segmentation Based on Spatially Constrained Subspace Clustering [57.76302397774641]
We consider each representative region with independent semantic information as a subspace, and formulate superpixel segmentation as a subspace clustering problem.
We show that a simple integration of superpixel segmentation with conventional subspace clustering does not work effectively due to the spatial correlation of the pixels.
We propose a novel convex locality-constrained subspace clustering model that is able to constrain the spatial adjacent pixels with similar attributes to be clustered into a superpixel.
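A generic form of such an objective (our schematic, not the paper's exact convex model):

```latex
\min_{C}\ \|X - XC\|_F^2 \;+\; \lambda\,\Omega(C)
\;+\; \mu \sum_{(i,j)\in\mathcal{N}} w_{ij}\,\|c_i - c_j\|_2^2
\quad \text{s.t. } \operatorname{diag}(C) = 0
```

Here X stacks pixel features as columns, C is the self-expression coefficient matrix with regularizer Omega, and the weights w_ij are large for spatially adjacent pixels with similar attributes, encouraging them to share a subspace and hence a superpixel.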
arXiv Detail & Related papers (2020-12-11T06:18:36Z)
- Image Fine-grained Inpainting [89.17316318927621]
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields.
To better train this efficient generator, in addition to the frequently-used VGG feature matching loss, we design a novel self-guided regression loss.
We also employ a discriminator with local and global branches to ensure local-global contents consistency.
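A small sketch of a "dense combination of dilated convolutions": parallel branches with growing dilation rates are fused so the block sees large and varied receptive fields. The branch widths and rates are assumptions; the paper's exact block design may differ.

```python
import torch
import torch.nn as nn

class DenseDilatedBlock(nn.Module):
    """Parallel dilated-conv branches fused by a 1x1 conv, with a
    residual connection so local detail is preserved."""
    def __init__(self, channels, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels // len(rates), 3,
                      padding=r, dilation=r)
            for r in rates)
        self.fuse = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        out = torch.cat([b(x) for b in self.branches], dim=1)
        return x + self.fuse(out)

print(DenseDilatedBlock(64)(torch.randn(1, 64, 32, 32)).shape)
```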
arXiv Detail & Related papers (2020-02-07T03:45:25Z)