Masked strategies for images with small objects
- URL: http://arxiv.org/abs/2504.17935v1
- Date: Thu, 24 Apr 2025 20:52:23 GMT
- Title: Masked strategies for images with small objects
- Authors: H. Martin Gillis, Ming Hill, Paul Hollensen, Alan Fine, Thomas Trappenberg
- Abstract summary: Hematology analytics used for detection and classification of small blood components is a significant challenge. Deep learning approaches using supervised models with pre-trained weights have demonstrated success for many applications. However, when applied to images outside the domain of learned representations, these methods often yield less than acceptable performance.
- Score: 1.0485739694839669
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The hematology analytics used for detection and classification of small blood components is a significant challenge, in particular when objects exist as small pixel-sized entities in a large context of similar objects. Deep learning approaches using supervised models with pre-trained weights, such as residual networks and vision transformers, have demonstrated success for many applications. Unfortunately, when applied to images outside the domain of learned representations, these methods often yield less than acceptable performance. One strategy to overcome this is to use self-supervised models, where representations are learned and the weights are then applied to downstream applications. Recently, masked autoencoders have proven effective for obtaining representations that capture global context information. By masking regions of an image and having the model learn to reconstruct both the masked and non-masked regions, the model learns weights that can be used for various applications. However, if the sizes of the objects in images are less than the size of the mask, the global context information is lost, making it almost impossible to reconstruct the image. In this study, we investigated the effect of mask ratios and patch sizes for blood components using a MAE to obtain learned ViT encoder representations. We then applied the encoder weights to train a U-Net Transformer for semantic segmentation to obtain both local and global contextual information. Our experimental results demonstrate that both smaller mask ratios and patch sizes improve the reconstruction of images using a MAE. We also show the results of semantic segmentation with and without pre-trained weights, where smaller-sized blood components benefited from pre-training. Overall, our proposed method offers an efficient and effective strategy for the segmentation and classification of small objects.
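The interaction between patch size and mask ratio described in the abstract can be illustrated with a minimal sketch of random patch masking. This is not the authors' implementation: the function name `random_patch_mask`, the zero-fill of masked patches, and the example image size are assumptions made for illustration only (a typical MAE drops masked patches from the encoder input rather than zeroing pixels).

```python
import numpy as np

def random_patch_mask(image, patch_size, mask_ratio, seed=0):
    """Zero out a random subset of non-overlapping square patches.

    Hypothetical helper for illustration. Returns the masked image and a
    boolean patch grid in which True marks a masked patch.
    """
    h, w = image.shape[:2]
    if h % patch_size or w % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    grid_h, grid_w = h // patch_size, w // patch_size
    num_patches = grid_h * grid_w
    num_masked = int(round(mask_ratio * num_patches))

    # Pick which patches to hide, uniformly at random.
    rng = np.random.default_rng(seed)
    masked_ids = rng.permutation(num_patches)[:num_masked]
    patch_mask = np.zeros(num_patches, dtype=bool)
    patch_mask[masked_ids] = True
    patch_mask = patch_mask.reshape(grid_h, grid_w)

    # Expand the patch-level mask to pixel resolution and apply it.
    pixel_mask = np.repeat(np.repeat(patch_mask, patch_size, axis=0),
                           patch_size, axis=1)
    masked = image.copy()
    masked[pixel_mask] = 0
    return masked, patch_mask

if __name__ == "__main__":
    img = np.ones((224, 224), dtype=np.float32)  # assumed image size
    for p, r in [(16, 0.75), (8, 0.25)]:
        _, pm = random_patch_mask(img, patch_size=p, mask_ratio=r)
        print(f"patch={p}, ratio={r}: {pm.sum()} of {pm.size} patches masked")
```

With a 16-pixel patch, an object only a few pixels wide can fall entirely inside one masked patch, leaving no visible trace of it for the reconstruction. A smaller patch size or a lower mask ratio makes that less likely, which is the effect the study investigates.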
Related papers
- From Pixels to Components: Eigenvector Masking for Visual Representation Learning [55.567395509598065]
Predicting masked from visible parts of an image is a powerful self-supervised approach for visual representation learning. We propose an alternative masking strategy that operates on a suitable transformation of the data rather than on the raw pixels.
arXiv Detail & Related papers (2025-02-10T10:06:46Z)
- Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement Learning [53.00683059396803]
Masked image modeling (MIM) has been widely used due to its simplicity and effectiveness in recovering original information from masked images.
We propose a decision-based MIM that utilizes reinforcement learning (RL) to automatically search for optimal image masking ratio and masking strategy.
Our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation.
arXiv Detail & Related papers (2023-10-06T10:40:46Z)
- CtxMIM: Context-Enhanced Masked Image Modeling for Remote Sensing Image Understanding [38.53988682814626]
We propose a context-enhanced masked image modeling method (CtxMIM) for remote sensing image understanding.
CtxMIM formulates original image patches as a reconstructive template and employs a Siamese framework to operate on two sets of image patches.
With the simple and elegant design, CtxMIM encourages the pre-training model to learn object-level or pixel-level features on a large-scale dataset.
arXiv Detail & Related papers (2023-09-28T18:04:43Z)
- Disruptive Autoencoders: Leveraging Low-level features for 3D Medical Image Pre-training [51.16994853817024]
This work focuses on designing an effective pre-training framework for 3D radiology images.
We introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations.
The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-07-31T17:59:42Z)
- MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can be used for mitigating the greedy needs of Vision Transformer networks.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z)
- Interpretable Small Training Set Image Segmentation Network Originated from Multi-Grid Variational Model [5.283735137946097]
Deep learning (DL) methods have been proposed and widely used for image segmentation.
DL methods usually require a large amount of manually segmented data as training data and suffer from poor interpretability.
In this paper, we replace the hand-crafted regularity term in the MS model with a data adaptive generalized learnable regularity term.
arXiv Detail & Related papers (2023-06-25T02:34:34Z)
- De-coupling and De-positioning Dense Self-supervised Learning [65.56679416475943]
Dense Self-Supervised Learning (SSL) methods address the limitations of using image-level feature representations when handling images with multiple objects.
We show that they suffer from coupling and positional bias, which arise from the receptive field increasing with layer depth and zero-padding.
We demonstrate the benefits of our method on COCO and on a new challenging benchmark, OpenImage-MINI, for object classification, semantic segmentation, and object detection.
arXiv Detail & Related papers (2023-03-29T18:07:25Z)
- Improving Masked Autoencoders by Learning Where to Mask [65.89510231743692]
Masked image modeling is a promising self-supervised learning method for visual data.
We present AutoMAE, a framework that uses Gumbel-Softmax to interlink an adversarially-trained mask generator and a mask-guided image modeling process.
In our experiments, AutoMAE is shown to provide effective pretraining models on standard self-supervised benchmarks and downstream tasks.
arXiv Detail & Related papers (2023-03-12T05:28:55Z)
- Object-wise Masked Autoencoders for Fast Pre-training [13.757095663704858]
We show that current masked image encoding models learn the underlying relationship between all objects in the whole scene, instead of a single object representation.
We introduce a novel object selection and division strategy to drop non-object patches for learning object-wise representations by selective reconstruction with interested region masks.
Experiments on four commonly-used datasets demonstrate the effectiveness of our model in reducing the compute cost by 72% while achieving competitive performance.
arXiv Detail & Related papers (2022-05-28T05:13:45Z)
- Meta Corrupted Pixels Mining for Medical Image Segmentation [30.140008860735062]
In medical image segmentation, it is very laborious and expensive to acquire precise pixel-level annotations.
We propose a novel Meta Corrupted Pixels Mining (MCPM) method based on a simple meta mask network.
Our method aims to automatically estimate a weighting map that evaluates the importance of every pixel in the learning of the segmentation network.
arXiv Detail & Related papers (2020-07-07T15:12:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.