An Augmentation-based Model Re-adaptation Framework for Robust Image Segmentation
- URL: http://arxiv.org/abs/2409.09530v1
- Date: Sat, 14 Sep 2024 21:01:49 GMT
- Title: An Augmentation-based Model Re-adaptation Framework for Robust Image Segmentation
- Authors: Zheming Zuo, Joseph Smith, Jonathan Stonehouse, Boguslaw Obara
- Abstract summary: We propose an Augmentation-based Model Re-adaptation Framework (AMRF) to enhance the generalisation of segmentation models.
By observing segmentation masks from conventional models (FCN and U-Net) and a pre-trained SAM model, we determine a minimal augmentation set that optimally balances training efficiency and model performance.
Our results demonstrate that the fine-tuned FCN surpasses its baseline by 3.29% and 3.02% in cropping accuracy, and 5.27% and 4.04% in classification accuracy on two temporally continuous datasets.
- Score: 0.799543372823325
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Image segmentation is a crucial task in computer vision, with wide-ranging applications in industry. The Segment Anything Model (SAM) has recently attracted intensive attention; however, its application in industrial inspection, particularly for segmenting commercial anti-counterfeit codes, remains challenging. Unlike open-source datasets, industrial settings often face issues such as small sample sizes and complex textures. Additionally, computational cost is a key concern due to the varying number of trainable parameters. To address these challenges, we propose an Augmentation-based Model Re-adaptation Framework (AMRF). This framework leverages data augmentation techniques during training to enhance the generalisation of segmentation models, allowing them to adapt to newly released datasets with temporal disparity. By observing segmentation masks from conventional models (FCN and U-Net) and a pre-trained SAM model, we determine a minimal augmentation set that optimally balances training efficiency and model performance. Our results demonstrate that the fine-tuned FCN surpasses its baseline by 3.29% and 3.02% in cropping accuracy, and 5.27% and 4.04% in classification accuracy on two temporally continuous datasets. Similarly, the fine-tuned U-Net improves upon its baseline by 7.34% and 4.94% in cropping, and 8.02% and 5.52% in classification. Both models outperform the top-performing SAM models (ViT-Large and ViT-Base) by an average of 11.75% and 9.01% in cropping accuracy, and 2.93% and 4.83% in classification accuracy, respectively.
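The core of AMRF, as described in the abstract, is fine-tuning a conventional segmentation model with a small augmentation set chosen by inspecting the masks the models produce. The sketch below illustrates that general idea for a baseline FCN from torchvision (0.13 or later); the flip-and-rotate augmentation set, the random data, and all hyperparameters are illustrative assumptions rather than the configuration reported in the paper.

# Minimal sketch of augmentation-based fine-tuning of a baseline FCN.
# The augmentation set, data, and hyperparameters are illustrative assumptions.
import random
import torch
import torchvision.transforms.functional as TF
from torchvision.models.segmentation import fcn_resnet50

def joint_augment(image, mask):
    """Apply the same randomly chosen geometric transforms to image and mask."""
    if random.random() < 0.5:                      # random horizontal flip
        image, mask = TF.hflip(image), TF.hflip(mask)
    angle = random.uniform(-10.0, 10.0)            # small random rotation
    image = TF.rotate(image, angle)
    mask = TF.rotate(mask, angle)
    return image, mask

# Hypothetical batch: 4 RGB images with binary masks (e.g. code region vs. background).
images = torch.rand(4, 3, 256, 256)
masks = torch.randint(0, 2, (4, 1, 256, 256)).float()

model = fcn_resnet50(weights=None, num_classes=2)    # baseline FCN to re-adapt
optimiser = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

model.train()
for epoch in range(2):                               # short demo loop
    pairs = [joint_augment(img, m) for img, m in zip(images, masks)]
    batch_imgs = torch.stack([p[0] for p in pairs])
    batch_masks = torch.stack([p[1] for p in pairs]).squeeze(1).long()
    logits = model(batch_imgs)["out"]                 # shape (N, 2, H, W)
    loss = criterion(logits, batch_masks)
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
    print(f"epoch {epoch}: loss = {loss.item():.4f}")

In the framework described above, the candidate augmentation set would be selected by comparing masks from FCN, U-Net, and a pre-trained SAM; the loop here only shows how such a set is applied jointly to images and masks during fine-tuning.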
Related papers
- Multi-scale Contrastive Adaptor Learning for Segmenting Anything in Underperformed Scenes [12.36950265154199]
We introduce a novel Multi-scale Contrastive Adaptor learning method named MCA-SAM.
MCA-SAM enhances adaptor performance through a meticulously designed contrastive learning framework at both token and sample levels.
Empirical results demonstrate that MCA-SAM sets new benchmarks, outperforming existing methods in three challenging domains.
arXiv Detail & Related papers (2024-08-12T06:23:10Z) - Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study [61.65123150513683]
Multimodal foundation models, such as CLIP, produce state-of-the-art zero-shot results.
It is reported that these models close the robustness gap by matching the performance of supervised models trained on ImageNet.
We show that CLIP leads to a significant robustness drop compared to supervised ImageNet models on our benchmark.
arXiv Detail & Related papers (2024-03-15T17:33:49Z) - Fully Attentional Networks with Self-emerging Token Labeling [108.53230681047617]
We train a FAN token labeler (FAN-TL) to generate semantically meaningful patch token labels, followed by a FAN student model training stage that uses both the token labels and the original class label.
With the proposed STL framework, our best model achieves 84.8% Top-1 accuracy and 42.1% mCE on ImageNet-1K and ImageNet-C, and sets a new state-of-the-art for ImageNet-A (46.1%) and ImageNet-R (56.6%) without using extra data.
arXiv Detail & Related papers (2024-01-08T12:14:15Z) - A Re-Parameterized Vision Transformer (ReVT) for Domain-Generalized Semantic Segmentation [24.8695123473653]
We present a new augmentation-driven approach to domain generalization for semantic segmentation.
We achieve state-of-the-art mIoU performance of 47.3% (prior art: 46.3%) for small models and of 50.1% (prior art: 47.8%) for midsized models on commonly used benchmark datasets.
arXiv Detail & Related papers (2023-08-25T12:06:00Z) - Memory-Efficient Graph Convolutional Networks for Object Classification and Detection with Event Cameras [2.3311605203774395]
Graph convolutional networks (GCNs) are a promising approach for analyzing event data.
In this paper, we consider both factors together in order to achieve satisfying results and relatively low model complexity.
Our results show a 450-fold reduction in the number of parameters for the feature extraction module and a 4.5-fold reduction in the size of the data representation.
arXiv Detail & Related papers (2023-07-26T11:44:44Z) - Boosting Visual-Language Models by Exploiting Hard Samples [126.35125029639168]
HELIP is a cost-effective strategy tailored to enhance the performance of existing CLIP models.
Our method allows for effortless integration with existing models' training pipelines.
On comprehensive benchmarks, HELIP consistently boosts existing models to achieve leading performance.
arXiv Detail & Related papers (2023-05-09T07:00:17Z) - Pre-processing training data improves accuracy and generalisability of convolutional neural network based landscape semantic segmentation [2.8747398859585376]
We trialled different methods of data preparation for CNN training and semantic segmentation of land use land cover (LULC) features within aerial photography over the Wet Tropics and Atherton Tablelands, Queensland, Australia.
This was conducted by trialling and ranking various training patch selection sampling strategies, patch and batch sizes, and data augmentations and scaling.
We fully trained five models on the 2018 training image and applied the model to the 2015 test image, with the output LULC classifications achieving an average kappa of 0.84, user accuracy of 0.81, and producer accuracy of 0.87.
arXiv Detail & Related papers (2023-04-28T04:38:45Z) - Learning Customized Visual Models with Retrieval-Augmented Knowledge [104.05456849611895]
We propose REACT, a framework to acquire the relevant web knowledge to build customized visual models for target domains.
We retrieve the most relevant image-text pairs from the web-scale database as external knowledge, and propose to customize the model by training only new modularized blocks while freezing all the original weights.
The effectiveness of REACT is demonstrated via extensive experiments on classification, retrieval, detection and segmentation tasks, including zero, few, and full-shot settings.
arXiv Detail & Related papers (2023-01-17T18:59:06Z) - Part-Based Models Improve Adversarial Robustness [57.699029966800644]
We show that combining human prior knowledge with end-to-end learning can improve the robustness of deep neural networks.
Our model combines a part segmentation model with a tiny classifier and is trained end-to-end to simultaneously segment objects into parts and classify the segmented object.
Our experiments indicate that these models also reduce texture bias and yield better robustness against common corruptions and spurious correlations.
arXiv Detail & Related papers (2022-09-15T15:41:47Z) - Consistency and Monotonicity Regularization for Neural Knowledge Tracing [50.92661409499299]
Knowledge Tracing (KT), which tracks a human's knowledge acquisition, is a central component of online learning and AI in Education.
We propose three types of novel data augmentation, coined replacement, insertion, and deletion, along with corresponding regularization losses.
Extensive experiments on various KT benchmarks show that our regularization scheme consistently improves the model performances.
arXiv Detail & Related papers (2021-05-03T02:36:29Z)
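The last entry names three sequence-level augmentations for knowledge tracing: replacement, insertion, and deletion. The sketch below gives one plausible reading of those operations on an interaction sequence; the (question_id, correct_flag) encoding, the probabilities, and the sampling rules are assumptions for illustration, and the paper's corresponding regularization losses are not shown.

# Rough sketch of replacement, insertion, and deletion augmentations
# on a knowledge-tracing interaction sequence (illustrative assumptions only).
import random

# Each interaction is a (question_id, correct_flag) pair.
sequence = [(12, 1), (7, 0), (7, 1), (3, 1), (15, 0)]
question_pool = list(range(1, 21))   # hypothetical question bank

def replace(seq, p=0.2):
    """Swap some question ids for randomly drawn ones, keeping the responses."""
    return [(random.choice(question_pool), c) if random.random() < p else (q, c)
            for q, c in seq]

def insert(seq, p=0.2):
    """Insert random interactions after existing ones."""
    out = []
    for item in seq:
        out.append(item)
        if random.random() < p:
            out.append((random.choice(question_pool), random.randint(0, 1)))
    return out

def delete(seq, p=0.2):
    """Drop interactions at random, always keeping at least one."""
    kept = [item for item in seq if random.random() >= p]
    return kept or seq[:1]

print(replace(sequence))
print(insert(sequence))
print(delete(sequence))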