Exploiting CNNs for Semantic Segmentation with Pascal VOC
- URL: http://arxiv.org/abs/2304.13216v2
- Date: Fri, 5 May 2023 05:27:24 GMT
- Title: Exploiting CNNs for Semantic Segmentation with Pascal VOC
- Authors: Sourabh Prakash, Priyanshi Shah, Ashrya Agrawal
- Abstract summary: We present a comprehensive study on semantic segmentation with the Pascal VOC dataset.
We first use a Fully Convolutional Network (FCN) baseline, which gave 71.31% pixel accuracy and 0.0527 mean IoU.
We analyze its performance and behavior, and subsequently address the baseline's issues with three improvements.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present a comprehensive study on semantic segmentation with
the Pascal VOC dataset. Here, the task is to label each pixel with a class, which in
turn segments the entire image based on the objects/entities present. To tackle
this, we first use a Fully Convolutional Network (FCN) baseline, which gave
71.31% pixel accuracy and 0.0527 mean IoU. We analyze its performance and
behavior, and subsequently address the issues in the baseline with three
improvements: a) a cosine annealing learning rate scheduler (pixel accuracy:
72.86%, IoU: 0.0529), b) data augmentation (pixel accuracy: 69.88%, IoU: 0.0585),
c) class imbalance weights (pixel accuracy: 68.98%, IoU: 0.0596). Apart from
these changes to the training pipeline, we also explore three different
architectures: a) our proposed model, Advanced FCN (pixel accuracy: 67.20%,
IoU: 0.0602), b) transfer learning with ResNet (best performance; pixel
accuracy: 71.33%, IoU: 0.0926), c) U-Net (pixel accuracy: 72.15%, IoU: 0.0649).
We observe that the improvements greatly boost performance, as reflected both
in the metrics and in the segmentation maps. Interestingly, among the
improvements, dataset augmentation makes the greatest contribution. Also, the
transfer learning model performs best on the Pascal VOC dataset. We analyse the
performance of these models using loss, accuracy and IoU plots along with
segmentation maps, which help us draw valuable insights about how the models work.
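The two metrics reported throughout the abstract, and the cosine annealing schedule used in the first improvement, can be illustrated with a short sketch. This is a minimal NumPy illustration under our own assumptions, not the authors' code: the function names are ours, and the paper does not specify whether the background class is included in the mean IoU (here, classes absent from both maps are simply skipped).

```python
import numpy as np

def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted class matches the ground truth."""
    return float((pred == target).mean())

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

def cosine_annealing_lr(t, t_max, lr_max, lr_min=0.0):
    """Learning rate at step t of a standard cosine annealing schedule."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + np.cos(np.pi * t / t_max))
```

For example, with `pred = np.array([[0, 1], [1, 1]])` and `target = np.array([[0, 1], [0, 1]])`, pixel accuracy is 0.75 while mean IoU over 2 classes is (0.5 + 2/3)/2 ≈ 0.583, which shows why the paper's IoU numbers sit far below its pixel accuracies: IoU penalizes per-class errors that a pixel count averages away.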
Related papers
- Semantic Segmentation of Transparent and Opaque Drinking Glasses with the Help of Zero-shot Learning [4.23895653489492]
We propose TransCaGNet, a modified version of the zero-shot model CaGNet.
We use zero-shot learning to create semantic segmentations of glass categories not seen during training.
We show that TransCaGNet produces better mean IoU and accuracy values while ZegClip outperforms it mostly for unseen classes.
arXiv Detail & Related papers (2025-03-19T08:54:14Z) - Keypoint Aware Masked Image Modelling [0.34530027457862006]
KAMIM improves the top-1 linear probing accuracy from 16.12% to 33.97%, and finetuning accuracy from 76.78% to 77.3% when tested on the ImageNet-1K dataset with a ViT-B when trained for the same number of epochs.
We also analyze the learned representations of a ViT-B trained using KAMIM and observe that they behave similarly to those from contrastive learning, with longer attention distances and homogeneous self-attention across layers.
arXiv Detail & Related papers (2024-07-18T19:41:46Z) - Self-Supervised Versus Supervised Training for Segmentation of Organoid Images [2.6242820867975127]
Large amounts of microscopy image data remain unlabeled, preventing their effective exploitation with deep-learning algorithms.
Self-supervised learning (SSL) is a promising solution based on learning intrinsic features under a pretext task that is similar to the main task without requiring labels.
A ResNet50 U-Net was first trained to restore images of liver progenitor organoids from augmented images, using the Structural Similarity Index Metric (SSIM) alone and using SSIM combined with an L1 loss.
For comparison, we used the same U-Net architecture to train two supervised models, one utilizing the ResNet50 encoder
arXiv Detail & Related papers (2023-11-19T01:57:55Z) - CorrMatch: Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation [73.89509052503222]
This paper presents a simple but performant semi-supervised semantic segmentation approach, called CorrMatch.
We observe that the correlation maps not only enable clustering pixels of the same category easily but also contain good shape information.
We propose to conduct pixel propagation by modeling the pairwise similarities of pixels, spreading high-confidence predictions and mining additional ones.
Then, we perform region propagation to enhance the pseudo labels with accurate class-agnostic masks extracted from the correlation maps.
arXiv Detail & Related papers (2023-06-07T10:02:29Z) - Reinforce Data, Multiply Impact: Improved Model Accuracy and Robustness with Dataset Reinforcement [68.44100784364987]
We propose a strategy to improve a dataset once such that the accuracy of any model architecture trained on the reinforced dataset is improved at no additional training cost for users.
We create a reinforced version of the ImageNet training dataset, called ImageNet+, as well as reinforced datasets CIFAR-100+, Flowers-102+, and Food-101+.
Models trained with ImageNet+ are more accurate, robust, and calibrated, and transfer well to downstream tasks.
arXiv Detail & Related papers (2023-03-15T23:10:17Z) - Focal Modulation Networks [105.93086472906765]
Self-attention (SA) is completely replaced by focal modulation in the focal modulation network (FocalNet).
FocalNets with tiny and base sizes achieve 82.3% and 83.9% top-1 accuracy on ImageNet-1K.
FocalNets exhibit remarkable superiority when transferred to downstream tasks.
arXiv Detail & Related papers (2022-03-22T17:54:50Z) - Point Label Aware Superpixels for Multi-species Segmentation of Underwater Imagery [4.195806160139487]
Monitoring coral reefs using underwater vehicles increases the range of marine surveys and availability of historical ecological data.
We propose a point label aware method for propagating labels within superpixel regions to obtain augmented ground truth for training a semantic segmentation model.
Our method outperforms prior methods on the UCSD Mosaics dataset by 3.62% for pixel accuracy and 8.35% for mean IoU for the label propagation task.
arXiv Detail & Related papers (2022-02-27T23:46:43Z) - VOLO: Vision Outlooker for Visual Recognition [148.12522298731807]
Vision transformers (ViTs) have shown great potential of self-attention based models in ImageNet classification.
We introduce a novel outlook attention and present a simple and general architecture, termed Vision Outlooker (VOLO)
Unlike self-attention that focuses on global dependency modeling at a coarse level, the outlook attention efficiently encodes finer-level features and contexts into tokens.
Experiments show that our VOLO achieves 87.1% top-1 accuracy on ImageNet-1K classification, which is the first model exceeding 87% accuracy on this competitive benchmark.
arXiv Detail & Related papers (2021-06-24T15:46:54Z) - Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling [17.739797071488212]
We conduct a large-scale transfer learning study which tests different ImageNet backbones.
By replacing the VGG19 backbone of DeepGaze II with ResNet50 features we improve the performance on saliency prediction from 78% to 85%.
We show that by combining multiple backbones in a principled manner a good confidence calibration on unseen datasets can be achieved.
arXiv Detail & Related papers (2021-05-26T09:59:56Z) - With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations [87.72779294717267]
Using the nearest-neighbor as positive in contrastive losses improves performance significantly on ImageNet classification.
We demonstrate empirically that our method is less reliant on complex data augmentations.
arXiv Detail & Related papers (2021-04-29T17:56:08Z) - Dense Contrastive Learning for Self-Supervised Visual Pre-Training [102.15325936477362]
We present dense contrastive learning, which implements self-supervised learning by optimizing a pairwise contrastive (dis)similarity loss at the pixel level between two views of input images.
Compared to the baseline method MoCo-v2, our method introduces negligible computation overhead (only 1% slower)
arXiv Detail & Related papers (2020-11-18T08:42:32Z) - Analyzing the Dependency of ConvNets on Spatial Information [81.93266969255711]
We propose spatial shuffling and GAP+FC to destroy spatial information during both training and testing phases.
We observe that spatial information can be deleted from later layers with small performance drops.
arXiv Detail & Related papers (2020-02-05T15:22:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.