Trapped in texture bias? A large scale comparison of deep instance segmentation
- URL: http://arxiv.org/abs/2401.09109v1
- Date: Wed, 17 Jan 2024 10:21:08 GMT
- Title: Trapped in texture bias? A large scale comparison of deep instance segmentation
- Authors: Johannes Theodoridis, Jessica Hofmann, Johannes Maucher, Andreas Schilling
- Abstract summary: We evaluate 68 models on 61 versions of MS COCO for a total of 4148 evaluations.
We find that YOLACT++, SOTR and SOLOv2 are significantly more robust to out-of-distribution texture than other frameworks.
- Score: 4.2603120588176635
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Do deep learning models for instance segmentation generalize to novel objects
in a systematic way? For classification, such behavior has been questioned. In
this study, we aim to understand if certain design decisions such as framework,
architecture or pre-training contribute to the semantic understanding of
instance segmentation. To answer this question, we consider a special case of
robustness and compare pre-trained models on a challenging benchmark for
object-centric, out-of-distribution texture. We do not introduce another method
in this work. Instead, we take a step back and evaluate a broad range of
existing literature. This includes Cascade and Mask R-CNN, Swin Transformer,
BMask, YOLACT(++), DETR, BCNet, SOTR and SOLOv2. We find that YOLACT++, SOTR
and SOLOv2 are significantly more robust to out-of-distribution texture than
other frameworks. In addition, we show that deeper and dynamic architectures
improve robustness whereas training schedules, data augmentation and
pre-training have only a minor impact. In summary, we evaluate 68 models on 61
versions of MS COCO for a total of 4148 evaluations.
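The evaluation protocol described above (every model run against every texture-shifted benchmark variant) can be sketched as a simple grid loop. This is a hypothetical illustration, not the authors' code: `load_model`, `evaluate_map`, and the dataset names are placeholder assumptions.

```python
# Hypothetical sketch of the paper's evaluation grid: run every
# pre-trained model against every dataset variant and collect one
# score per (model, dataset version) pair. `evaluate_map` stands in
# for a real COCO-style evaluator and always returns a dummy score.

def evaluate_map(model, dataset_version):
    """Placeholder: return a mAP-style score for `model` on `dataset_version`."""
    # A real run would call the framework's COCO evaluator here.
    return 0.0

def run_grid(model_names, dataset_versions, load_model):
    """Evaluate every model on every dataset version; return a results dict."""
    results = {}
    for name in model_names:
        model = load_model(name)  # load pre-trained weights once per model
        for version in dataset_versions:
            results[(name, version)] = evaluate_map(model, version)
    return results
```

With 68 models and 61 dataset versions, the grid yields the 4148 evaluations reported in the abstract.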
Related papers
- Part-Based Models Improve Adversarial Robustness [57.699029966800644]
We show that combining human prior knowledge with end-to-end learning can improve the robustness of deep neural networks.
Our model combines a part segmentation model with a tiny classifier and is trained end-to-end to simultaneously segment objects into parts.
Our experiments indicate that these models also reduce texture bias and yield better robustness against common corruptions and spurious correlations.
arXiv Detail & Related papers (2022-09-15T15:41:47Z)
- Learning What Not to Segment: A New Perspective on Few-Shot Segmentation [63.910211095033596]
Recently few-shot segmentation (FSS) has been extensively developed.
This paper proposes a fresh and straightforward insight to alleviate the problem.
In light of the unique nature of the proposed approach, we also extend it to a more realistic but challenging setting.
arXiv Detail & Related papers (2022-03-15T03:08:27Z) - Conterfactual Generative Zero-Shot Semantic Segmentation [16.684570608930983]
One of the popular zero-shot semantic segmentation methods is based on the generative model.
In this work, we consider counterfactual methods to avoid the confounder in the original model.
Our model is compared with baseline models on two real-world datasets.
arXiv Detail & Related papers (2021-06-11T13:01:03Z) - Revisiting Contrastive Methods for Unsupervised Learning of Visual
Representations [78.12377360145078]
Contrastive self-supervised learning has outperformed supervised pretraining on many downstream tasks like segmentation and object detection.
In this paper, we first study how biases in the dataset affect existing methods.
We show that current contrastive approaches work surprisingly well across: (i) object- versus scene-centric, (ii) uniform versus long-tailed and (iii) general versus domain-specific datasets.
arXiv Detail & Related papers (2021-06-10T17:59:13Z)
- RethinkCWS: Is Chinese Word Segmentation a Solved Task? [81.11161697133095]
The performance of Chinese Word Segmentation (CWS) systems has gradually reached a plateau with the rapid development of deep neural networks.
In this paper, we take stock of what we have achieved and rethink what's left in the CWS task.
arXiv Detail & Related papers (2020-11-13T11:07:08Z)
- An Analysis of Dataset Overlap on Winograd-Style Tasks [40.27778524078]
We analyze the effects of varying degrees of overlap between training corpora and test instances in WSC-style tasks.
KnowRef-60K is the largest corpus to date for WSC-style common-sense reasoning.
arXiv Detail & Related papers (2020-11-09T21:11:17Z)
- Objectness-Aware Few-Shot Semantic Segmentation [31.13009111054977]
We show how to increase overall model capacity to achieve improved performance.
We introduce objectness, which is class-agnostic and so not prone to overfitting.
Experiments show that, given only one annotated example of an unseen category, our method outperforms state-of-the-art methods with respect to mIoU.
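The mIoU metric referenced above (mean Intersection-over-Union) can be sketched in a few lines. This is the generic textbook definition over flat label masks, not the paper's implementation:

```python
def iou(pred, target, cls):
    """IoU for one class, given flat sequences of integer labels."""
    inter = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
    union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
    return inter / union if union else float("nan")

def mean_iou(pred, target, classes):
    """Mean of per-class IoU, skipping classes absent from both masks."""
    scores = [iou(pred, target, c) for c in classes]
    valid = [s for s in scores if s == s]  # drop NaN entries (absent classes)
    return sum(valid) / len(valid) if valid else 0.0
```

For example, with prediction `[0, 0, 1, 1]` against ground truth `[0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, giving an mIoU of 7/12.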
arXiv Detail & Related papers (2020-04-06T19:12:08Z)
- Learning What to Learn for Video Object Segmentation [157.4154825304324]
We introduce an end-to-end trainable VOS architecture that integrates a differentiable few-shot learning module.
This internal learner is designed to predict a powerful parametric model of the target.
We set a new state-of-the-art on the large-scale YouTube-VOS 2018 dataset by achieving an overall score of 81.5.
arXiv Detail & Related papers (2020-03-25T17:58:43Z)
- Learning Fast and Robust Target Models for Video Object Segmentation [83.3382606349118]
Video object segmentation (VOS) is a highly challenging problem since the initial mask, defining the target object, is only given at test-time.
Most previous approaches fine-tune segmentation networks on the first frame, resulting in impractical frame-rates and risk of overfitting.
We propose a novel VOS architecture consisting of two network components.
arXiv Detail & Related papers (2020-02-27T21:58:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.