Combining pre-trained Vision Transformers and CIDER for Out Of Domain
Detection
- URL: http://arxiv.org/abs/2309.03047v1
- Date: Wed, 6 Sep 2023 14:41:55 GMT
- Title: Combining pre-trained Vision Transformers and CIDER for Out Of Domain
Detection
- Authors: Gr\'egor Jouet, Cl\'ement Duhart, Francis Rousseaux, Julio Laborde,
Cyril de Runz
- Abstract summary: Most industrial pipelines rely on pre-trained models such as CNNs or Vision Transformers for downstream tasks.
This paper investigates the performance of those models on the task of out-of-domain detection.
- Score: 0.774971301405295
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Out-of-domain (OOD) detection is a crucial component in industrial
applications as it helps identify when a model encounters inputs that are
outside the training distribution. Most industrial pipelines rely on
pre-trained models such as CNNs or Vision Transformers for downstream tasks.
This paper investigates the performance of those models on the task of
out-of-domain detection. Our experiments demonstrate that pre-trained
transformer models achieve higher detection performance out of the box.
Furthermore, we show that pre-trained ViT and CNNs can be combined with
refinement methods such as CIDER to improve their OOD detection performance
even more. Our results suggest that transformers are a promising approach for
OOD detection and set a stronger baseline for this task in many contexts.
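As a rough illustration of the pipeline the abstract describes (pre-trained ViT features scored for out-of-domain inputs), here is a minimal sketch of the scoring side only. It assumes a `timm` checkpoint (`vit_base_patch16_224`), a KNN-style cosine score on L2-normalized features (the kind of test-time scoring typically paired with CIDER-trained hyperspherical embeddings), and illustrative `k`/threshold values; none of these specifics come from the paper, and CIDER's training objective itself is not implemented here.

```python
# Minimal sketch (assumptions noted above): OOD scoring with a pre-trained ViT
# and a KNN cosine-distance score on normalized features. Not the paper's exact setup.
import torch
import timm

# Pre-trained ViT used purely as a feature extractor (num_classes=0 -> pooled features).
model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=0)
model.eval()

@torch.no_grad()
def embed(images: torch.Tensor) -> torch.Tensor:
    """Return L2-normalized ViT features for a batch of [B, 3, 224, 224] images."""
    feats = model(images)  # [B, 768] pooled features
    return torch.nn.functional.normalize(feats, dim=-1)

@torch.no_grad()
def ood_score(bank: torch.Tensor, query: torch.Tensor, k: int = 50) -> torch.Tensor:
    """Negative cosine similarity to the k-th nearest in-domain feature.

    `bank` holds normalized features of the in-domain training set.
    Higher score => more likely out-of-domain.
    """
    sims = query @ bank.T                      # cosine similarity (features are normalized)
    kth_sim = sims.topk(k, dim=-1).values[:, -1]
    return -kth_sim

# Usage sketch (bank built offline from in-domain training images):
# bank = embed(train_images)                   # [N, 768]
# scores = ood_score(bank, embed(test_images))
# is_ood = scores > threshold                  # threshold picked on a validation split
```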
Related papers
- How to train your ViT for OOD Detection [36.56346240815833]
Vision Transformers are powerful out-of-distribution detectors for ImageNet-scale settings.
We investigate the impact of both the pretraining and finetuning scheme on the performance of ViTs.
arXiv Detail & Related papers (2024-05-21T08:36:30Z)
- Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches, the first time such a model has done so.
arXiv Detail & Related papers (2023-05-26T00:43:02Z)
- Transformer-based approaches to Sentiment Detection [55.41644538483948]
We examined the performance of four different types of state-of-the-art transformer models for text classification.
The RoBERTa transformer model performs best on the test dataset with a score of 82.6% and is highly recommended for quality predictions.
arXiv Detail & Related papers (2023-03-13T17:12:03Z)
- Integral Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection [78.2325219839805]
imTED improves the state-of-the-art of few-shot object detection by up to 7.6% AP.
Experiments on the MS COCO dataset demonstrate that imTED consistently outperforms its counterparts by 2.8%.
arXiv Detail & Related papers (2022-05-19T15:11:20Z)
- Efficient Two-Stage Detection of Human-Object Interactions with a Novel Unary-Pairwise Transformer [41.44769642537572]
The Unary-Pairwise Transformer is a two-stage detector that exploits unary and pairwise representations for HOIs.
We evaluate our method on the HICO-DET and V-COCO datasets, and significantly outperform state-of-the-art approaches.
arXiv Detail & Related papers (2021-12-03T10:52:06Z)
- An Empirical Study of Training End-to-End Vision-and-Language Transformers [50.23532518166621]
We present METER (Multimodal End-to-end TransformER), through which we investigate how to design and pre-train a fully transformer-based VL model.
Specifically, we dissect the model designs along multiple dimensions: vision encoders (e.g., CLIP-ViT, Swin transformer), text encoders (e.g., RoBERTa, DeBERTa), and multimodal fusion (e.g., merged attention vs. co-attention).
arXiv Detail & Related papers (2021-11-03T17:55:36Z)
- ViDT: An Efficient and Effective Fully Transformer-based Object Detector [97.71746903042968]
Detection transformers are the first fully end-to-end learning systems for object detection.
Vision Transformers are the first fully transformer-based architecture for image classification.
In this paper, we integrate Vision and Detection Transformers (ViDT) to build an effective and efficient object detector.
arXiv Detail & Related papers (2021-10-08T06:32:05Z)
- Toward Transformer-Based Object Detection [12.704056181392415]
Vision Transformers can be used as a backbone by a common detection task head to produce competitive COCO results.
ViT-FRCNN demonstrates several known properties associated with transformers, including large pretraining capacity and fast fine-tuning performance.
We view ViT-FRCNN as an important stepping stone toward a pure-transformer solution of complex vision tasks such as object detection.
arXiv Detail & Related papers (2020-12-17T22:33:14Z)
- UP-DETR: Unsupervised Pre-training for Object Detection with Transformers [11.251593386108189]
We propose a novel pretext task named random query patch detection in Unsupervised Pre-training DETR (UP-DETR).
Specifically, we randomly crop patches from the given image and then feed them as queries to the decoder.
UP-DETR significantly boosts the performance of DETR with faster convergence and higher average precision on object detection, one-shot detection and panoptic segmentation.
arXiv Detail & Related papers (2020-11-18T05:16:11Z)
- Pretrained Transformers Improve Out-of-Distribution Robustness [72.38747394482247]
We measure out-of-distribution generalization for seven NLP datasets.
We show that pretrained Transformers' performance declines are substantially smaller.
We examine which factors affect robustness, finding that larger models are not necessarily more robust.
arXiv Detail & Related papers (2020-04-13T17:58:56Z)