Combining pre-trained Vision Transformers and CIDER for Out Of Domain
Detection
- URL: http://arxiv.org/abs/2309.03047v1
- Date: Wed, 6 Sep 2023 14:41:55 GMT
- Title: Combining pre-trained Vision Transformers and CIDER for Out Of Domain
Detection
- Authors: Gr\'egor Jouet, Cl\'ement Duhart, Francis Rousseaux, Julio Laborde,
Cyril de Runz
- Abstract summary: Most industrial pipelines rely on pre-trained models such as CNNs or Vision Transformers for downstream tasks.
This paper investigates the performance of those models on the task of out-of-domain detection.
- Score: 0.774971301405295
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Out-of-domain (OOD) detection is a crucial component in industrial
applications as it helps identify when a model encounters inputs that are
outside the training distribution. Most industrial pipelines rely on
pre-trained models such as CNNs or Vision Transformers for downstream tasks.
This paper investigates the performance of those models on the task of
out-of-domain detection. Our experiments demonstrate that pre-trained
transformer models achieve higher detection performance out of the box.
Furthermore, we show that pre-trained ViT and CNNs can be combined with
refinement methods such as CIDER to improve their OOD detection performance
even more. Our results suggest that transformers are a promising approach for
OOD detection and set a stronger baseline for this task in many contexts.
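As a rough illustration of the pipeline the abstract describes (pre-trained ViT features scored for out-of-domain inputs), here is a minimal sketch of the scoring side only. It assumes a `timm` checkpoint (`vit_base_patch16_224`), a KNN-style cosine score on L2-normalized features (the kind of test-time scoring typically paired with CIDER-trained hyperspherical embeddings), and illustrative `k`/threshold values; none of these specifics come from the paper, and CIDER's training objective itself is not implemented here.

```python
# Minimal sketch (assumptions noted above): OOD scoring with a pre-trained ViT
# and a KNN cosine-distance score on normalized features. Not the paper's exact setup.
import torch
import timm

# Pre-trained ViT used purely as a feature extractor (num_classes=0 -> pooled features).
model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=0)
model.eval()

@torch.no_grad()
def embed(images: torch.Tensor) -> torch.Tensor:
    """Return L2-normalized ViT features for a batch of [B, 3, 224, 224] images."""
    feats = model(images)  # [B, 768] pooled features
    return torch.nn.functional.normalize(feats, dim=-1)

@torch.no_grad()
def ood_score(bank: torch.Tensor, query: torch.Tensor, k: int = 50) -> torch.Tensor:
    """Negative cosine similarity to the k-th nearest in-domain feature.

    `bank` holds normalized features of the in-domain training set.
    Higher score => more likely out-of-domain.
    """
    sims = query @ bank.T                      # cosine similarity (features are normalized)
    kth_sim = sims.topk(k, dim=-1).values[:, -1]
    return -kth_sim

# Usage sketch (bank built offline from in-domain training images):
# bank = embed(train_images)                   # [N, 768]
# scores = ood_score(bank, embed(test_images))
# is_ood = scores > threshold                  # threshold picked on a validation split
```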
Related papers
- How to train your ViT for OOD Detection [36.56346240815833]
Vision Transformers are powerful out-of-distribution detectors for ImageNet-scale settings.
We investigate the impact of both the pretraining and finetuning scheme on the performance of ViTs.
arXiv Detail & Related papers (2024-05-21T08:36:30Z)
- Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches, the first time such a model has done so.
arXiv Detail & Related papers (2023-05-26T00:43:02Z)
- Transformer-based approaches to Sentiment Detection [55.41644538483948]
We examined the performance of four different types of state-of-the-art transformer models for text classification.
The RoBERTa transformer model performs best on the test dataset with a score of 82.6% and is highly recommended for quality predictions.
arXiv Detail & Related papers (2023-03-13T17:12:03Z)
- Integral Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection [78.2325219839805]
imTED improves the state-of-the-art of few-shot object detection by up to 7.6% AP.
Experiments on the MS COCO dataset demonstrate that imTED consistently outperforms its counterparts by 2.8%.
arXiv Detail & Related papers (2022-05-19T15:11:20Z)
- Efficient Two-Stage Detection of Human-Object Interactions with a Novel Unary-Pairwise Transformer [41.44769642537572]
The Unary-Pairwise Transformer is a two-stage detector that exploits unary and pairwise representations for HOIs.
We evaluate our method on the HICO-DET and V-COCO datasets, and significantly outperform state-of-the-art approaches.
arXiv Detail & Related papers (2021-12-03T10:52:06Z)
- An Empirical Study of Training End-to-End Vision-and-Language Transformers [50.23532518166621]
We present METER (Multimodal End-to-end TransformER), through which we investigate how to design and pre-train a fully transformer-based VL model.
Specifically, we dissect the model designs along multiple dimensions: vision encoders (e.g., CLIP-ViT, Swin transformer), text encoders (e.g., RoBERTa, DeBERTa), and multimodal fusion (e.g., merged attention vs. co-attention).
arXiv Detail & Related papers (2021-11-03T17:55:36Z)
- ViDT: An Efficient and Effective Fully Transformer-based Object Detector [97.71746903042968]
Detection transformers are the first fully end-to-end learning systems for object detection.
Vision Transformers are the first fully transformer-based architecture for image classification.
In this paper, we integrate Vision and Detection Transformers (ViDT) to build an effective and efficient object detector.
arXiv Detail & Related papers (2021-10-08T06:32:05Z)
- Toward Transformer-Based Object Detection [12.704056181392415]
Vision Transformers can be used as a backbone by a common detection task head to produce competitive COCO results.
ViT-FRCNN demonstrates several known properties associated with transformers, including large pretraining capacity and fast fine-tuning performance.
We view ViT-FRCNN as an important stepping stone toward a pure-transformer solution of complex vision tasks such as object detection.
arXiv Detail & Related papers (2020-12-17T22:33:14Z)
- UP-DETR: Unsupervised Pre-training for Object Detection with Transformers [11.251593386108189]
We propose a novel pretext task named random query patch detection in Unsupervised Pre-training DETR (UP-DETR).
Specifically, we randomly crop patches from the given image and then feed them as queries to the decoder.
UP-DETR significantly boosts the performance of DETR with faster convergence and higher average precision on object detection, one-shot detection and panoptic segmentation.
arXiv Detail & Related papers (2020-11-18T05:16:11Z)
- Pretrained Transformers Improve Out-of-Distribution Robustness [72.38747394482247]
We measure out-of-distribution generalization for seven NLP datasets.
We show that pretrained Transformers' performance declines are substantially smaller.
We examine which factors affect robustness, finding that larger models are not necessarily more robust.
arXiv Detail & Related papers (2020-04-13T17:58:56Z)