Adventures of Trustworthy Vision-Language Models: A Survey
- URL: http://arxiv.org/abs/2312.04231v1
- Date: Thu, 7 Dec 2023 11:31:20 GMT
- Title: Adventures of Trustworthy Vision-Language Models: A Survey
- Authors: Mayank Vatsa, Anubhooti Jain, Richa Singh
- Abstract summary: This paper conducts a thorough examination of vision-language transformers, employing three fundamental principles of responsible AI: Bias, Robustness, and Interpretability.
The primary objective of this paper is to delve into the intricacies and complexities associated with the practical use of transformers, with the overarching goal of advancing our comprehension of how to enhance their reliability and accountability.
- Score: 54.76511683427566
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, transformers have become incredibly popular in computer vision and vision-language tasks. This notable rise in their usage can be primarily attributed to the capabilities offered by attention mechanisms and the outstanding ability of transformers to adapt and apply themselves to a variety of tasks and domains. Their versatility and state-of-the-art performance have established them as indispensable tools for a wide array of applications. However, in the constantly changing landscape of machine learning, the assurance of the trustworthiness of transformers holds utmost importance. This paper conducts a thorough examination of vision-language transformers, employing three fundamental principles of responsible AI: Bias, Robustness, and Interpretability. The primary objective of this paper is to delve into the intricacies and complexities associated with the practical use of transformers, with the overarching goal of advancing our comprehension of how to enhance their reliability and accountability.
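For background on the attention mechanism the abstract credits for this rise, here is a minimal, self-contained sketch of scaled dot-product self-attention. It is illustrative only; the names, shapes, and toy input are assumptions, not code from the surveyed paper.

```python
# Minimal sketch of scaled dot-product attention, the mechanism the
# abstract credits for the rise of vision-language transformers.
# Illustrative only; names and shapes are not from the surveyed paper.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: (seq_len, d_k) arrays; returns (seq_len, d_k)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise token affinities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # mix values by attention

# Toy usage: 4 tokens with 8-dimensional embeddings attend to each other.
x = np.random.default_rng(0).normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)         # self-attention
print(out.shape)                                    # (4, 8)
```

Because every token attends to every other token in a single matrix product, the long-range dependency modeling and parallel processing noted in several entries below fall out of this one computation.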
Related papers
- Transformer in Touch: A Survey [29.622771021984594] (2024-05-21)
The Transformer model, after its initial success in natural language processing, has recently shown great potential in tactile perception.
This review aims to comprehensively outline the application and development of Transformers in tactile technology.
- Explainability of Vision Transformers: A Comprehensive Review and New Perspectives [11.853186902106067] (2023-11-12)
Transformers have had a significant impact on natural language processing and have recently demonstrated their potential in computer vision.
This study explores different explainability methods proposed for visual transformers and presents a taxonomy for organizing them.
It provides a comprehensive review of evaluation criteria that can be used for comparing explanation results.
- A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks [60.38369406877899] (2023-06-11)
The Transformer is a deep neural network that employs a self-attention mechanism to comprehend the contextual relationships within sequential data.
Transformer models excel at handling long dependencies between input sequence elements and enable parallel processing.
Our survey encompasses the identification of the top five application domains for transformer-based models.
- Learning Explicit Object-Centric Representations with Vision Transformers [81.38804205212425] (2022-10-25)
We build on the self-supervision task of masked autoencoding and explore its effectiveness for learning object-centric representations with transformers.
We show that the model efficiently learns to decompose simple scenes, as measured by segmentation metrics on several multi-object benchmarks (see the patch-masking sketch after this list).
- Vision Transformers: State of the Art and Research Challenges [26.462994554165697] (2022-07-07)
This paper presents a comprehensive overview of the literature on different architecture designs and training tricks for vision transformers.
Our goal is to provide a systematic review along with open research opportunities.
- AdaViT: Adaptive Vision Transformers for Efficient Image Recognition [78.07924262215181] (2021-11-30)
We introduce AdaViT, an adaptive framework that learns to derive usage policies on which patches, self-attention heads and transformer blocks to use.
Our method obtains a more than 2x improvement in efficiency compared to state-of-the-art vision transformers, with only a 0.8% drop in accuracy (see the gating sketch after this list).
- Tensor-to-Image: Image-to-Image Translation with Vision Transformers [0.0] (2021-10-06)
In this paper, we utilize a custom-designed vision-transformer-based model, tensor-to-image, for image-to-image translation.
With the help of self-attention, our model was able to generalize to different problems without a single modification.
- Transformers in Vision: A Survey [101.07348618962111] (2021-01-04)
Transformers enable modeling long dependencies between input sequence elements and support parallel processing of sequences.
Transformers require minimal inductive biases in their design and are naturally suited as set-functions (see the permutation-equivariance demo after this list).
This survey aims to provide a comprehensive overview of the Transformer models in the computer vision discipline.
- A Survey on Visual Transformer [126.56860258176324] (2020-12-23)
The Transformer is a type of deep neural network based mainly on the self-attention mechanism.
In this paper, we review these vision transformer models by categorizing them in different tasks and analyzing their advantages and disadvantages.
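To make the masked-autoencoding setup in the object-centric entry above concrete, here is a hypothetical sketch of the patch-masking step; the patch size, mask ratio, and all names are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of the patch-masking step behind masked
# autoencoding (cf. the object-centric entry above); patch size and
# mask ratio are illustrative assumptions, not the paper's values.
import numpy as np

def random_patch_mask(image, patch=16, mask_ratio=0.75, seed=0):
    """Split an HxWxC image into flat patches and hide a random subset.
    Returns the visible patches and a boolean mask (True = hidden)."""
    H, W, C = image.shape
    gh, gw = H // patch, W // patch
    grid = image[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch, C)
    patches = grid.transpose(0, 2, 1, 3, 4).reshape(gh * gw, -1)
    hidden = np.random.default_rng(seed).random(gh * gw) < mask_ratio
    return patches[~hidden], hidden  # the encoder sees only visible patches

img = np.zeros((224, 224, 3), dtype=np.float32)
visible, hidden = random_patch_mask(img)
print(visible.shape, f"{hidden.sum()}/{hidden.size} patches hidden")
```

The encoder is trained to reconstruct the hidden patches from the visible ones, which is what pushes it toward scene-level structure.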
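The AdaViT entry above describes learned usage policies over patches, heads, and blocks. The following is a schematic of that idea only, with random stand-in weights where AdaViT learns policy networks; it is not the authors' implementation.

```python
# Schematic of per-input usage policies in the spirit of AdaViT: tiny
# policy heads score tokens and blocks, and only components scoring
# above a threshold are kept. Random weights stand in for the learned
# policy networks; this is not the authors' implementation.
import numpy as np

def usage_policy(tokens, n_blocks=12, keep_thresh=0.5, seed=0):
    """tokens: (n_tokens, d) embeddings for one image.
    Returns boolean keep-masks over tokens and transformer blocks."""
    rng = np.random.default_rng(seed)
    w_tok = rng.normal(size=tokens.shape[1])              # token policy head
    w_blk = rng.normal(size=(n_blocks, tokens.shape[1]))  # block policy heads
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    keep_tokens = sigmoid(tokens @ w_tok) > keep_thresh
    keep_blocks = sigmoid(w_blk @ tokens.mean(axis=0)) > keep_thresh
    return keep_tokens, keep_blocks

tokens = np.random.default_rng(1).normal(size=(196, 64))
kt, kb = usage_policy(tokens)
print(f"keep {kt.sum()}/196 tokens and {kb.sum()}/12 blocks")
```

Skipping tokens and blocks per input is where the reported efficiency gain comes from: easy images can exit with far less computation than hard ones.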
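The set-function remark in the Transformers in Vision entry can be checked directly: self-attention without positional encodings is permutation-equivariant. A small demo (illustrative, not from the survey):

```python
# Demo of the set-function property noted in "Transformers in Vision:
# A Survey": without positional encodings, self-attention is
# permutation-equivariant, so permuting input tokens permutes outputs.
import numpy as np

def self_attention(X):
    """Plain self-attention with Q = K = V = X, X: (n_tokens, d)."""
    scores = X @ X.T / np.sqrt(X.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ X

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
perm = rng.permutation(5)
# Permuting the rows of the input permutes the rows of the output.
print(np.allclose(self_attention(X)[perm], self_attention(X[perm])))  # True
```

This is why positional (or patch-position) encodings are needed whenever token order or spatial layout actually matters.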
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.