Hardware Resilience Properties of Text-Guided Image Classifiers
- URL: http://arxiv.org/abs/2311.14062v2
- Date: Tue, 5 Dec 2023 08:56:04 GMT
- Title: Hardware Resilience Properties of Text-Guided Image Classifiers
- Authors: Syed Talal Wasim, Kabila Haile Soboka, Abdulrahman Mahmoud, Salman
Khan, David Brooks, Gu-Yeon Wei
- Abstract summary: We present a novel method to enhance the reliability of image classification models during deployment in the face of transient hardware errors.
Our approach achieves a remarkable $5.5\times$ average increase in hardware reliability.
- Score: 15.787551066303804
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper presents a novel method to enhance the reliability of image
classification models during deployment in the face of transient hardware
errors. By utilizing enriched text embeddings derived from GPT-3 question
prompts per class and a pretrained CLIP text encoder, we investigate their
impact as an initialization for the classification layer. Our approach achieves a
remarkable $5.5\times$ average increase in hardware reliability (and up to
$14\times$) across various architectures in the most critical layer, with
minimal accuracy drop ($0.3\%$ on average) compared to baseline PyTorch models.
Furthermore, our method seamlessly integrates with any image classification
backbone, showcases results across various network architectures, decreases
parameter and FLOPs overhead, and follows a consistent training recipe. This
research offers a practical and efficient solution to bolster the robustness of
image classification models against hardware failures, with potential
implications for future studies in this domain. Our code and models are
released at https://github.com/TalalWasim/TextGuidedResilience.
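The abstract's core idea, initializing the classification layer from class-wise text embeddings, can be sketched in PyTorch. This is a minimal illustration, not the released implementation: the random tensors stand in for the actual GPT-3/CLIP-derived embeddings, and the function name and dimensions are hypothetical.

```python
import torch
import torch.nn as nn

def init_classifier_from_text(backbone_dim, text_embeds):
    """Build a classification head whose final layer is initialized from
    per-class text embeddings (stand-in for the paper's GPT-3 + CLIP setup)."""
    num_classes, embed_dim = text_embeds.shape
    # Projects backbone features into the text-embedding space.
    proj = nn.Linear(backbone_dim, embed_dim, bias=False)
    # Final layer weights are set to the L2-normalized text embeddings, so
    # logits act as similarities between image features and class text vectors.
    classifier = nn.Linear(embed_dim, num_classes, bias=False)
    with torch.no_grad():
        classifier.weight.copy_(
            text_embeds / text_embeds.norm(dim=-1, keepdim=True))
    return nn.Sequential(proj, classifier)

# Random stand-ins for text embeddings of 10 classes (dim 512 as in CLIP).
embeds = torch.randn(10, 512)
model = init_classifier_from_text(2048, embeds)
logits = model(torch.randn(4, 2048))
print(logits.shape)  # torch.Size([4, 10])
```

In this sketch the text-derived weights serve only as an initialization; during training they could be fine-tuned or kept frozen, a choice the paper's recipe would determine.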
Related papers
- Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv Detail & Related papers (2024-06-19T08:07:14Z) - Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model [80.61157097223058]
A prevalent strategy to bolster image classification performance is through augmenting the training set with synthetic images generated by T2I models.
In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques.
We introduce an innovative inter-class data augmentation method known as Diff-Mix, which enriches the dataset by performing image translations between classes.
arXiv Detail & Related papers (2024-03-28T17:23:45Z) - Leveraging Representations from Intermediate Encoder-blocks for Synthetic Image Detection [13.840950434728533]
State-of-the-art Synthetic Image Detection (SID) research has produced strong evidence of the advantages of feature extraction from foundation models.
We leverage the image representations extracted by intermediate Transformer blocks of CLIP's image-encoder via a lightweight network.
Our method is compared against the state-of-the-art by evaluating it on 20 test datasets and exhibits an average +10.6% absolute performance improvement.
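The idea of tapping intermediate Transformer blocks rather than only the final output can be sketched as follows. This is a generic illustration, not that paper's method: the toy encoder stands in for CLIP's image encoder, and the layer indices and dimensions are arbitrary.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for CLIP's image encoder: a stack of Transformer blocks.
blocks = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
     for _ in range(6)])

def intermediate_features(x, layers=(1, 3, 5)):
    """Collect the CLS-token representation from selected intermediate blocks
    and concatenate them into one multi-level feature vector."""
    feats = []
    for i, block in enumerate(blocks):
        x = block(x)
        if i in layers:
            feats.append(x[:, 0])        # CLS token after block i
    return torch.cat(feats, dim=-1)      # (batch, len(layers) * d_model)

tokens = torch.randn(2, 10, 64)          # (batch, sequence, dim)
f = intermediate_features(tokens)
print(f.shape)  # torch.Size([2, 192])
```

A lightweight head (e.g. a small MLP) trained on such concatenated features is one plausible reading of "via a lightweight network" in the summary above.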
arXiv Detail & Related papers (2024-02-29T12:18:43Z) - An Ordinal Regression Framework for a Deep Learning Based Severity Assessment for Chest Radiographs [50.285682227571996]
We propose a framework that divides the ordinal regression problem into three parts: a model, a target function, and a classification function.
We show that the choice of encoding has a strong impact on performance and that the best encoding depends on the chosen weighting of Cohen's kappa.
arXiv Detail & Related papers (2024-02-08T14:00:45Z) - Benchmarking Robustness to Text-Guided Corruptions [0.0]
We use diffusion models to edit images to different domains.
We define a prompt hierarchy based on the original ImageNet hierarchy to apply edits in different domains.
We observe that convolutional models are more robust than transformer architectures.
arXiv Detail & Related papers (2023-04-06T09:40:02Z) - Fine-Grained ImageNet Classification in the Wild [0.0]
Robustness tests can uncover several vulnerabilities and biases which go unnoticed during the typical model evaluation stage.
In our work, we perform fine-grained classification on closely related categories, which are identified with the help of hierarchical knowledge.
arXiv Detail & Related papers (2023-03-04T12:25:07Z) - Modeling Image Composition for Complex Scene Generation [77.10533862854706]
We present a method that achieves state-of-the-art results on layout-to-image generation tasks.
After compressing RGB images into patch tokens, we propose the Transformer with Focal Attention (TwFA) for exploring dependencies of object-to-object, object-to-patch and patch-to-patch.
arXiv Detail & Related papers (2022-06-02T08:34:25Z) - Simple Open-Vocabulary Object Detection with Vision Transformers [51.57562920090721]
We propose a strong recipe for transferring image-text models to open-vocabulary object detection.
We use a standard Vision Transformer architecture with minimal modifications, contrastive image-text pre-training, and end-to-end detection fine-tuning.
We provide the adaptation strategies and regularizations needed to attain very strong performance on zero-shot text-conditioned and one-shot image-conditioned object detection.
arXiv Detail & Related papers (2022-05-12T17:20:36Z) - DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation [99.88539409432916]
We study the unsupervised domain adaptation (UDA) process.
We propose a novel UDA method, DAFormer, based on the benchmark results.
DAFormer significantly improves the state-of-the-art performance by 10.8 mIoU for GTA->Cityscapes and 5.4 mIoU for Synthia->Cityscapes.
arXiv Detail & Related papers (2021-11-29T19:00:46Z) - A Closer Look at Few-Shot Video Classification: A New Baseline and Benchmark [33.86872697028233]
We present an in-depth study on few-shot video classification by making three contributions.
First, we perform a consistent comparative study on the existing metric-based methods to figure out their limitations in representation learning.
Second, we discover that there is a high correlation between the novel action class and the ImageNet object class, which is problematic in the few-shot recognition setting.
Third, we present a new benchmark with more base data to facilitate future few-shot video classification without pre-training.
arXiv Detail & Related papers (2021-10-24T06:01:46Z) - Reconciliation of Statistical and Spatial Sparsity For Robust Image and Image-Set Classification [27.319334479994787]
We propose a novel Joint Statistical and Spatial Sparse representation, dubbed J3S, to model image or image-set data for classification.
We propose to solve the joint sparse coding problem based on the J3S model, by coupling the local and global image representations using joint sparsity.
Experiments show that the proposed J3S-based image classification scheme outperforms popular and state-of-the-art competing methods on the FMD, UIUC, ETH-80 and YTC databases.
arXiv Detail & Related papers (2021-06-01T06:33:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.