Foundation Model-oriented Robustness: Robust Image Model Evaluation with Pretrained Models
- URL: http://arxiv.org/abs/2308.10632v3
- Date: Thu, 16 May 2024 12:02:45 GMT
- Title: Foundation Model-oriented Robustness: Robust Image Model Evaluation with Pretrained Models
- Authors: Peiyan Zhang, Haoyang Liu, Chaozhuo Li, Xing Xie, Sunghun Kim, Haohan Wang,
- Abstract summary: We introduce a new robustness measurement that directly measures the image classification model's performance compared with a surrogate oracle.
Our new method will offer us a new way to evaluate the models' robustness performance, free of limitations of fixed benchmarks or constrained perturbations.
- Score: 38.16654407693728
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning has demonstrated remarkable performance over finite datasets, yet whether the scores over the fixed benchmarks can sufficiently indicate the model's performance in the real world is still in discussion. In reality, an ideal robust model will probably behave similarly to the oracle (e.g., the human users), thus a good evaluation protocol is probably to evaluate the models' behaviors in comparison to the oracle. In this paper, we introduce a new robustness measurement that directly measures the image classification model's performance compared with a surrogate oracle (i.e., a foundation model). Besides, we design a simple method that can accomplish the evaluation beyond the scope of the benchmarks. Our method extends the image datasets with new samples that are sufficiently perturbed to be distinct from the ones in the original sets, but are still bounded within the same image-label structure the original test image represents, constrained by a foundation model pretrained with a large amount of samples. As a result, our new method will offer us a new way to evaluate the models' robustness performance, free of limitations of fixed benchmarks or constrained perturbations, although scoped by the power of the oracle. In addition to the evaluation results, we also leverage our generated data to understand the behaviors of the model and our new evaluation strategies.
Related papers
- Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv Detail & Related papers (2024-06-19T08:07:14Z) - Intriguing Differences Between Zero-Shot and Systematic Evaluations of
Vision-Language Transformer Models [7.360937524701675]
Transformer-based models have dominated natural language processing and other areas in the last few years due to their superior (zero-shot) performance on benchmark datasets.
In this paper, based on a new gradient descent optimization method, we are able to explore the embedding space of a commonly used vision-language model.
Using the Imagenette dataset, we show that while the model achieves over 99% zero-shot classification performance, it fails systematic evaluations completely.
arXiv Detail & Related papers (2024-02-13T14:07:49Z) - FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects [55.77542145604758]
FoundationPose is a unified foundation model for 6D object pose estimation and tracking.
Our approach can be instantly applied at test-time to a novel object without fine-tuning.
arXiv Detail & Related papers (2023-12-13T18:28:09Z) - Anchor Points: Benchmarking Models with Much Fewer Examples [88.02417913161356]
In six popular language classification benchmarks, model confidence in the correct class on many pairs of points is strongly correlated across models.
We propose Anchor Point Selection, a technique to select small subsets of datasets that capture model behavior across the entire dataset.
Just several anchor points can be used to estimate model per-class predictions on all other points in a dataset with low mean absolute error.
arXiv Detail & Related papers (2023-09-14T17:45:51Z) - Consensus-Adaptive RANSAC [104.87576373187426]
We propose a new RANSAC framework that learns to explore the parameter space by considering the residuals seen so far via a novel attention layer.
The attention mechanism operates on a batch of point-to-model residuals, and updates a per-point estimation state to take into account the consensus found through a lightweight one-step transformer.
arXiv Detail & Related papers (2023-07-26T08:25:46Z) - Evaluating Representations with Readout Model Switching [18.475866691786695]
In this paper, we propose to use the Minimum Description Length (MDL) principle to devise an evaluation metric.
We design a hybrid discrete and continuous-valued model space for the readout models and employ a switching strategy to combine their predictions.
The proposed metric can be efficiently computed with an online method and we present results for pre-trained vision encoders of various architectures.
arXiv Detail & Related papers (2023-02-19T14:08:01Z) - Effective Robustness against Natural Distribution Shifts for Models with
Different Training Data [113.21868839569]
"Effective robustness" measures the extra out-of-distribution robustness beyond what can be predicted from the in-distribution (ID) performance.
We propose a new evaluation metric to evaluate and compare the effective robustness of models trained on different data.
arXiv Detail & Related papers (2023-02-02T19:28:41Z) - RockNER: A Simple Method to Create Adversarial Examples for Evaluating
the Robustness of Named Entity Recognition Models [32.806292167848156]
We propose RockNER to audit the robustness of named entity recognition models.
We replace target entities with other entities of the same semantic class in Wikidata.
At the context level, we use pre-trained language models to generate word substitutions.
arXiv Detail & Related papers (2021-09-12T21:30:21Z) - Estimating the Robustness of Classification Models by the Structure of
the Learned Feature-Space [10.418647759223964]
We argue that fixed testsets are only able to capture a small portion of possible data variations and are thus limited and prone to generate new overfitted solutions.
To overcome these drawbacks, we suggest to estimate the robustness of a model directly from the structure of its learned feature-space.
arXiv Detail & Related papers (2021-06-23T10:52:29Z) - Conterfactual Generative Zero-Shot Semantic Segmentation [16.684570608930983]
One of the popular zero-shot semantic segmentation methods is based on the generative model.
In this work, we consider counterfactual methods to avoid the confounder in the original model.
Our model is compared with baseline models on two real-world datasets.
arXiv Detail & Related papers (2021-06-11T13:01:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.