Uncertainty Baselines: Benchmarks for Uncertainty & Robustness in Deep
Learning
- URL: http://arxiv.org/abs/2106.04015v1
- Date: Mon, 7 Jun 2021 23:57:32 GMT
- Title: Uncertainty Baselines: Benchmarks for Uncertainty & Robustness in Deep
Learning
- Authors: Zachary Nado, Neil Band, Mark Collier, Josip Djolonga, Michael W.
Dusenberry, Sebastian Farquhar, Angelos Filos, Marton Havasi, Rodolphe
Jenatton, Ghassen Jerfel, Jeremiah Liu, Zelda Mariet, Jeremy Nixon, Shreyas
Padhy, Jie Ren, Tim G. J. Rudner, Yeming Wen, Florian Wenzel, Kevin Murphy,
D. Sculley, Balaji Lakshminarayanan, Jasper Snoek, Yarin Gal, Dustin Tran
- Abstract summary: We introduce Uncertainty Baselines: high-quality implementations of standard and state-of-the-art deep learning methods on a variety of tasks.
Each baseline is a self-contained experiment pipeline with easily reusable and extendable components.
We provide model checkpoints, experiment outputs as Python notebooks, and leaderboards for comparing results.
- Score: 66.59455427102152
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-quality estimates of uncertainty and robustness are crucial for numerous
real-world applications, especially for deep learning which underlies many
deployed ML systems. The ability to compare techniques for improving these
estimates is therefore very important for research and practice alike. Yet,
competitive comparisons of methods are often lacking due to a range of reasons,
including: compute availability for extensive tuning, incorporation of
sufficiently many baselines, and concrete documentation for reproducibility. In
this paper we introduce Uncertainty Baselines: high-quality implementations of
standard and state-of-the-art deep learning methods on a variety of tasks. As
of this writing, the collection spans 19 methods across 9 tasks, each with at
least 5 metrics. Each baseline is a self-contained experiment pipeline with
easily reusable and extendable components. Our goal is to provide immediate
starting points for experimentation with new methods or applications.
Additionally we provide model checkpoints, experiment outputs as Python
notebooks, and leaderboards for comparing results. Code available at
https://github.com/google/uncertainty-baselines.
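The leaderboards compare methods across several metrics per task; calibration measures are typical in this setting. As an illustration only (a minimal sketch, not code from the uncertainty-baselines repository), the block below computes expected calibration error (ECE), assuming top-1 predictions and equal-width confidence bins.
```python
# Minimal sketch of expected calibration error (ECE); illustrative, not the
# paper's implementation. Assumes top-1 predictions and equal-width bins.
import numpy as np

def expected_calibration_error(probs, labels, num_bins=15):
    """Binned ECE: |accuracy - confidence| per bin, weighted by bin frequency.

    probs:  (N, K) array of predicted class probabilities.
    labels: (N,) array of integer class labels.
    """
    confidences = probs.max(axis=1)        # top-1 confidence per example
    predictions = probs.argmax(axis=1)     # top-1 predicted class
    correct = (predictions == labels).astype(float)

    bin_edges = np.linspace(0.0, 1.0, num_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            acc = correct[in_bin].mean()       # empirical accuracy in this bin
            conf = confidences[in_bin].mean()  # mean confidence in this bin
            ece += in_bin.mean() * abs(acc - conf)
    return ece

# Toy example: both predictions are correct but under-confident,
# so ECE is the average confidence gap: 0.5 * 0.1 + 0.5 * 0.2 = 0.15.
probs = np.array([[0.9, 0.1], [0.2, 0.8]])
labels = np.array([0, 1])
print(expected_calibration_error(probs, labels))  # ~0.15
```
Note that binning schemes and the number of bins vary across implementations, so ECE values are only comparable under a fixed evaluation protocol such as the one the benchmark provides.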
Related papers
- Probably Approximately Precision and Recall Learning [62.912015491907994]
Precision and Recall are foundational metrics in machine learning.
One-sided feedback--where only positive examples are observed during training--is inherent in many practical problems.
We introduce a PAC learning framework where each hypothesis is represented by a graph, with edges indicating positive interactions.
arXiv Detail & Related papers (2024-11-20T04:21:07Z) - 1-Lipschitz Layers Compared: Memory, Speed, and Certifiable Robustness [22.09354138194545]
The robustness of neural networks against input perturbations with bounded magnitude represents a serious concern in the deployment of deep learning models in safety-critical systems.
Recently, the scientific community has focused on enhancing certifiable robustness guarantees by crafting 1-Lipschitz neural networks that leverage Lipschitz bounded dense and convolutional layers.
This paper provides a theoretical and empirical comparison between methods by evaluating them in terms of memory usage, speed, and certifiable robust accuracy.
arXiv Detail & Related papers (2023-11-28T14:50:50Z) - A Comprehensive Empirical Evaluation on Online Continual Learning [20.39495058720296]
We evaluate methods from the literature that tackle online continual learning.
We focus on the class-incremental setting in the context of image classification.
We compare these methods on the Split-CIFAR100 and Split-TinyImagenet benchmarks.
arXiv Detail & Related papers (2023-08-20T17:52:02Z) - DeepfakeBench: A Comprehensive Benchmark of Deepfake Detection [55.70982767084996]
A critical yet frequently overlooked challenge in the field of deepfake detection is the lack of a standardized, unified, comprehensive benchmark.
We present the first comprehensive benchmark for deepfake detection, called DeepfakeBench, which offers three key contributions.
DeepfakeBench contains 15 state-of-the-art detection methods, 9 deepfake datasets, a series of deepfake detection evaluation protocols and analysis tools, as well as comprehensive evaluations.
arXiv Detail & Related papers (2023-07-04T01:34:41Z) - Unsupervised Embedding Quality Evaluation [6.72542623686684]
It is often unclear whether self-supervised learning (SSL) models will perform well when transferred to another domain.
Can we quantify how easy it is to linearly separate the data in a stable way?
We introduce one novel method based on recent advances in understanding the high-dimensional geometric structure of self-supervised learning.
arXiv Detail & Related papers (2023-05-26T01:06:44Z) - Bag of Tricks for Training Data Extraction from Language Models [98.40637430115204]
We investigate and benchmark tricks for improving training data extraction using a publicly available dataset.
The experimental results show that several previously overlooked tricks can be crucial to the success of training data extraction.
arXiv Detail & Related papers (2023-02-09T06:46:42Z) - Differential testing for machine learning: an analysis for
classification algorithms beyond deep learning [7.081604594416339]
We conduct a case study using Scikit-learn, Weka, Spark MLlib, and Caret.
We identify the potential of differential testing by considering which algorithms are available in multiple frameworks.
However, the feasibility seems limited because it is often not possible to determine configurations that are equivalent across frameworks.
arXiv Detail & Related papers (2022-07-25T08:27:01Z) - A novel evaluation methodology for supervised Feature Ranking algorithms [0.0]
This paper proposes a new evaluation methodology for Feature Rankers.
By making use of synthetic datasets, feature importance scores can be known beforehand, allowing more systematic evaluation.
To facilitate large-scale experimentation using the new methodology, a benchmarking framework was built in Python, called fseval.
arXiv Detail & Related papers (2022-07-09T12:00:36Z) - Toward the Understanding of Deep Text Matching Models for Information
Retrieval [72.72380690535766]
This paper aims at testing whether existing deep text matching methods satisfy some fundamental constraints of information retrieval.
Specifically, four constraints are used in the study, i.e., the term frequency constraint, term discrimination constraint, length normalization constraints, and TF-length constraint.
Experimental results on LETOR 4.0 and MS MARCO show that all the investigated deep text matching methods satisfy the above constraints with high probability.
arXiv Detail & Related papers (2021-08-16T13:33:15Z) - Fast Uncertainty Quantification for Deep Object Pose Estimation [91.09217713805337]
Deep learning-based object pose estimators are often unreliable and overconfident.
In this work, we propose a simple, efficient, and plug-and-play UQ method for 6-DoF object pose estimation.
arXiv Detail & Related papers (2020-11-16T06:51:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.