Mitigating Transformer Overconfidence via Lipschitz Regularization
- URL: http://arxiv.org/abs/2306.06849v2
- Date: Tue, 18 Jul 2023 16:20:43 GMT
- Title: Mitigating Transformer Overconfidence via Lipschitz Regularization
- Authors: Wenqian Ye, Yunsheng Ma, Xu Cao, Kun Tang
- Abstract summary: We propose the Lipschitz Regularized Transformer (LRFormer), which replaces dot-product self-attention with a new similarity function based on distance in a Banach space to ensure Lipschitzness, regularized by a contractive Lipschitz bound.
Experiments conducted on standard vision benchmarks demonstrate that our method outperforms state-of-the-art single-forward-pass approaches in prediction, calibration, and uncertainty estimation.
- Score: 5.551514328951632
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Though Transformers have achieved promising results in many computer vision tasks, they tend to be over-confident in their predictions, as the standard Dot Product Self-Attention (DPSA) can barely preserve distance over the unbounded input domain. In this work, we fill this gap by proposing a novel Lipschitz Regularized Transformer (LRFormer). Specifically, we present a new similarity function based on distance in a Banach space to ensure Lipschitzness, and regularize it with a contractive Lipschitz bound. The proposed method is analyzed with a theoretical guarantee, providing a rigorous basis for its effectiveness and reliability. Extensive experiments conducted on standard vision benchmarks demonstrate that our method outperforms state-of-the-art single-forward-pass approaches in prediction, calibration, and uncertainty estimation.
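The core mechanism, swapping the unbounded dot-product similarity for a bounded, distance-based one, can be illustrated in code. The following is a minimal sketch, not the paper's exact formulation: it assumes a squared-L2 distance as the similarity (standing in for the Banach-space distance) and a simple contractive scale derived from a user-chosen Lipschitz bound; the function and parameter names are ours.

```python
import torch
import torch.nn.functional as F

def distance_based_attention(q, k, v, lip_bound=1.0):
    """Sketch of Lipschitz-friendly, distance-based self-attention.

    Similarity is the negative squared L2 distance between queries and
    keys, so it stays bounded where dot products would grow without
    limit. The contractive scale below is an illustrative assumption,
    mirroring the 1/sqrt(d) factor in standard DPSA.
    """
    d = q.shape[-1]
    # Pairwise squared distances ||q_i - k_j||^2, shape (batch, n, n)
    dist2 = torch.cdist(q, k, p=2).pow(2)
    # Contractive scaling toward a target Lipschitz bound (hypothetical choice)
    scale = lip_bound / d ** 0.5
    # Nearer keys receive higher attention weight
    attn = F.softmax(-scale * dist2, dim=-1)
    return attn @ v

# Usage: a batch of 2 sequences, 16 tokens, 64-dimensional heads
q = k = v = torch.randn(2, 16, 64)
out = distance_based_attention(q, k, v, lip_bound=0.5)
print(out.shape)  # torch.Size([2, 16, 64])
```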
Related papers
- Bridging the Theoretical Gap in Randomized Smoothing [43.95721606759696]
This paper introduces a new framework that bridges the gap between theoretical certified robustness and empirical accuracy.
Our approach tightens the bounds of certified robustness, offering a more accurate reflection of model robustness in practice.
arXiv Detail & Related papers (2025-04-03T09:05:49Z)
- Rectifying Conformity Scores for Better Conditional Coverage [75.73184036344908]
We present a new method for generating confidence sets within the split conformal prediction framework.
Our method performs a trainable transformation of any given conformity score to improve conditional coverage while ensuring exact marginal coverage.
arXiv Detail & Related papers (2025-02-22T19:54:14Z)
- Statistical Inference for Temporal Difference Learning with Linear Function Approximation [62.69448336714418]
Temporal Difference (TD) learning, arguably the most widely used algorithm for policy evaluation, serves as a natural framework for this purpose.
In this paper, we study the consistency properties of TD learning with Polyak-Ruppert averaging and linear function approximation, and obtain three significant improvements over existing results.
arXiv Detail & Related papers (2024-10-21T15:34:44Z)
- Online scalable Gaussian processes with conformal prediction for guaranteed coverage [32.21093722162573]
The consistency of the resulting uncertainty values hinges on the premise that the learning function conforms to the properties specified by the GP model.
We propose to wed the GP with the prevailing conformal prediction (CP), a distribution-free post-processing framework that produces prediction sets with provably valid coverage (a minimal split conformal sketch is given after this list).
arXiv Detail & Related papers (2024-10-07T19:22:15Z)
- Probabilistic Conformal Prediction with Approximate Conditional Validity [81.30551968980143]
We develop a new method for generating prediction sets that combines the flexibility of conformal methods with an estimate of the conditional distribution.
Our method consistently outperforms existing approaches in terms of conditional coverage.
arXiv Detail & Related papers (2024-07-01T20:44:48Z)
- Transitional Uncertainty with Layered Intermediate Predictions [14.11559987180237]
We discuss feature engineering for single-pass uncertainty estimation.
We propose Transitional Uncertainty with Layered Intermediate Predictions (TULIP) as a simple approach to address the shortcomings of current single-pass estimators.
arXiv Detail & Related papers (2024-05-25T13:03:38Z)
- Finite Sample Confidence Regions for Linear Regression Parameters Using Arbitrary Predictors [1.6860963320038902]
We explore a novel methodology for constructing confidence regions for parameters of linear models, using predictions from any arbitrary predictor.
The derived confidence regions can be cast as constraints within a Mixed Linear Programming framework, enabling optimisation of linear objectives.
Unlike previous methods, the confidence region can be empty, which can be used for hypothesis testing.
arXiv Detail & Related papers (2024-01-27T00:15:48Z)
- Device-independent certification of desirable properties with a confidence interval [0.0]
We provide a versatile solution for rigorous device-independent certification.
We show that the PBR protocol and the martingale-based protocol often offer similar performance.
Our findings also show that the performance of the martingale-based protocol may be severely affected by one's choice of the witness.
arXiv Detail & Related papers (2024-01-12T15:21:21Z)
- Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z)
- EVIL: Evidential Inference Learning for Trustworthy Semi-supervised Medical Image Segmentation [7.58442624111591]
We introduce Evidential Inference Learning (EVIL) into semi-supervised medical image segmentation.
EVIL provides a theoretically guaranteed solution for accurate uncertainty quantification in a single forward pass.
We show that EVIL achieves competitive performance in comparison with several state-of-the-art methods on the public dataset.
arXiv Detail & Related papers (2023-07-18T05:59:27Z)
- Federated Conformal Predictors for Distributed Uncertainty Quantification [83.50609351513886]
Conformal prediction is emerging as a popular paradigm for providing rigorous uncertainty quantification in machine learning.
In this paper, we extend conformal prediction to the federated learning setting.
We propose a weaker notion of partial exchangeability, better suited to the FL setting, and use it to develop the Federated Conformal Prediction framework.
arXiv Detail & Related papers (2023-05-27T19:57:27Z)
- Conformal Off-Policy Prediction in Contextual Bandits [54.67508891852636]
Conformal off-policy prediction can output reliable predictive intervals for the outcome under a new target policy.
We provide theoretical finite-sample guarantees without making any additional assumptions beyond the standard contextual bandit setup.
arXiv Detail & Related papers (2022-06-09T10:39:33Z)
- CoinDICE: Off-Policy Confidence Interval Estimation [107.86876722777535]
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning.
We show in a variety of benchmarks that the confidence interval estimates are tighter and more accurate than existing methods.
arXiv Detail & Related papers (2020-10-22T12:39:11Z)
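Several of the papers above build on split conformal prediction, so a minimal sketch of the base recipe for regression may help orient the reader. This is illustrative only: the function name and synthetic data are ours, and the listed papers refine this recipe toward conditional coverage, federated settings, and off-policy evaluation.

```python
import numpy as np

def split_conformal_interval(cal_pred, cal_y, test_pred, alpha=0.1):
    """Minimal split conformal prediction for regression.

    The conformity score is the absolute residual on a held-out
    calibration set; the finite-sample-corrected (1 - alpha) quantile
    of those scores widens each test prediction into an interval with
    marginal coverage >= 1 - alpha, assuming exchangeable data.
    """
    scores = np.abs(cal_y - cal_pred)
    n = len(scores)
    # Finite-sample corrected quantile level, capped at 1
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, q_level, method="higher")
    return test_pred - q, test_pred + q

# Usage with a hypothetical predictor's calibration outputs
rng = np.random.default_rng(0)
cal_y = rng.normal(size=500)
cal_pred = cal_y + rng.normal(scale=0.3, size=500)  # imperfect predictions
lo, hi = split_conformal_interval(cal_pred, cal_y, test_pred=np.array([0.1, -0.4]))
print(lo, hi)  # per-point interval endpoints
```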
This list is automatically generated from the titles and abstracts of the papers on this site.