Related papers: Likelihood-guided Regularization in Attention Based Models

Likelihood-guided Regularization in Attention Based Models

URL: http://arxiv.org/abs/2511.13221v1
Date: Mon, 17 Nov 2025 10:38:09 GMT
Title: Likelihood-guided Regularization in Attention Based Models
Authors: Mohamed Salem, Inyoung Kim,
Abstract summary: We propose a likelihood-guided variational Ising-based regularization framework for Vision Transformers (ViTs)<n>We show that the Ising regularizer leads to better-calibrated probability estimates and structured feature selection through uncertainty-aware attention mechanisms.
Score: 1.561268797057701
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The transformer architecture has demonstrated strong performance in classification tasks involving structured and high-dimensional data. However, its success often hinges on large- scale training data and careful regularization to prevent overfitting. In this paper, we intro- duce a novel likelihood-guided variational Ising-based regularization framework for Vision Transformers (ViTs), which simultaneously enhances model generalization and dynamically prunes redundant parameters. The proposed variational Ising-based regularization approach leverages Bayesian sparsification techniques to impose structured sparsity on model weights, allowing for adaptive architecture search during training. Unlike traditional dropout-based methods, which enforce fixed sparsity patterns, the variational Ising-based regularization method learns task-adaptive regularization, improving both efficiency and interpretability. We evaluate our approach on benchmark vision datasets, including MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100, demonstrating improved generalization under sparse, complex data and allowing for principled uncertainty quantification on both weights and selection parameters. Additionally, we show that the Ising regularizer leads to better-calibrated probability estimates and structured feature selection through uncertainty-aware attention mechanisms. Our results highlight the effectiveness of structured Bayesian sparsification in enhancing transformer-based architectures, offering a principled alternative to standard regularization techniques.

Related papers

Enhancing Generalization of Depth Estimation Foundation Model via Weakly-Supervised Adaptation with Regularization [21.788680301776207]
We propose WeSTAR, a parameter-efficient framework that performs Weakly supervised Self-Training Adaptation with Regularization.<n>We first adopt a dense self-training objective as the primary source of structural self-supervision.<n>To further improve robustness, we introduce semantically-aware hierarchical normalization.
arXiv Detail & Related papers (2025-11-18T08:16:16Z)
An Integrated Fusion Framework for Ensemble Learning Leveraging Gradient Boosting and Fuzzy Rule-Based Models [59.13182819190547]
Fuzzy rule-based models excel in interpretability and have seen widespread application across diverse fields.<n>They face challenges such as complex design specifications and scalability issues with large datasets.<n>This paper proposes an Integrated Fusion Framework that merges the strengths of both paradigms to enhance model performance and interpretability.
arXiv Detail & Related papers (2025-11-11T10:28:23Z)
SHAP-Guided Regularization in Machine Learning Models [1.0515439489916734]
We propose a SHAP-guided regularization framework that incorporates feature importance constraints into model training to enhance both predictive performance and interpretability.<n>Our approach applies entropy-based penalties to encourage sparse, concentrated feature attributions while promoting stability across samples.
arXiv Detail & Related papers (2025-07-31T15:45:38Z)
Bayesian Experimental Design for Model Discrepancy Calibration: An Auto-Differentiable Ensemble Kalman Inversion Approach [0.0]
We propose a hybrid BED framework enabled by auto-differentiable ensemble Kalman inversion (AD-EKI)<n>We iteratively optimize experimental designs, decoupling the inference of low-dimensional physical parameters handled by standard BED methods.<n>The proposed method is studied by a classical convection-diffusion BED example.
arXiv Detail & Related papers (2025-04-29T00:10:45Z)
Structural Entropy Guided Probabilistic Coding [52.01765333755793]
We propose a novel structural entropy-guided probabilistic coding model, named SEPC.<n>We incorporate the relationship between latent variables into the optimization by proposing a structural entropy regularization loss.<n> Experimental results across 12 natural language understanding tasks, including both classification and regression tasks, demonstrate the superior performance of SEPC.
arXiv Detail & Related papers (2024-12-12T00:37:53Z)
Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization. A self-regularization strategy is further exploited to maintain the stability in terms of zero-shot generalization of VLMs, dubbed OrthSR. For the first time, we revisit the CLIP and CoOp with our method to effectively improve the model on few-shot image classficiation scenario.
arXiv Detail & Related papers (2024-07-11T10:35:53Z)
A Conditioned Unsupervised Regression Framework Attuned to the Dynamic Nature of Data Streams [0.0]
This paper presents an optimal strategy for streaming contexts with limited labeled data, introducing an adaptive technique for unsupervised regression. The proposed method leverages a sparse set of initial labels and introduces an innovative drift detection mechanism. To enhance adaptability, we integrate the ADWIN (ADaptive WINdowing) algorithm with error generalization based on Root Mean Square Error (RMSE)
arXiv Detail & Related papers (2023-12-12T19:23:54Z)
Consistency Regularization for Generalizable Source-free Domain Adaptation [62.654883736925456]
Source-free domain adaptation (SFDA) aims to adapt a well-trained source model to an unlabelled target domain without accessing the source dataset. Existing SFDA methods ONLY assess their adapted models on the target training set, neglecting the data from unseen but identically distributed testing sets. We propose a consistency regularization framework to develop a more generalizable SFDA method.
arXiv Detail & Related papers (2023-08-03T07:45:53Z)
Posterior Differential Regularization with f-divergence for Improving Model Robustness [95.05725916287376]
We focus on methods that regularize the model posterior difference between clean and noisy inputs. We generalize the posterior differential regularization to the family of $f$-divergences. Our experiments show that regularizing the posterior differential with $f$-divergence can result in well-improved model robustness.
arXiv Detail & Related papers (2020-10-23T19:58:01Z)
Target-Embedding Autoencoders for Supervised Representation Learning [111.07204912245841]
This paper analyzes a framework for improving generalization in a purely supervised setting, where the target space is high-dimensional. We motivate and formalize the general framework of target-embedding autoencoders (TEA) for supervised prediction, learning intermediate latent representations jointly optimized to be both predictable from features as well as predictive of targets.
arXiv Detail & Related papers (2020-01-23T02:37:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.