ConsistentFeature: A Plug-and-Play Component for Neural Network Regularization
- URL: http://arxiv.org/abs/2412.01476v2
- Date: Fri, 24 Jan 2025 14:16:46 GMT
- Title: ConsistentFeature: A Plug-and-Play Component for Neural Network Regularization
- Authors: RuiZhe Jiang, Haotian Lei
- Abstract summary: Over-parameterized neural network models often lead to significant performance discrepancies between training and test sets.
We introduce a simple perspective on overfitting: models learn different representations in different i.i.d. datasets.
We propose an adaptive method, ConsistentFeature, that regularizes the model by constraining feature differences across random subsets of the same training set.
- Abstract: Over-parameterized neural network models often lead to significant performance discrepancies between training and test sets, a phenomenon known as overfitting. To address this, researchers have proposed numerous regularization techniques tailored to various tasks and model architectures. In this paper, we introduce a simple perspective on overfitting: models learn different representations in different i.i.d. datasets. Based on this viewpoint, we propose an adaptive method, ConsistentFeature, that regularizes the model by constraining feature differences across random subsets of the same training set. Due to minimal prior assumptions, this approach is applicable to almost any architecture and task. Our experiments show that it effectively reduces overfitting, with low sensitivity to hyperparameters and minimal computational cost. It demonstrates particularly strong memory suppression and promotes normal convergence, even when the model has already started to overfit. Even in the absence of significant overfitting, our method consistently improves accuracy and reduces validation loss.
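The abstract does not spell out the exact form of the feature-consistency constraint, so the following is only a minimal PyTorch sketch of the general idea: draw two random subsets of a mini-batch, compare a feature statistic between them (here the subset-wise mean, which is an assumption of this sketch), and add the gap to the task loss. The names `consistent_feature_loss` and `ToyNet` and the weight 0.1 are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

def consistent_feature_loss(features: torch.Tensor) -> torch.Tensor:
    """Penalty on the gap between feature statistics of two random subsets.

    `features` is a (batch, dim) tensor of intermediate activations. Comparing
    subset-wise means is an assumption made for this sketch; the paper's exact
    formulation may differ.
    """
    perm = torch.randperm(features.size(0), device=features.device)
    half = features.size(0) // 2
    subset_a = features[perm[:half]]
    subset_b = features[perm[half:2 * half]]
    return (subset_a.mean(dim=0) - subset_b.mean(dim=0)).pow(2).mean()

class ToyNet(nn.Module):
    def __init__(self, in_dim=32, hidden=64, n_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):
        feats = self.backbone(x)
        return self.head(feats), feats

# Usage: the consistency penalty is simply added to the ordinary task loss.
model, criterion = ToyNet(), nn.CrossEntropyLoss()
x, y = torch.randn(16, 32), torch.randint(0, 10, (16,))
logits, feats = model(x)
loss = criterion(logits, y) + 0.1 * consistent_feature_loss(feats)  # 0.1 is an arbitrary weight
loss.backward()
```

Because the penalty only depends on activations already computed in the forward pass, it adds little overhead and imposes no architectural constraints, which is consistent with the plug-and-play claim.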
Related papers
- Enhancing Fine-Grained Visual Recognition in the Low-Data Regime Through Feature Magnitude Regularization [23.78498670529746]
We introduce a regularization technique to ensure that the magnitudes of the extracted features are evenly distributed.
Despite its apparent simplicity, our approach has demonstrated significant performance improvements across various fine-grained visual recognition datasets.
arXiv Detail & Related papers (2024-09-03T07:32:46Z) - Contrastive-Adversarial and Diffusion: Exploring pre-training and fine-tuning strategies for sulcal identification [3.0398616939692777]
Techniques like adversarial learning, contrastive learning, diffusion denoising learning, and ordinary reconstruction learning have become standard.
The study aims to elucidate the advantages of pre-training techniques and fine-tuning strategies to enhance the learning process of neural networks.
arXiv Detail & Related papers (2024-05-29T15:44:51Z) - LoRA-Ensemble: Efficient Uncertainty Modelling for Self-attention Networks [52.46420522934253]
We introduce LoRA-Ensemble, a parameter-efficient deep ensemble method for self-attention networks.
By employing a single pre-trained self-attention network with weights shared across all members, we train member-specific low-rank matrices for the attention projections.
Our method exhibits superior calibration compared to explicit ensembles and achieves similar or better accuracy across various prediction tasks and datasets (a rough sketch of this shared-weights, low-rank-member design appears after this list).
arXiv Detail & Related papers (2024-05-23T11:10:32Z) - Theoretical Characterization of the Generalization Performance of Overfitted Meta-Learning [70.52689048213398]
This paper studies the performance of overfitted meta-learning under a linear regression model with Gaussian features.
We find new and interesting properties that do not exist in single-task linear regression.
Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large.
arXiv Detail & Related papers (2023-04-09T20:36:13Z) - Analyzing the Performance of Deep Encoder-Decoder Networks as Surrogates for a Diffusion Equation [0.0]
We study the use of encoder-decoder convolutional neural networks (CNNs) as surrogates for steady-state diffusion solvers.
Our results indicate that increasing the size of the training set has a substantial effect on reducing performance fluctuations and overall error.
arXiv Detail & Related papers (2023-02-07T22:53:19Z) - Neural Architecture for Online Ensemble Continual Learning [6.241435193861262]
We present a fully differentiable ensemble method that allows us to efficiently train an ensemble of neural networks in the end-to-end regime.
The proposed technique achieves SOTA results without a memory buffer and clearly outperforms the reference methods.
arXiv Detail & Related papers (2022-11-27T23:17:08Z) - Revisiting Consistency Regularization for Semi-Supervised Learning [80.28461584135967]
We propose an improved consistency regularization framework based on a simple yet effective technique, FeatDistLoss.
Experimental results show that our model defines a new state of the art for various datasets and settings.
arXiv Detail & Related papers (2021-12-10T20:46:13Z) - Meta-learning One-class Classifiers with Eigenvalue Solvers for Supervised Anomaly Detection [55.888835686183995]
We propose a neural network-based meta-learning method for supervised anomaly detection.
We experimentally demonstrate that the proposed method achieves better performance than existing anomaly detection and few-shot learning methods.
arXiv Detail & Related papers (2021-03-01T01:43:04Z) - Analyzing Overfitting under Class Imbalance in Neural Networks for Image Segmentation [19.259574003403998]
In image segmentation, neural networks may overfit to the foreground samples from small structures.
In this study, we provide new insights on the problem of overfitting under class imbalance by inspecting the network behavior.
arXiv Detail & Related papers (2021-02-20T14:57:58Z) - Learning Diverse Representations for Fast Adaptation to Distribution Shift [78.83747601814669]
We present a method for learning multiple models, incorporating an objective that pressures each to learn a distinct way to solve the task.
We demonstrate our framework's ability to facilitate rapid adaptation to distribution shift.
arXiv Detail & Related papers (2020-06-12T12:23:50Z) - Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
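As referenced in the LoRA-Ensemble entry above, the following is a hypothetical PyTorch sketch of the shared-weights-plus-low-rank-members idea: one frozen linear projection is shared by all ensemble members, and each member owns only a small pair of low-rank matrices. Class name, shapes, rank, and member count are illustrative assumptions; the paper applies this inside self-attention projections and its actual implementation may differ.

```python
import torch
import torch.nn as nn

class LoRAEnsembleLinear(nn.Module):
    """One shared (frozen) projection plus a low-rank update per ensemble member."""

    def __init__(self, in_dim: int, out_dim: int, n_members: int = 4, rank: int = 8):
        super().__init__()
        self.shared = nn.Linear(in_dim, out_dim)
        self.shared.weight.requires_grad_(False)  # shared pre-trained weights stay fixed
        self.shared.bias.requires_grad_(False)
        # Member-specific low-rank factors: W_m = W_shared + B_m @ A_m
        self.A = nn.Parameter(torch.randn(n_members, rank, in_dim) * 0.01)
        self.B = nn.Parameter(torch.zeros(n_members, out_dim, rank))

    def forward(self, x: torch.Tensor, member: int) -> torch.Tensor:
        delta = self.B[member] @ self.A[member]  # (out_dim, in_dim) low-rank update
        return self.shared(x) + nn.functional.linear(x, delta)

# Averaging member outputs gives an inexpensive ensemble prediction.
layer = LoRAEnsembleLinear(64, 64)
x = torch.randn(2, 64)
ensemble_out = torch.stack([layer(x, m) for m in range(layer.A.size(0))]).mean(dim=0)
```

Only the low-rank factors are member-specific and trainable, so the parameter cost grows with rank and member count rather than with the full projection size.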
This list is automatically generated from the titles and abstracts of the papers on this site.