Label-Based Diversity Measure Among Hidden Units of Deep Neural
Networks: A Regularization Method
- URL: http://arxiv.org/abs/2009.09161v2
- Date: Sat, 3 Apr 2021 12:32:54 GMT
- Title: Label-Based Diversity Measure Among Hidden Units of Deep Neural
Networks: A Regularization Method
- Authors: Chenguang Zhang and Yuexian Hou and Dawei Song and Liangzhu Ge and
Yaoshuai Yao
- Abstract summary: We introduce a new definition of redundancy to describe the diversity of hidden units under supervised learning settings.
We prove an inverse relationship between the defined redundancy and the generalization capacity.
Experiments show that DNNs using the redundancy as a regularizer effectively reduce overfitting and decrease the generalization error.
- Score: 18.72270439152708
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although the deep structure guarantees the powerful expressivity of deep neural networks (DNNs), it also triggers a serious overfitting problem. To improve the generalization capacity of DNNs, many strategies have been developed to improve the diversity among hidden units. However, most of these strategies are empirical and heuristic, lacking either a theoretical derivation of the diversity measure or a clear connection from the diversity to the generalization capacity. In this paper, from an information-theoretic perspective, we introduce a new definition of redundancy to describe the diversity of hidden units under supervised learning settings, by formalizing the effect of hidden layers on the generalization capacity as a mutual information. We prove an inverse relationship between the defined redundancy and the generalization capacity, i.e., decreasing the redundancy generally improves the generalization capacity. Experiments show that DNNs using the redundancy as a regularizer can effectively reduce overfitting and decrease the generalization error, which well supports the above points.
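The paper defines redundancy through mutual information among hidden units under the labels, which is generally intractable to compute exactly. As a rough illustration only, the sketch below penalizes class-conditional pairwise correlation among one layer's activations; the function name, the correlation proxy, and the weight `lam` are our assumptions, not the paper's construction.

```python
import torch

def label_based_redundancy(hidden, labels, num_classes):
    """Hypothetical proxy for the paper's label-based redundancy:
    penalize pairwise correlation among hidden units within each
    class, so units stay diverse conditioned on the label.
    hidden: (batch, units) activations of one hidden layer.
    labels: (batch,) integer class labels."""
    penalty = hidden.new_zeros(())
    for c in range(num_classes):
        h = hidden[labels == c]
        if h.shape[0] < 2:
            continue  # need >= 2 samples to estimate correlation
        h = h - h.mean(dim=0, keepdim=True)
        cov = h.t() @ h / (h.shape[0] - 1)
        std = cov.diagonal().clamp_min(1e-8).sqrt()
        corr = cov / (std[:, None] * std[None, :])
        off_diag = corr - torch.diag(corr.diagonal())
        penalty = penalty + off_diag.pow(2).mean()
    return penalty / num_classes

# usage sketch: loss = task_loss + lam * label_based_redundancy(h, y, 10)
```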
Related papers
- On Universality of Deep Equivariant Networks [23.16940006451027]
Universality results for equivariant neural networks remain rare.
We show that, with sufficient depth or with the addition of appropriate readout layers, equivariant networks attain universality within the entry-wise separable regime.
arXiv Detail & Related papers (2025-10-17T16:51:31Z)
- Generalizability of Neural Networks Minimizing Empirical Risk Based on Expressive Ability [20.371836553400232]
This paper investigates the generalizability of neural networks that minimize or approximately minimize empirical risk.
We provide theoretical insights into several phenomena in deep learning, including robust generalization.
arXiv Detail & Related papers (2025-03-06T05:36:35Z)
- Deep Modularity Networks with Diversity-Preserving Regularization [4.659251704980846]
We propose Deep Modularity Networks with Diversity-Preserving Regularization (DMoN-DPR), which introduces three novel regularization terms: distance-based for inter-cluster separation, variance-based for intra-cluster diversity, and entropy-based for balanced assignments.
Our method enhances clustering performance on benchmark datasets, achieving significant improvements in Normalized Mutual Information (NMI) and F1 scores.
These results demonstrate the effectiveness of incorporating diversity-preserving regularizations in creating meaningful and interpretable clusters, especially in feature-rich datasets.
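The three named terms map naturally onto code. Below is a minimal sketch of one plausible form for each (centroid separation, within-cluster spread, assignment balance), assuming soft cluster assignments; the paper's exact formulations, signs, and weights may differ, and all identifiers are ours.

```python
import torch

def dpr_terms(embeddings, assignments):
    """Sketch of the three diversity-preserving terms named in the
    DMoN-DPR summary; the paper's exact formulations may differ.
    embeddings:  (n, d) node embeddings.
    assignments: (n, k) soft cluster assignments (rows sum to 1)."""
    mass = assignments.sum(dim=0).clamp_min(1e-8)              # (k,)
    centroids = assignments.t() @ embeddings / mass[:, None]   # (k, d)

    # distance-based: reward inter-cluster separation
    k = centroids.shape[0]
    dists = torch.cdist(centroids, centroids)
    separation = -dists.sum() / max(k * (k - 1), 1)  # mean off-diagonal dist

    # variance-based: spread of members around their own centroid
    diff = embeddings[:, None, :] - centroids[None, :, :]      # (n, k, d)
    variance = (assignments * diff.pow(2).sum(-1)).sum() / embeddings.shape[0]

    # entropy-based: keep cluster sizes balanced (negative entropy)
    p = mass / mass.sum()
    balance = (p * p.clamp_min(1e-8).log()).sum()

    return separation, variance, balance  # weight each per the objective
```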
arXiv Detail & Related papers (2025-01-23T08:05:59Z)
- Diversity-Aware Agnostic Ensemble of Sharpness Minimizers [24.160975100349376]
We propose DASH, a learning algorithm that promotes diversity and flatness within deep ensembles.
We provide a theoretical backbone for our method along with extensive empirical evidence demonstrating an improvement in ensemble generalizability.
arXiv Detail & Related papers (2024-03-19T23:50:11Z)
- Rethinking Multi-domain Generalization with A General Learning Objective [19.28143363034362]
Multi-domain generalization (mDG) universally aims to minimize the discrepancy between training and testing distributions.
Existing mDG literature lacks a general learning objective paradigm.
We propose to leverage a $Y$-mapping to relax the constraint.
arXiv Detail & Related papers (2024-02-29T05:00:30Z)
- Towards Improving Robustness Against Common Corruptions using Mixture of Class Specific Experts [10.27974860479791]
This paper introduces a novel paradigm known as the Mixture of Class-Specific Expert Architecture.
The proposed architecture aims to mitigate vulnerabilities associated with common neural network structures.
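The summary leaves the architecture abstract; one plausible reading is a bank of small per-class experts whose scalar scores are concatenated into class logits. The sketch below illustrates that reading only and is not the paper's exact design.

```python
import torch
import torch.nn as nn

class ClassSpecificExperts(nn.Module):
    """One plausible reading of a class-specific expert mixture
    (a sketch, not the paper's architecture): each class gets its own
    small expert that scores 'does this input belong to my class'."""
    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, hidden_dim),
                          nn.ReLU(),
                          nn.Linear(hidden_dim, 1))
            for _ in range(num_classes)
        )

    def forward(self, x):
        # each expert emits one logit; concatenation gives class scores
        logits = torch.cat([e(x) for e in self.experts], dim=1)
        return logits  # (batch, num_classes), feed to cross-entropy
```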
arXiv Detail & Related papers (2023-11-16T20:09:47Z)
- TANGOS: Regularizing Tabular Neural Networks through Gradient Orthogonalization and Specialization [69.80141512683254]
We introduce Tabular Neural Gradient Orthogonalization and Specialization (TANGOS), a novel framework for regularization in the tabular setting built on latent unit attributions.
We demonstrate that our approach can lead to improved out-of-sample generalization performance, outperforming other popular regularization methods.
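In the spirit of the two named ingredients, the sketch below attributes each latent unit to the inputs via gradients, then penalizes non-sparse attributions (specialization) and pairwise attribution similarity (orthogonalization). It is an illustrative approximation; TANGOS's precise attribution method and penalties may differ.

```python
import torch
import torch.nn.functional as F

def tangos_penalty(encoder, x):
    """Illustrative approximation of TANGOS-style regularization:
    per-unit input attributions via gradients, then a sparsity term
    (specialization) and a pairwise-similarity term (orthogonalization)."""
    x = x.detach().requires_grad_(True)
    z = encoder(x)                              # (batch, units)
    attrs = []
    for j in range(z.shape[1]):                 # attribution per latent unit
        g, = torch.autograd.grad(z[:, j].sum(), x,
                                 create_graph=True, retain_graph=True)
        attrs.append(g.flatten(1))              # (batch, features)
    a = torch.stack(attrs, dim=1)               # (batch, units, features)

    specialization = a.abs().mean()             # L1: sparse attributions
    a_norm = F.normalize(a, dim=-1)
    sim = a_norm @ a_norm.transpose(1, 2)       # (batch, units, units)
    eye = torch.eye(a.shape[1], device=a.device)
    orthogonalization = (sim * (1 - eye)).abs().mean()
    return specialization, orthogonalization    # weight separately in the loss
```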
arXiv Detail & Related papers (2023-03-09T18:57:13Z)
- Calibrated Feature Decomposition for Generalizable Person Re-Identification [82.64133819313186]
The Calibrated Feature Decomposition (CFD) module focuses on improving the generalization capacity for person re-identification.
A calibrated-and-standardized batch normalization (CSBN) is designed to learn calibrated person representations.
arXiv Detail & Related papers (2021-11-27T17:12:43Z)
- Embracing the Dark Knowledge: Domain Generalization Using Regularized Knowledge Distillation [65.79387438988554]
Lack of generalization capability in the absence of sufficient and representative data is one of the challenges that hinder the practical application of deep models.
We propose a simple, effective, and plug-and-play training strategy named Knowledge Distillation for Domain Generalization (KDDG).
We find that both the richer "dark knowledge" from the teacher network and the proposed gradient filter can reduce the difficulty of learning the mapping.
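The "dark knowledge" ingredient corresponds to a standard distillation loss, sketched below with conventional temperature and mixing parameters; KDDG's gradient filter is specific to the paper and is deliberately omitted here.

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Standard distillation loss carrying the teacher's 'dark
    knowledge' (soft class similarities). KDDG additionally applies a
    gradient filter, which this sketch omits."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```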
arXiv Detail & Related papers (2021-07-06T14:08:54Z)
- A Too-Good-to-be-True Prior to Reduce Shortcut Reliance [0.19573380763700707]
Deep convolutional neural networks (DCNNs) often fail to generalize to out-of-distribution (o.o.d.) samples.
One cause of this shortcoming is that modern architectures tend to rely on "shortcuts".
We implement this inductive bias in a two-stage approach that uses predictions from a low-capacity network to inform the training of a high-capacity network.
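One plausible instantiation of the two-stage idea, sketched below under our own assumptions: examples the low-capacity network already classifies confidently are treated as shortcut-prone and down-weighted when training the high-capacity network. The paper's actual scheme may differ.

```python
import torch
import torch.nn.functional as F

def shortcut_aware_loss(high_logits, low_logits, labels):
    """Sketch under our own assumptions, not the paper's exact scheme:
    down-weight examples the frozen low-capacity network already solves
    confidently, since those are the 'too good to be true' shortcuts."""
    with torch.no_grad():
        p_low = F.softmax(low_logits, dim=1)
        conf = p_low.gather(1, labels[:, None]).squeeze(1)   # (batch,)
        weight = 1.0 - conf          # confident for small net => low weight
    ce = F.cross_entropy(high_logits, labels, reduction="none")
    return (weight * ce).mean()
```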
arXiv Detail & Related papers (2021-02-12T09:17:24Z)
- DICE: Diversity in Deep Ensembles via Conditional Redundancy Adversarial Estimation [109.11580756757611]
Deep ensembles perform better than a single network thanks to the diversity among their members.
Recent approaches regularize predictions to increase diversity; however, they also drastically decrease individual members' performance.
We introduce a novel training criterion called DICE: it increases diversity by reducing spurious correlations among features.
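DICE estimates conditional redundancy with an adversarial mutual-information estimator; as a far simpler stand-in, the sketch below penalizes class-conditional cross-correlation between two ensemble members' features. It conveys the intent, not the paper's estimator, and all names are ours.

```python
import torch

def conditional_decorrelation(f1, f2, labels, num_classes):
    """Simple proxy for DICE's conditional redundancy: penalize
    class-conditional cross-covariance between two members' features.
    f1, f2: (batch, d) features from two ensemble members."""
    penalty = f1.new_zeros(())
    for c in range(num_classes):
        m = labels == c
        if m.sum() < 2:
            continue  # need >= 2 samples per class to estimate
        a = f1[m] - f1[m].mean(0, keepdim=True)
        b = f2[m] - f2[m].mean(0, keepdim=True)
        cross = a.t() @ b / (m.sum() - 1)      # (d, d) cross-covariance
        penalty = penalty + cross.pow(2).mean()
    return penalty / num_classes
```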
arXiv Detail & Related papers (2021-01-14T10:53:26Z)
- Dual-constrained Deep Semi-Supervised Coupled Factorization Network with Enriched Prior [80.5637175255349]
We propose a new enriched-prior-based Dual-constrained Deep Semi-Supervised Coupled Factorization Network, called DS2CF-Net.
To extract hidden deep features, DS2CF-Net is modeled as a deep-structure and geometrical-structure-constrained neural network.
Our network can obtain state-of-the-art performance for representation learning and clustering.
arXiv Detail & Related papers (2020-09-08T13:10:21Z)
- Style Normalization and Restitution for Generalizable Person Re-identification [89.482638433932]
We design a generalizable person ReID framework that trains a model on source domains yet generalizes well to target domains.
We propose a simple yet effective Style Normalization and Restitution (SNR) module.
Our models empowered by the SNR modules significantly outperform the state-of-the-art domain generalization approaches on multiple widely-used person ReID benchmarks.
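Following the SNR description (Instance Normalization to discard style, then restitution of the identity-relevant part of what was removed), here is a hedged sketch; the gating network and layer sizes are illustrative, not the paper's exact module.

```python
import torch
import torch.nn as nn

class SNRSketch(nn.Module):
    """Sketch following the SNR description: Instance Normalization
    discards style, then a channel gate restores the identity-relevant
    part of what was removed. Layer choices here are illustrative."""
    def __init__(self, channels):
        super().__init__()
        self.inorm = nn.InstanceNorm2d(channels, affine=True)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        normed = self.inorm(x)
        residual = x - normed            # style plus some identity info
        keep = self.gate(residual)       # channel-wise relevance gate
        return normed + keep * residual  # restitute identity-relevant part
```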
arXiv Detail & Related papers (2020-05-22T07:15:10Z)