TANGOS: Regularizing Tabular Neural Networks through Gradient
Orthogonalization and Specialization
- URL: http://arxiv.org/abs/2303.05506v1
- Date: Thu, 9 Mar 2023 18:57:13 GMT
- Title: TANGOS: Regularizing Tabular Neural Networks through Gradient
Orthogonalization and Specialization
- Authors: Alan Jeffares, Tennison Liu, Jonathan Crabbé, Fergus Imrie, Mihaela
van der Schaar
- Abstract summary: We introduce Tabular Neural Gradient Orthogonalization and Specialization (TANGOS).
TANGOS is a novel framework for regularization in the tabular setting built on latent unit attributions.
We demonstrate that our approach can lead to improved out-of-sample generalization performance, outperforming other popular regularization methods.
- Score: 69.80141512683254
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite their success with unstructured data, deep neural networks are not
yet a panacea for structured tabular data. In the tabular domain, their
efficiency crucially relies on various forms of regularization to prevent
overfitting and provide strong generalization performance. Existing
regularization techniques include broad modelling decisions such as choice of
architecture, loss functions, and optimization methods. In this work, we
introduce Tabular Neural Gradient Orthogonalization and Specialization
(TANGOS), a novel framework for regularization in the tabular setting built on
latent unit attributions. The gradient attribution of an activation with
respect to a given input feature suggests how the neuron attends to that
feature, and is often employed to interpret the predictions of deep networks.
In TANGOS, we take a different approach and incorporate neuron attributions
directly into training to encourage orthogonalization and specialization of
latent attributions in a fully-connected network. Our regularizer encourages
neurons to focus on sparse, non-overlapping input features and results in a set
of diverse and specialized latent units. In the tabular domain, we demonstrate
that our approach can lead to improved out-of-sample generalization
performance, outperforming other popular regularization methods. We provide
insight into why our regularizer is effective and demonstrate that TANGOS can
be applied jointly with existing methods to achieve even greater generalization
performance.
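
As a rough illustration of the mechanism the abstract describes, the sketch below implements a TANGOS-style penalty in PyTorch. It is a minimal sketch, not the authors' implementation: it assumes each latent unit's attribution is its gradient with respect to the input, uses a mean L1 term for specialization and the mean absolute pairwise cosine similarity between attribution vectors for orthogonalization, and the weights `lambda_spec` and `lambda_orth`, the `encoder`, and the toy data are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def tangos_penalty(encoder: nn.Module, x: torch.Tensor,
                   lambda_spec: float = 1e-2, lambda_orth: float = 1e-3) -> torch.Tensor:
    """TANGOS-style penalty on latent-unit attributions (illustrative sketch only).

    Assumes the attribution of latent unit h_i is its gradient w.r.t. the input x.
    Specialization: mean L1 norm of attributions (each unit attends to few features).
    Orthogonalization: mean absolute pairwise cosine similarity between attribution
    vectors (different units attend to non-overlapping features).
    """
    x = x.clone().requires_grad_(True)                       # (batch, n_features)
    h = encoder(x)                                           # (batch, n_latent)
    n_latent = h.shape[1]

    # Attribution of every latent unit with respect to every input feature.
    grads = [torch.autograd.grad(h[:, i].sum(), x, create_graph=True)[0]
             for i in range(n_latent)]
    a = torch.stack(grads, dim=1)                            # (batch, n_latent, n_features)

    # Specialization: encourage sparse attributions per latent unit.
    spec = a.abs().mean()

    # Orthogonalization: penalize overlap between different units' attributions.
    a_unit = F.normalize(a, dim=-1)
    cos = a_unit @ a_unit.transpose(1, 2)                    # (batch, n_latent, n_latent)
    off_diag = cos - torch.diag_embed(torch.diagonal(cos, dim1=1, dim2=2))
    orth = off_diag.abs().sum() / (x.shape[0] * n_latent * (n_latent - 1))

    return lambda_spec * spec + lambda_orth * orth


if __name__ == "__main__":
    # Toy fully-connected tabular model: add the penalty to the usual task loss.
    encoder = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 16), nn.ReLU())
    head = nn.Linear(16, 1)
    x, y = torch.randn(8, 10), torch.randn(8, 1)
    loss = F.mse_loss(head(encoder(x)), y) + tangos_penalty(encoder, x)
    loss.backward()
```

In practice the penalty weights would be tuned per dataset, and the per-unit gradient loop could be vectorized (e.g. with torch.func.jacrev) for wider layers.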
Related papers
- Back to Bayesics: Uncovering Human Mobility Distributions and Anomalies with an Integrated Statistical and Neural Framework [14.899157568336731]
DeepBayesic is a novel framework that integrates Bayesian principles with deep neural networks to model the underlying distributions.
We evaluate our approach on several mobility datasets, demonstrating significant improvements over state-of-the-art anomaly detection methods.
arXiv Detail & Related papers (2024-10-01T19:02:06Z)
- Joint Diffusion Processes as an Inductive Bias in Sheaf Neural Networks [14.224234978509026]
Sheaf Neural Networks (SNNs) naturally extend Graph Neural Networks (GNNs).
We propose two novel sheaf learning approaches that provide a more intuitive understanding of the involved structure maps.
In our evaluation, we show the limitations of the real-world benchmarks used so far for SNNs.
arXiv Detail & Related papers (2024-07-30T07:17:46Z)
- Function-Space Regularization in Neural Networks: A Probabilistic Perspective [51.133793272222874]
We show that we can derive a well-motivated regularization technique that allows explicitly encoding information about desired predictive functions into neural network training.
We evaluate the utility of this regularization technique empirically and demonstrate that the proposed method leads to near-perfect semantic shift detection and highly-calibrated predictive uncertainty estimates.
arXiv Detail & Related papers (2023-12-28T17:50:56Z)
- Learning Expressive Priors for Generalization and Uncertainty Estimation in Neural Networks [77.89179552509887]
We propose a novel prior learning method for advancing generalization and uncertainty estimation in deep neural networks.
The key idea is to exploit scalable and structured posteriors of neural networks as informative priors with generalization guarantees.
We exhaustively show the effectiveness of this method for uncertainty estimation and generalization.
arXiv Detail & Related papers (2023-07-15T09:24:33Z)
- Evolving Neural Selection with Adaptive Regularization [7.298440208725654]
We present a method in which the selection of neurons in deep neural networks evolves, adapting to the difficulty of prediction.
We propose the Adaptive Neural Selection (ANS) framework, which evolves to weigh neurons in a layer to form network variants.
Experimental results show that the proposed method can significantly improve the performance of commonly-used neural network architectures.
arXiv Detail & Related papers (2022-04-04T17:19:52Z)
- Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks [18.377136391055327]
This paper theoretically analyzes the implicit regularization in hierarchical tensor factorization.
It translates to an implicit regularization towards locality for the associated convolutional networks.
Our work highlights the potential of enhancing neural networks via theoretical analysis of their implicit regularization.
arXiv Detail & Related papers (2022-01-27T18:48:30Z)
- Embracing the Dark Knowledge: Domain Generalization Using Regularized Knowledge Distillation [65.79387438988554]
Lack of generalization capability in the absence of sufficient and representative data is one of the challenges that hinder the practical application of deep models.
We propose a simple, effective, and plug-and-play training strategy named Knowledge Distillation for Domain Generalization (KDDG)
We find that both the richer "dark knowledge" from the teacher network and the gradient filter we propose can reduce the difficulty of learning the mapping.
arXiv Detail & Related papers (2021-07-06T14:08:54Z)
- Learning for Integer-Constrained Optimization through Neural Networks with Limited Training [28.588195947764188]
We introduce a symmetric and decomposed neural network structure, which is fully interpretable in terms of the functionality of its constituent components.
By taking advantage of the underlying pattern of the integer constraint, the introduced neural network offers superior generalization performance with limited training.
We show that the introduced decomposed approach can be further extended to semi-decomposed frameworks.
arXiv Detail & Related papers (2020-11-10T21:17:07Z)
- On Connections between Regularizations for Improving DNN Robustness [67.28077776415724]
This paper analyzes regularization terms proposed recently for improving the adversarial robustness of deep neural networks (DNNs).
We study possible connections between several effective methods, including input-gradient regularization, Jacobian regularization, curvature regularization, and a cross-Lipschitz functional.
arXiv Detail & Related papers (2020-07-04T23:43:32Z)
- Understanding Generalization in Deep Learning via Tensor Methods [53.808840694241]
We advance the understanding of the relations between the network's architecture and its generalizability from the compression perspective.
We propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks.
arXiv Detail & Related papers (2020-01-14T22:26:57Z)