Utility-Probability Duality of Neural Networks
- URL: http://arxiv.org/abs/2305.14859v2
- Date: Thu, 25 May 2023 09:25:24 GMT
- Title: Utility-Probability Duality of Neural Networks
- Authors: Huang Bojun, Fei Yuan
- Abstract summary: We propose an alternative utility-based explanation to the standard supervised learning procedure in deep learning.
The basic idea is to interpret the learned neural network not as a probability model but as an ordinal utility function.
We show that for all neural networks with softmax outputs, the SGD learning dynamic of maximum likelihood estimation can be seen as an iteration process that optimizes the network toward an optimal utility function.
- Score: 4.871730595406078
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It is typically understood that the training of modern neural networks is a
process of fitting the probability distribution of desired output. However,
recent paradoxical observations in a number of language generation tasks let
one wonder if this canonical probability-based explanation can really account
for the empirical success of deep learning.
To resolve this issue, we propose an alternative utility-based explanation to
the standard supervised learning procedure in deep learning. The basic idea is
to interpret the learned neural network not as a probability model but as an
ordinal utility function that encodes the preference revealed in training data.
In this perspective, training of the neural network corresponds to a utility
learning process. Specifically, we show that for all neural networks with
softmax outputs, the SGD learning dynamic of maximum likelihood estimation
(MLE) can be seen as an iteration process that optimizes the neural network
toward an optimal utility function. This utility-based interpretation can
explain several otherwise-paradoxical observations about the neural networks
thus trained. Moreover, our utility-based theory also entails an equation that
can transform the learned utility values back to a new kind of probability
estimation with which probability-compatible decision rules enjoy dramatic
(double-digit) performance improvements.
These findings collectively reveal a phenomenon of utility-probability
duality in terms of what modern neural networks are (truly) modeling: we
thought they were one thing (probabilities), until the unexplainable showed up;
changing our mindset and treating them as another thing (utility values) largely
reconciles the theory, despite remaining subtleties regarding their original
(probabilistic) identity.
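To make the claimed duality concrete, here is a minimal, illustrative sketch: standard cross-entropy (MLE) training of a softmax network via SGD, with the learned logits then read as ordinal utilities rather than log-probabilities. The utility-to-probability transform at the end is a hypothetical placeholder (a temperature-scaled softmax), not the equation derived in the paper.

```python
# Illustrative sketch only: MLE/SGD training of a softmax network, then
# reading its logits as ordinal utilities rather than probabilities.
import torch
import torch.nn as nn

torch.manual_seed(0)
num_classes, dim = 5, 16
model = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, num_classes))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()  # negative log-likelihood of the softmax output

# One SGD/MLE step on a synthetic batch: under the paper's view, this nudges
# the logits toward an optimal *utility* function, not a probability model.
x = torch.randn(64, dim)
y = torch.randint(0, num_classes, (64,))
opt.zero_grad()
loss_fn(model(x), y).backward()
opt.step()

logits = model(x[:1]).detach()    # interpreted as ordinal utilities
decision = logits.argmax(dim=-1)  # utility-maximizing decision rule

# Hypothetical utility -> probability transform (placeholder: a temperature-
# scaled softmax); the paper derives its own equation for this step.
tau = 2.0
probs = torch.softmax(logits / tau, dim=-1)
```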
Related papers
- Continual learning with the neural tangent ensemble [0.6137178191238463]
We show that a neural network with N parameters can be interpreted as a weighted ensemble of N classifiers.
We derive the likelihood and posterior probability of each expert given past data.
Surprisingly, we learn that the posterior updates for these experts are equivalent to a scaled and projected form of gradient descent.
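As a toy illustration of that equivalence (not the paper's construction): Bayesian updating of a weighted ensemble of fixed experts is a multiplicative-weights step which, in log space, looks like a scaled gradient step on each expert's log-likelihood, projected back onto the simplex. The per-expert likelihoods below are made up.

```python
# Toy illustration: Bayesian posterior updates over a finite expert ensemble.
import numpy as np

rng = np.random.default_rng(0)
n_experts = 4
w = np.full(n_experts, 1.0 / n_experts)  # prior over experts

def expert_likelihoods():
    # Hypothetical per-expert likelihoods p_i(y | x); random for illustration.
    return rng.uniform(0.1, 1.0, size=n_experts)

for _ in range(10):          # stream of observations
    lik = expert_likelihoods()
    w = w * lik              # Bayes: posterior ∝ prior × likelihood
    w = w / w.sum()          # project back onto the simplex
    # Equivalently: log w += log lik, then renormalize — a scaled, projected
    # ascent step on each expert's log-likelihood.
```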
arXiv Detail & Related papers (2024-08-30T16:29:09Z)
- Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z)
- From Neurons to Neutrons: A Case Study in Interpretability [5.242869847419834]
We argue that high-dimensional neural networks can learn low-dimensional representations of their training data that are useful beyond simply making good predictions.
This indicates that such approaches to interpretability can be useful for deriving a new understanding of a problem from models trained to solve it.
arXiv Detail & Related papers (2024-05-27T17:59:35Z)
- Instance-wise Linearization of Neural Network for Model Interpretation [13.583425552511704]
The challenge lies in the non-linear behavior of the neural network.
For a neural network model, this non-linear behavior is often caused by the model's non-linear activation units.
We propose an instance-wise linearization approach that reformulates the forward computation process of a neural network prediction.
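One way to picture such a linearization, sketched below under the assumption of a bias-free ReLU network (the paper's formulation may differ): for a fixed input, the active ReLU pattern makes the network an exact linear map, whose weights can be read off from the input gradient.

```python
# Minimal sketch: per-instance linearization of a bias-free ReLU network.
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(8, 16, bias=False), nn.ReLU(),
                    nn.Linear(16, 1, bias=False))
x = torch.randn(8, requires_grad=True)
y = net(x).squeeze()                # scalar prediction
w = torch.autograd.grad(y, x)[0]    # instance-wise linear weights

# For this bias-free ReLU net, the prediction is exactly linear in x:
assert torch.allclose(y, w @ x, atol=1e-5)
```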
arXiv Detail & Related papers (2023-10-25T02:07:39Z)
- Semantic Strengthening of Neuro-Symbolic Learning [85.6195120593625]
Neuro-symbolic approaches typically resort to fuzzy approximations of a probabilistic objective.
We show how to compute this efficiently for tractable circuits.
We test our approach on three tasks: predicting a minimum-cost path in Warcraft, predicting a minimum-cost perfect matching, and solving Sudoku puzzles.
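For intuition about the fuzzy-vs-probabilistic gap these approaches face, here is a toy example with made-up marginals: the exact probability of a formula over independent Bernoulli literals generally differs from its Gödel (min/max) fuzzy relaxation.

```python
# Toy illustration: fuzzy approximation vs. exact probability of a constraint.
pa, pb, pc = 0.9, 0.8, 0.3  # network's marginal probabilities for literals a, b, c

# Exact probability of (a AND b) OR c, assuming independent literals:
p_ab = pa * pb
exact = p_ab + pc - p_ab * pc       # = 0.804

# A typical fuzzy relaxation (Godel t-norm: min/max) of the same formula:
fuzzy = max(min(pa, pb), pc)        # = 0.72

print(exact, fuzzy)  # the two objectives generally disagree
```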
arXiv Detail & Related papers (2023-02-28T00:04:22Z)
- Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
The networks exploit higher-order statistics only later during training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z)
- How does unlabeled data improve generalization in self-training? A one-hidden-layer theoretical analysis [93.37576644429578]
This work establishes the first theoretical analysis for the known iterative self-training paradigm.
We prove the benefits of unlabeled data in both training convergence and generalization ability.
Experiments ranging from shallow to deep neural networks are also provided to corroborate our theoretical insights on self-training.
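A minimal sketch of the iterative self-training loop being analysed, using a one-hidden-layer scikit-learn classifier on synthetic data (the model and data here are illustrative, not the paper's setup): fit on labeled data, pseudo-label the unlabeled pool, refit on the union.

```python
# Illustrative self-training (pseudo-labeling) loop.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X_lab = rng.normal(size=(50, 5))
y_lab = (X_lab[:, 0] > 0).astype(int)       # synthetic labels
X_unl = rng.normal(size=(500, 5))           # unlabeled pool

model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
model.fit(X_lab, y_lab)
for _ in range(3):                          # self-training iterations
    pseudo = model.predict(X_unl)           # pseudo-label the unlabeled data
    X_all = np.vstack([X_lab, X_unl])
    y_all = np.concatenate([y_lab, pseudo])
    model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
    model.fit(X_all, y_all)                 # refit on the union
```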
arXiv Detail & Related papers (2022-01-21T02:16:52Z)
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
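For context, the pruned networks whose training geometry is studied are typically obtained by magnitude pruning; a minimal sketch (not the paper's analysis) follows.

```python
# Illustrative magnitude pruning of one layer.
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(32, 32)
sparsity = 0.8  # fraction of weights to remove
w = layer.weight.data
threshold = w.abs().flatten().kthvalue(int(sparsity * w.numel())).values
mask = (w.abs() > threshold).float()  # keep the largest-magnitude weights
layer.weight.data *= mask             # prune; the mask would be reapplied
                                      # after each SGD step when training
                                      # the resulting subnetwork
```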
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
- Robust Generalization of Quadratic Neural Networks via Function Identification [19.87036824512198]
Generalization bounds from learning theory often assume that the test distribution is close to the training distribution.
We show that for quadratic neural networks, we can identify the function represented by the model even though we cannot identify its parameters.
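A toy demonstration of that distinction for a quadratic network f(x) = Σ_i a_i (w_i · x)²: flipping the sign of any w_i changes the parameters but leaves the function untouched, so only the function can be identified.

```python
# Toy illustration: parameters unidentifiable, function identifiable.
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=4)       # second-layer coefficients
W = rng.normal(size=(4, 8))  # first-layer weights

def f(W, x):
    return a @ (W @ x) ** 2  # quadratic network

x = rng.normal(size=8)
W_flipped = W.copy()
W_flipped[0] *= -1.0                          # different parameters...
assert np.isclose(f(W, x), f(W_flipped, x))   # ...same function
```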
arXiv Detail & Related papers (2021-09-22T18:02:00Z)
- The Causal Neural Connection: Expressiveness, Learnability, and Inference [125.57815987218756]
An object called structural causal model (SCM) represents a collection of mechanisms and sources of random variation of the system under investigation.
In this paper, we show that the causal hierarchy theorem (Thm. 1, Bareinboim et al., 2020) still holds for neural models.
We introduce a special type of SCM called a neural causal model (NCM), and formalize a new type of inductive bias to encode structural constraints necessary for performing causal inferences.
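A minimal hand-written SCM sketch, for intuition about what an NCM parameterizes with neural networks (this toy model is illustrative, not from the paper): each variable is produced by a mechanism of its parents plus exogenous noise, and an intervention mutilates a mechanism.

```python
# Toy structural causal model (SCM) with one intervention.
import numpy as np

rng = np.random.default_rng(0)

def sample_scm(n, do_x=None):
    u_x, u_y = rng.normal(size=n), rng.normal(size=n)  # exogenous noise
    x = (u_x > 0).astype(float) if do_x is None else np.full(n, do_x)
    y = 2.0 * x + u_y                                  # mechanism f_Y(x, u_y)
    return x, y

# Interventional query P(y | do(x=1)) by mutilating the mechanism for X:
_, y_do1 = sample_scm(10_000, do_x=1.0)
print(y_do1.mean())  # ≈ 2.0, the effect of setting x = 1
```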
arXiv Detail & Related papers (2021-07-02T01:55:18Z)
- Bayesian Neural Networks [0.0]
We show how errors in neural network predictions can be estimated in principle, and present the two favoured methods for characterising these errors.
We will also describe how both of these methods have substantial pitfalls when put into practice.
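As a purely illustrative example of extracting error bars from a network, here is Monte Carlo dropout; whether it coincides with either of the paper's two favoured methods is not determined by the summary above.

```python
# Illustrative Monte Carlo dropout for predictive uncertainty.
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(),
                    nn.Dropout(p=0.2), nn.Linear(32, 1))
net.train()  # keep dropout active at "test" time
x = torch.randn(1, 4)
samples = torch.stack([net(x) for _ in range(100)])
mean, std = samples.mean(), samples.std()  # prediction and its error bar
```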
arXiv Detail & Related papers (2020-06-02T09:43:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.