Learning Regularities from Data using Spiking Functions: A Theory
- URL: http://arxiv.org/abs/2405.11684v2
- Date: Thu, 6 Jun 2024 15:41:38 GMT
- Title: Learning Regularities from Data using Spiking Functions: A Theory
- Authors: Canlin Zhang, Xiuwen Liu,
- Abstract summary: We propose a new machine learning theory, which defines in mathematics what are regularities.
We say that the discovered non-randomness is encoded into regularities if the function is simple enough.
In this process, we claim that the 'best' regularities, or the optimal spiking functions, are those who can capture the largest amount of information.
- Score: 1.3735277588793995
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks trained in an end-to-end manner are proven to be efficient in a wide range of machine learning tasks. However, there is one drawback of end-to-end learning: The learned features and information are implicitly represented in neural network parameters, which cannot be used as regularities, concepts or knowledge to explicitly represent the data probability distribution. To resolve this issue, we propose in this paper a new machine learning theory, which defines in mathematics what are regularities. Briefly, regularities are concise representations of the non-random features, or 'non-randomness' in the data probability distribution. Combining this with information theory, we claim that regularities can also be regarded as a small amount of information encoding a large amount of information. Our theory is based on spiking functions. That is, if a function can react to, or spike on specific data samples more frequently than random noise inputs, we say that such a function discovers non-randomness from the data distribution. Also, we say that the discovered non-randomness is encoded into regularities if the function is simple enough. Our theory also discusses applying multiple spiking functions to the same data distribution. In this process, we claim that the 'best' regularities, or the optimal spiking functions, are those who can capture the largest amount of information from the data distribution, and then encode the captured information in the most concise way. Theorems and hypotheses are provided to describe in mathematics what are 'best' regularities and optimal spiking functions. Finally, we propose a machine learning approach, which can potentially obtain the optimal spiking functions regarding the given dataset in practice.
Related papers
- A theory of data variability in Neural Network Bayesian inference [0.70224924046445]
We provide a field-theoretic formalism which covers the generalization properties of infinitely wide networks.
We derive the generalization properties from the statistical properties of the input.
We show that data variability leads to a non-Gaussian action reminiscent of a ($varphi3+varphi4$)-theory.
arXiv Detail & Related papers (2023-07-31T14:11:32Z) - Utility-Probability Duality of Neural Networks [4.871730595406078]
We propose an alternative utility-based explanation to the standard supervised learning procedure in deep learning.
The basic idea is to interpret the learned neural network not as a probability model but as an ordinal utility function.
We show that for all neural networks with softmax outputs, the SGD learning dynamic of maximum likelihood estimation can be seen as an iteration process.
arXiv Detail & Related papers (2023-05-24T08:09:07Z) - Masked prediction tasks: a parameter identifiability view [49.533046139235466]
We focus on the widely used self-supervised learning method of predicting masked tokens.
We show that there is a rich landscape of possibilities, out of which some prediction tasks yield identifiability, while others do not.
arXiv Detail & Related papers (2022-02-18T17:09:32Z) - Random Alloy Codes and the Fundamental Limits of Coded Distributed Tensors [1.8130068086063333]
Stragglers and other failures can severely impact the overall completion time.
Recent works in coded computing provide a novel strategy to mitigate stragglers with coded tasks.
We show that this strict definition does not directly optimize the probability of failure.
arXiv Detail & Related papers (2022-02-07T19:20:00Z) - Discovering Invariant Rationales for Graph Neural Networks [104.61908788639052]
Intrinsic interpretability of graph neural networks (GNNs) is to find a small subset of the input graph's features.
We propose a new strategy of discovering invariant rationale (DIR) to construct intrinsically interpretable GNNs.
arXiv Detail & Related papers (2022-01-30T16:43:40Z) - Transformers Can Do Bayesian Inference [56.99390658880008]
We present Prior-Data Fitted Networks (PFNs)
PFNs leverage in-context learning in large-scale machine learning techniques to approximate a large set of posteriors.
We demonstrate that PFNs can near-perfectly mimic Gaussian processes and also enable efficient Bayesian inference for intractable problems.
arXiv Detail & Related papers (2021-12-20T13:07:39Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU-network with standard Gaussian weights and uniformly distributed biases can solve this problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - Reasoning-Modulated Representations [85.08205744191078]
We study a common setting where our task is not purely opaque.
Our approach paves the way for a new class of data-efficient representation learning.
arXiv Detail & Related papers (2021-07-19T13:57:13Z) - Function Approximation via Sparse Random Features [23.325877475827337]
This paper introduces the sparse random feature method that learns parsimonious random feature models utilizing techniques from compressive sensing.
We show that the sparse random feature method outperforms shallow networks for well-structured functions and applications to scientific machine learning tasks.
arXiv Detail & Related papers (2021-03-04T17:53:54Z) - Deep Archimedean Copulas [98.96141706464425]
ACNet is a novel differentiable neural network architecture that enforces structural properties.
We show that ACNet is able to both approximate common Archimedean Copulas and generate new copulas which may provide better fits to data.
arXiv Detail & Related papers (2020-12-05T22:58:37Z) - Kernel based analysis of massive data [0.45687771576879593]
We develop a theory of approximation by networks to achieve local, stratified approximation.
The massive nature of the data allows us to use these eignets to solve inverse problems.
We construct pre-fabricated networks so that no data-based training is required for the approximation.
arXiv Detail & Related papers (2020-03-30T05:25:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.