Kolmogorov GAM Networks are all you need!
- URL: http://arxiv.org/abs/2501.00704v1
- Date: Wed, 01 Jan 2025 02:46:00 GMT
- Title: Kolmogorov GAM Networks are all you need!
- Authors: Sarah Polson, Vadim Sokolov,
- Abstract summary: Kolmogorov GAM networks are shown to be an efficient architecture for training and inference.
They are an additive model with an embedding that is independent of the function of interest.
- Score: 0.6906005491572398
- License:
- Abstract: Kolmogorov GAM (K-GAM) networks are shown to be an efficient architecture for training and inference. They are an additive model with an embedding that is independent of the function of interest. They provide an alternative to the transformer architecture. They are the machine learning version of Kolmogorov's Superposition Theorem (KST) which provides an efficient representations of a multivariate function. Such representations have use in machine learning for encoding dictionaries (a.k.a. "look-up" tables). KST theory also provides a representation based on translates of the K\"oppen function. The goal of our paper is to interpret this representation in a machine learning context for applications in Artificial Intelligence (AI). Our architecture is equivalent to a topological embedding which is independent of the function together with an additive layer that uses a Generalized Additive Model (GAM). This provides a class of learning procedures with far fewer parameters than current deep learning algorithms. Implementation can be parallelizable which makes our algorithms computationally attractive. To illustrate our methodology, we use the Iris data from statistical learning. We also show that our additive model with non-linear embedding provides an alternative to transformer architectures which from a statistical viewpoint are kernel smoothers. Additive KAN models therefore provide a natural alternative to transformers. Finally, we conclude with directions for future research.
Related papers
- GRIL: A $2$-parameter Persistence Based Vectorization for Machine
Learning [0.49703640686206074]
We introduce a novel vector representation called Generalized Rank Invariant Landscape (GRIL) for $2$- parameter persistence modules.
We show that this vector representation is $1$-Lipschitz stable and differentiable with respect to underlying filtration functions.
We also observe an increase in performance indicating that GRIL can capture additional features enriching Graph Neural Networks (GNNs)
arXiv Detail & Related papers (2023-04-11T04:30:58Z) - Equivariance with Learned Canonicalization Functions [77.32483958400282]
We show that learning a small neural network to perform canonicalization is better than using predefineds.
Our experiments show that learning the canonicalization function is competitive with existing techniques for learning equivariant functions across many tasks.
arXiv Detail & Related papers (2022-11-11T21:58:15Z) - uGLAD: Sparse graph recovery by optimizing deep unrolled networks [11.48281545083889]
We present a novel technique to perform sparse graph recovery by optimizing deep unrolled networks.
Our model, uGLAD, builds upon and extends the state-of-the-art model GLAD to the unsupervised setting.
We evaluate model results on synthetic Gaussian data, non-Gaussian data generated from Gene Regulatory Networks, and present a case study in anaerobic digestion.
arXiv Detail & Related papers (2022-05-23T20:20:27Z) - Statistically Meaningful Approximation: a Case Study on Approximating
Turing Machines with Transformers [50.85524803885483]
This work proposes a formal definition of statistically meaningful (SM) approximation which requires the approximating network to exhibit good statistical learnability.
We study SM approximation for two function classes: circuits and Turing machines.
arXiv Detail & Related papers (2021-07-28T04:28:55Z) - Understanding Dynamics of Nonlinear Representation Learning and Its
Application [12.697842097171119]
We study the dynamics of implicit nonlinear representation learning.
We show that the data-architecture alignment condition is sufficient for the global convergence.
We derive a new training framework, which satisfies the data-architecture alignment condition without assuming it.
arXiv Detail & Related papers (2021-06-28T16:31:30Z) - Learning outside the Black-Box: The pursuit of interpretable models [78.32475359554395]
This paper proposes an algorithm that produces a continuous global interpretation of any given continuous black-box function.
Our interpretation represents a leap forward from the previous state of the art.
arXiv Detail & Related papers (2020-11-17T12:39:44Z) - Category-Learning with Context-Augmented Autoencoder [63.05016513788047]
Finding an interpretable non-redundant representation of real-world data is one of the key problems in Machine Learning.
We propose a novel method of using data augmentations when training autoencoders.
We train a Variational Autoencoder in such a way, that it makes transformation outcome predictable by auxiliary network.
arXiv Detail & Related papers (2020-10-10T14:04:44Z) - Tensor Relational Algebra for Machine Learning System Design [7.764107702934616]
We present an alternative implementation abstraction called the relational tensor algebra (TRA)
TRA is a set-based algebra based on the relational algebra.
Our empirical study shows that the optimized TRA-based back-end can significantly outperform alternatives for running ML in distributed clusters.
arXiv Detail & Related papers (2020-09-01T15:51:24Z) - Predictive Coding Approximates Backprop along Arbitrary Computation
Graphs [68.8204255655161]
We develop a strategy to translate core machine learning architectures into their predictive coding equivalents.
Our models perform equivalently to backprop on challenging machine learning benchmarks.
Our method raises the potential that standard machine learning algorithms could in principle be directly implemented in neural circuitry.
arXiv Detail & Related papers (2020-06-07T15:35:47Z) - Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies [60.285091454321055]
We design a simple and efficient embedding algorithm that learns a small set of anchor embeddings and a sparse transformation matrix.
On text classification, language modeling, and movie recommendation benchmarks, we show that ANT is particularly suitable for large vocabulary sizes.
arXiv Detail & Related papers (2020-03-18T13:07:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.