Kernel based analysis of massive data
- URL: http://arxiv.org/abs/2003.13226v2
- Date: Tue, 7 Jul 2020 05:33:34 GMT
- Title: Kernel based analysis of massive data
- Authors: Hrushikesh N Mhaskar
- Abstract summary: We develop a theory of approximation by networks to achieve local, stratified approximation.
The massive nature of the data allows us to use these eignets to solve inverse problems.
We construct pre-fabricated networks so that no data-based training is required for the approximation.
- Score: 0.45687771576879593
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dealing with massive data is a challenging task for machine learning. An
important aspect of machine learning is function approximation. In the context
of massive data, some of the commonly used tools for this purpose are sparsity,
divide-and-conquer, and distributed learning. In this paper, we develop a very
general theory of approximation by networks, which we have called eignets, to
achieve local, stratified approximation. The very massive nature of the data
allows us to use these eignets to solve inverse problems such as finding a good
approximation to the probability law that governs the data, and finding the
local smoothness of the target function near different points in the domain. In
fact, we develop a wavelet-like representation using our eignets. Our theory is
applicable to approximation on a general locally compact metric measure space.
Special examples include approximation by periodic basis functions on the
torus, zonal function networks on a Euclidean sphere (including smooth ReLU
networks), Gaussian networks, and approximation on manifolds. We construct
pre-fabricated networks so that no data-based training is required for the
approximation.
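To make the "pre-fabricated network" idea concrete, here is a minimal sketch of localized kernel approximation by periodic basis functions on the torus (one of the special cases listed in the abstract): a fixed low-pass filter defines a localized kernel, and the approximant is a plain weighted sum of kernel values at the data points, with no data-based training. The cosine-taper filter, the degree parameter N, and the equispaced quadrature are illustrative assumptions, not the paper's exact eignet construction.

```python
import numpy as np

def lowpass(t):
    """Filter h(t): 1 on [0, 1/2], decays smoothly to 0 on [1/2, 1], 0 beyond.
    The cosine taper is an illustrative choice, not the paper's filter."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    out[t <= 0.5] = 1.0
    mid = (t > 0.5) & (t < 1.0)
    out[mid] = 0.5 * (1.0 + np.cos(np.pi * (t[mid] - 0.5) / 0.5))
    return out

def localized_kernel(x, N):
    """Phi_N(x) = sum_{|k| <= N} h(|k|/N) e^{ikx}, written as a real cosine sum."""
    k = np.arange(1, N + 1)
    h = lowpass(k / N)
    return 1.0 + 2.0 * np.sum(h[:, None] * np.cos(np.outer(k, x)), axis=0)

def prefabricated_approx(f_samples, nodes, x_eval, N):
    """Approximant f_N(x) = (1/M) * sum_j f(x_j) Phi_N(x - x_j) for M equispaced
    nodes x_j on [0, 2*pi); the 'network' is fixed, only the samples are plugged in."""
    return np.array([np.mean(f_samples * localized_kernel(x - nodes, N))
                     for x in x_eval])

# example: recover a 2*pi-periodic target of limited smoothness from 256 samples
M, N = 256, 32
nodes = 2.0 * np.pi * np.arange(M) / M
target = lambda x: np.abs(np.sin(x)) ** 1.5
x_eval = np.linspace(0.0, 2.0 * np.pi, 400)
error = np.max(np.abs(prefabricated_approx(target(nodes), nodes, x_eval, N)
                      - target(x_eval)))
print(f"uniform error on the grid: {error:.3e}")
```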
Related papers
- Learning Regularities from Data using Spiking Functions: A Theory [1.3735277588793995]
We propose a new machine learning theory that defines mathematically what regularities are.
We say that the discovered non-randomness is encoded into regularities if the function is simple enough.
In this process, we claim that the 'best' regularities, or the optimal spiking functions, are those that can capture the largest amount of information.
arXiv Detail & Related papers (2024-05-19T22:04:11Z) - Learning on manifolds without manifold learning [0.0]
Function approximation based on data drawn randomly from an unknown distribution is an important problem in machine learning.
In this paper, we project the unknown manifold as a submanifold of an ambient hypersphere and study the question of constructing a one-shot approximation using specially designed kernels on the hypersphere.
arXiv Detail & Related papers (2024-02-20T03:27:53Z) - SMaRt: Improving GANs with Score Matching Regularity [94.81046452865583]
Generative adversarial networks (GANs) usually struggle in learning from highly diverse data, whose underlying manifold is complex.
We show that score matching serves as a promising solution to this issue thanks to its capability of persistently pushing the generated data points towards the real data manifold.
We propose to improve the optimization of GANs with score matching regularity (SMaRt)
arXiv Detail & Related papers (2023-11-30T03:05:14Z) - Provable Data Subset Selection For Efficient Neural Network Training [73.34254513162898]
We introduce the first algorithm to construct coresets for RBFNNs, i.e., small weighted subsets that approximate the loss of the input data on any radial basis function network.
We then perform empirical evaluations on function approximation and dataset subset selection on popular network architectures and data sets.
arXiv Detail & Related papers (2023-03-09T10:08:34Z) - Efficient Parametric Approximations of Neural Network Function Space Distance [6.117371161379209]
It is often useful to compactly summarize important properties of model parameters and training data so that they can be used later without storing and/or iterating over the entire dataset.
We consider estimating the Function Space Distance (FSD) over a training set, i.e. the average discrepancy between the outputs of two neural networks.
We propose a Linearized Activation Function TRick (LAFTR) and derive an efficient approximation to FSD for ReLU neural networks; a brute-force FSD baseline is sketched after this list.
arXiv Detail & Related papers (2023-02-07T15:09:23Z) - A Theoretical View on Sparsely Activated Networks [21.156069843782017]
We present a formal model of data-dependent sparse networks that captures salient aspects of popular architectures.
We then introduce a routing function based on locality sensitive hashing (LSH) that enables us to reason about how well sparse networks approximate target functions; a minimal routing sketch is given after this list.
We prove that sparse networks can match the approximation power of dense networks on Lipschitz functions.
arXiv Detail & Related papers (2022-08-08T23:14:48Z) - Inducing Gaussian Process Networks [80.40892394020797]
We propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points.
The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains.
We report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods.
arXiv Detail & Related papers (2022-04-21T05:27:09Z) - Deep Archimedean Copulas [98.96141706464425]
ACNet is a novel differentiable neural network architecture that enforces structural properties.
We show that ACNet is able to both approximate common Archimedean Copulas and generate new copulas which may provide better fits to data.
arXiv Detail & Related papers (2020-12-05T22:58:37Z) - A Point-Cloud Deep Learning Framework for Prediction of Fluid Flow Fields on Irregular Geometries [62.28265459308354]
The network learns an end-to-end mapping between spatial positions and CFD quantities.
Incompressible laminar steady flow past a cylinder with various cross-sectional shapes is considered.
The network predicts the flow fields hundreds of times faster than conventional CFD solvers.
arXiv Detail & Related papers (2020-10-15T12:15:02Z) - Graph Embedding with Data Uncertainty [113.39838145450007]
Spectral-based subspace learning is a common data preprocessing step in many machine learning pipelines.
Most subspace learning methods do not take into consideration possible measurement inaccuracies or artifacts that can lead to data with high uncertainty.
arXiv Detail & Related papers (2020-09-01T15:08:23Z) - Distributed Learning via Filtered Hyperinterpolation on Manifolds [2.2046162792653017]
This paper studies the problem of learning real-valued functions on manifolds.
Motivated by the problem of handling large data sets, it presents a parallel data processing approach.
We prove quantitative bounds on the approximation quality of the learned function over the entire manifold.
arXiv Detail & Related papers (2020-07-18T10:05:18Z)
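For the Function Space Distance entry above (flagged "after this list"), here is a brute-force baseline: the FSD over a training set is simply the average discrepancy between the outputs of two networks on that set, which LAFTR-style parametric approximations aim to estimate without storing or iterating over all the data. The small ReLU MLPs, the squared-error discrepancy, and the synthetic data below are illustrative assumptions.

```python
import numpy as np

def relu_mlp(params, x):
    """Forward pass of a small ReLU MLP; params is a list of (W, b) pairs."""
    h = x
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)
    W, b = params[-1]
    return h @ W + b

def function_space_distance(params_a, params_b, data):
    """Brute-force FSD estimate: mean squared output discrepancy over the data.
    Parametric approximations try to estimate this without iterating the full set."""
    diff = relu_mlp(params_a, data) - relu_mlp(params_b, data)
    return float(np.mean(np.sum(diff ** 2, axis=1)))

# illustrative networks and data; all dimensions are arbitrary choices
rng = np.random.default_rng(0)
def init_mlp(sizes):
    return [(rng.normal(scale=0.5, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

net_a = init_mlp([8, 32, 32, 1])
net_b = [(W + rng.normal(scale=0.05, size=W.shape), b.copy()) for W, b in net_a]
X = rng.normal(size=(1000, 8))
print(f"FSD estimate over {len(X)} points: {function_space_distance(net_a, net_b, X):.4f}")
```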
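For the sparsely activated networks entry above, the following sketch shows the flavor of LSH-based routing: an input is hashed by its sign pattern against random hyperplanes, and each hash bucket activates only its own small expert, so any single input touches a sparse subset of parameters. The bucket count, the hyperplane hash, and the per-bucket linear experts are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_bits = 16, 4                                        # input dim; 2**4 = 16 buckets
hyperplanes = rng.normal(size=(n_bits, d))               # fixed random LSH hyperplanes
experts = rng.normal(scale=0.3, size=(2 ** n_bits, d))   # one linear "expert" per bucket

def lsh_bucket(x):
    """Locality-sensitive hash: the sign pattern of x against random hyperplanes,
    so nearby inputs tend to land in the same bucket."""
    bits = (hyperplanes @ x > 0).astype(int)
    return int("".join(map(str, bits)), 2)

def sparse_forward(x):
    """Route x to the single expert owned by its bucket; only that expert's
    parameters are used, which is the data-dependent sparsity being modeled."""
    return float(experts[lsh_bucket(x)] @ x)

x = rng.normal(size=d)
print(f"bucket {lsh_bucket(x)}, output {sparse_forward(x):.3f}")
```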
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.