Analysis of Regularized Learning for Linear-functional Data in Banach
Spaces
- URL: http://arxiv.org/abs/2109.03159v6
- Date: Tue, 8 Aug 2023 02:52:17 GMT
- Title: Analysis of Regularized Learning for Linear-functional Data in Banach
Spaces
- Authors: Qi Ye
- Abstract summary: We study the full theory of regularized learning for linear-functional data in Banach spaces.
We show the convergence of the approximate solutions to the exact solutions in the weak* topology of the Banach space.
The theorems of regularized learning are applied to many machine learning problems.
- Score: 3.160070867400839
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: In this article, we study the full theory of regularized learning
for linear-functional data in Banach spaces, including representer theorems,
pseudo-approximation theorems, and convergence theorems. The input training
data are composed of linear functionals in the predual space of the Banach
space, which represent the discrete local information of multimodel data and
multiscale models. The training data and the multi-loss functions are used to
compute empirical risks that approximate the expected risks, and regularized
learning minimizes the regularized empirical risks over the Banach space. The
exact solutions of the original problems are approximated globally by
regularized learning even if the original problems are unknown or
unformulated. In the convergence theorems, we show the convergence of the
approximate solutions to the exact solutions in the weak* topology of the
Banach space. Moreover, the theorems of regularized learning are applied to
solve many machine learning problems, such as support vector machines and
artificial neural networks.
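As a schematic illustration of the framework described above (the notation here
is chosen for exposition and may differ from the paper's), the regularized
empirical risk minimization over the Banach space $\mathcal{B}$ can be written as
\[
  \min_{f \in \mathcal{B}} \; \frac{1}{n} \sum_{k=1}^{n}
  L_k\big(\nu_k(f),\, y_k\big) \;+\; R\big(\|f\|_{\mathcal{B}}\big),
\]
where the linear functionals $\nu_k$ in the predual space are the input training
data, the $y_k$ are the observed values, the $L_k$ are the multi-loss functions,
and $R$ is the regularization function; the convergence theorems then state that
the approximate solutions obtained this way converge to the exact solutions in
the weak* topology of $\mathcal{B}$.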
Related papers
- Hypothesis Spaces for Deep Learning [7.695772976072261]
This paper introduces a hypothesis space for deep learning that employs deep neural networks (DNNs).
By treating a DNN as a function of two variables, we consider the primitive set of DNNs whose parameter variable lies in a set of weight matrices and biases determined by a prescribed depth and widths of the DNNs.
We prove that the Banach space so constructed is a reproducing kernel Banach space (RKBS) and construct its reproducing kernel.
arXiv Detail & Related papers (2024-03-05T22:42:29Z) - Modify Training Directions in Function Space to Reduce Generalization
Error [9.821059922409091]
We propose a modified natural gradient descent method in the neural network function space based on the eigendecompositions of the neural tangent kernel and the Fisher information matrix.
We explicitly derive the generalization error of the learned neural network function using theoretical methods from eigendecomposition and statistics theory.
arXiv Detail & Related papers (2023-07-25T07:11:30Z) - Learning Linear Causal Representations from Interventions under General
Nonlinear Mixing [52.66151568785088]
We prove strong identifiability results given unknown single-node interventions without access to the intervention targets.
This is the first instance of causal identifiability from non-paired interventions for deep neural network embeddings.
arXiv Detail & Related papers (2023-06-04T02:32:12Z) - Sparse Representer Theorems for Learning in Reproducing Kernel Banach
Spaces [7.695772976072261]
Sparsity of a learning solution is a desirable feature in machine learning.
Certain reproducing kernel Banach spaces (RKBSs) are appropriate hypothesis spaces for sparse learning methods.
arXiv Detail & Related papers (2023-05-21T22:36:32Z) - The No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning [80.1018596899899]
We argue that neural network models share this same preference, formalized using Kolmogorov complexity.
Our experiments show that pre-trained and even randomly initialized language models prefer to generate low-complexity sequences.
These observations justify the trend in deep learning of unifying seemingly disparate problems with an increasingly small set of machine learning models.
arXiv Detail & Related papers (2023-04-11T17:22:22Z) - On the existence of global minima and convergence analyses for gradient
descent methods in the training of deep neural networks [3.198144010381572]
We study feedforward deep ReLU ANNs with an arbitrarily large number of hidden layers.
We prove convergence of the risk of the GD optimization method with random initializations in the training of such ANNs.
We also study solutions of gradient flow differential equations.
arXiv Detail & Related papers (2021-12-17T18:55:40Z) - Partial Counterfactual Identification from Observational and
Experimental Data [83.798237968683]
We develop effective Monte Carlo algorithms to approximate the optimal bounds from an arbitrary combination of observational and experimental data.
Our algorithms are validated extensively on synthetic and real-world datasets.
arXiv Detail & Related papers (2021-10-12T02:21:30Z) - Optimal oracle inequalities for solving projected fixed-point equations [53.31620399640334]
We study methods that use a collection of random observations to compute approximate solutions by searching over a known low-dimensional subspace of the Hilbert space.
We show how our results precisely characterize the error of a class of temporal difference learning methods for the policy evaluation problem with linear function approximation; a minimal TD(0) sketch of this setting is given after this list.
arXiv Detail & Related papers (2020-12-09T20:19:32Z) - General stochastic separation theorems with optimal bounds [68.8204255655161]
The phenomenon of separability was revealed and used in machine learning to correct errors of Artificial Intelligence (AI) systems and to analyze AI instabilities.
Errors or clusters of errors can be separated from the rest of the data.
The ability to correct an AI system also opens up the possibility of an attack on it, and the high dimensionality induces vulnerabilities caused by the same separability.
arXiv Detail & Related papers (2020-10-11T13:12:41Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - Generalisation error in learning with random features and the hidden
manifold model [23.71637173968353]
We study generalised linear regression and classification for a synthetically generated dataset.
We consider the high-dimensional regime and use the replica method from statistical physics.
We show how to obtain the so-called double descent behaviour for logistic regression with a peak at the interpolation threshold.
We discuss the role played by correlations in the data generated by the hidden manifold model.
arXiv Detail & Related papers (2020-02-21T14:49:41Z)
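As a minimal, self-contained sketch of the setting in the oracle-inequality entry
above, namely temporal-difference policy evaluation with linear function
approximation viewed as a projected fixed-point problem, the following Python
snippet runs TD(0) on a small synthetic Markov reward process. The process, the
feature map, and the step size are illustrative assumptions; this is not the
estimator or the bounds analyzed in that paper.

import numpy as np

rng = np.random.default_rng(0)

n_states, n_features = 6, 3   # features span a low-dimensional subspace
gamma = 0.9

# Synthetic Markov reward process (assumption): random row-stochastic
# transition matrix P and state rewards r.
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)
r = rng.random(n_states)

# Fixed feature map: the value function is approximated as V(s) ~ Phi[s] @ w.
Phi = rng.standard_normal((n_states, n_features))

w = np.zeros(n_features)
alpha = 0.05   # constant step size
state = 0

for _ in range(20000):
    next_state = rng.choice(n_states, p=P[state])
    # TD(0) update: move w along the feature direction, scaled by the TD error.
    td_error = r[state] + gamma * Phi[next_state] @ w - Phi[state] @ w
    w += alpha * td_error * Phi[state]
    state = next_state

# Exact value function V = (I - gamma P)^{-1} r; the TD iterate tracks the
# projected fixed point in the feature subspace, so the two need not coincide.
V_exact = np.linalg.solve(np.eye(n_states) - gamma * P, r)
print("TD approximation:", np.round(Phi @ w, 3))
print("exact values:    ", np.round(V_exact, 3))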