Conditional mean embeddings and optimal feature selection via positive
definite kernels
- URL: http://arxiv.org/abs/2305.08100v1
- Date: Sun, 14 May 2023 08:29:15 GMT
- Title: Conditional mean embeddings and optimal feature selection via positive
definite kernels
- Authors: Palle E.T. Jorgensen, Myung-Sin Song, and James Tian
- Abstract summary: We consider operator theoretic approaches to Conditional Conditional embeddings (CME)
Our results combine a spectral analysis-based optimization scheme with the use of kernels, processes, and constructive learning algorithms.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivated by applications, we consider here new operator theoretic approaches
to Conditional mean embeddings (CME). Our present results combine a spectral
analysis-based optimization scheme with the use of kernels, stochastic
processes, and constructive learning algorithms. For initially given non-linear
data, we consider optimization-based feature selections. This entails the use
of convex sets of positive definite (p.d.) kernels in a construction of optimal
feature selection via regression algorithms from learning models. Thus, with
initial inputs of training data (for a suitable learning algorithm,) each
choice of p.d. kernel $K$ in turn yields a variety of Hilbert spaces and
realizations of features. A novel idea here is that we shall allow an
optimization over selected sets of kernels $K$ from a convex set $C$ of
positive definite kernels $K$. Hence our \textquotedblleft
optimal\textquotedblright{} choices of feature representations will depend on a
secondary optimization over p.d. kernels $K$ within a specified convex set $C$.
Related papers
- Global Optimization of Gaussian Process Acquisition Functions Using a Piecewise-Linear Kernel Approximation [2.3342885570554652]
We introduce a piecewise approximation for process kernels and a corresponding MIQP representation for acquisition functions.
We empirically demonstrate the framework on synthetic functions, constrained benchmarks, and hyper tuning tasks.
arXiv Detail & Related papers (2024-10-22T10:56:52Z) - Column and row subset selection using nuclear scores: algorithms and theory for Nyström approximation, CUR decomposition, and graph Laplacian reduction [0.0]
We develop unified methodologies for fast, efficient, and theoretically guaranteed column selection.
First, we derive and implement a sparsity-exploiting deterministic algorithm applicable to tasks including kernel approximation and CUR decomposition.
Next, we develop a matrix-free formalism relying on a randomization scheme satisfying guaranteed concentration bounds.
arXiv Detail & Related papers (2024-07-01T18:10:19Z) - Efficient Convex Algorithms for Universal Kernel Learning [46.573275307034336]
An ideal set of kernels should: admit a linear parameterization (for tractability); dense in the set of all kernels (for accuracy)
Previous algorithms for optimization of kernels were limited to classification and relied on computationally complex Semidefinite Programming (SDP) algorithms.
We propose a SVD-QCQPQP algorithm which dramatically reduces the computational complexity as compared with previous SDP-based approaches.
arXiv Detail & Related papers (2023-04-15T04:57:37Z) - Tree ensemble kernels for Bayesian optimization with known constraints
over mixed-feature spaces [54.58348769621782]
Tree ensembles can be well-suited for black-box optimization tasks such as algorithm tuning and neural architecture search.
Two well-known challenges in using tree ensembles for black-box optimization are (i) effectively quantifying model uncertainty for exploration and (ii) optimizing over the piece-wise constant acquisition function.
Our framework performs as well as state-of-the-art methods for unconstrained black-box optimization over continuous/discrete features and outperforms competing methods for problems combining mixed-variable feature spaces and known input constraints.
arXiv Detail & Related papers (2022-07-02T16:59:37Z) - Scalable First-Order Bayesian Optimization via Structured Automatic
Differentiation [4.061135251278187]
We show that a wide range of kernels gives rise to structured matrices, enabling an exact $mathcalO(n2d)$ matrix-vector multiply for gradient observations and $mathcalO(n2d2)$ for Hessian observations.
Our methods apply to virtually all canonical kernels and automatically extend to complex kernels, like the neural network, radial basis function network, and spectral mixture kernels.
arXiv Detail & Related papers (2022-06-16T17:59:48Z) - Learning "best" kernels from data in Gaussian process regression. With
application to aerodynamics [0.4588028371034406]
We introduce algorithms to select/design kernels in Gaussian process regression/kriging surrogate modeling techniques.
A first class of algorithms is kernel flow, which was introduced in a context of classification in machine learning.
A second class of algorithms is called spectral kernel ridge regression, and aims at selecting a "best" kernel such that the norm of the function to be approximated is minimal.
arXiv Detail & Related papers (2022-06-03T07:50:54Z) - Non-Convex Optimization with Certificates and Fast Rates Through Kernel
Sums of Squares [68.8204255655161]
We consider potentially non- optimized approximation problems.
In this paper, we propose an algorithm that achieves close to optimal a priori computational guarantees.
arXiv Detail & Related papers (2022-04-11T09:37:04Z) - Taming Nonconvexity in Kernel Feature Selection---Favorable Properties
of the Laplace Kernel [77.73399781313893]
A challenge is to establish the objective function of kernel-based feature selection.
The gradient-based algorithms available for non-global optimization are only able to guarantee convergence to local minima.
arXiv Detail & Related papers (2021-06-17T11:05:48Z) - Submodular + Concave [53.208470310734825]
It has been well established that first order optimization methods can converge to the maximal objective value of concave functions.
In this work, we initiate the determinant of the smooth functions convex body $$F(x) = G(x) +C(x)$.
This class of functions is an extension of both concave and continuous DR-submodular functions for which no guarantee is known.
arXiv Detail & Related papers (2021-06-09T01:59:55Z) - Sequential Subspace Search for Functional Bayesian Optimization
Incorporating Experimenter Intuition [63.011641517977644]
Our algorithm generates a sequence of finite-dimensional random subspaces of functional space spanned by a set of draws from the experimenter's Gaussian Process.
Standard Bayesian optimisation is applied on each subspace, and the best solution found used as a starting point (origin) for the next subspace.
We test our algorithm in simulated and real-world experiments, namely blind function matching, finding the optimal precipitation-strengthening function for an aluminium alloy, and learning rate schedule optimisation for deep networks.
arXiv Detail & Related papers (2020-09-08T06:54:11Z) - Incorporating Expert Prior in Bayesian Optimisation via Space Warping [54.412024556499254]
In big search spaces the algorithm goes through several low function value regions before reaching the optimum of the function.
One approach to subside this cold start phase is to use prior knowledge that can accelerate the optimisation.
In this paper, we represent the prior knowledge about the function optimum through a prior distribution.
The prior distribution is then used to warp the search space in such a way that space gets expanded around the high probability region of function optimum and shrinks around low probability region of optimum.
arXiv Detail & Related papers (2020-03-27T06:18:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.