CS4ML: A general framework for active learning with arbitrary data based
on Christoffel functions
- URL: http://arxiv.org/abs/2306.00945v2
- Date: Thu, 7 Dec 2023 23:43:12 GMT
- Authors: Ben Adcock, Juan M. Cardenas, Nick Dexter
- Abstract summary: We introduce a general framework for active learning in regression problems.
Our framework considers random sampling according to a finite number of sampling measures and arbitrary nonlinear approximation spaces.
This paper focuses on applications in scientific computing, where active learning is often desirable.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a general framework for active learning in regression problems.
Our framework extends the standard setup by allowing for general types of data,
rather than merely pointwise samples of the target function. This
generalization covers many cases of practical interest, such as data acquired
in transform domains (e.g., Fourier data), vector-valued data (e.g.,
gradient-augmented data), data acquired along continuous curves, and
multimodal data (i.e., combinations of different types of measurements). Our
framework considers random sampling according to a finite number of sampling
measures and arbitrary nonlinear approximation spaces (model classes). We
introduce the concept of generalized Christoffel functions and show how these
can be used to optimize the sampling measures. We prove that this leads to
near-optimal sample complexity in various important cases. This paper focuses
on applications in scientific computing, where active learning is often
desirable, since it is usually expensive to generate data. We demonstrate the
efficacy of our framework for gradient-augmented learning with polynomials,
Magnetic Resonance Imaging (MRI) using generative models, and adaptive sampling
for solving PDEs using Physics-Informed Neural Networks (PINNs).
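As a rough illustration of the core idea (a minimal sketch in the simplest linear setting, not the paper's general framework): for an n-dimensional approximation space with orthonormal basis {φ_j} under a measure ρ, the Christoffel-type quantity reduces to K(x) = Σ_{j<n} φ_j(x)², and drawing samples from the re-weighted density (K(x)/n) dρ(x) is the kind of optimized sampling measure the framework produces. The sketch below (function names are illustrative) builds K for Legendre polynomials on [-1, 1] with the uniform measure and draws from the induced density by rejection sampling:

```python
import numpy as np
from numpy.polynomial.legendre import legval

def christoffel(x, n):
    # K(x) = sum_{j<n} (2j+1) * P_j(x)^2, where sqrt(2j+1) * P_j are the
    # orthonormal Legendre polynomials for the uniform measure dx/2 on [-1, 1].
    j = np.arange(n)
    vals = np.array([legval(x, np.eye(n)[k]) for k in range(n)])  # rows: P_j(x)
    return ((2 * j + 1)[:, None] * vals**2).sum(axis=0)

def sample_optimal(n, m, rng):
    # Rejection sampling from the density K(x)/(2n) on [-1, 1],
    # using the bound K(x) <= K(1) = n^2 with a uniform proposal.
    out = []
    while len(out) < m:
        x = rng.uniform(-1, 1, size=4 * m)
        u = rng.uniform(0, 1, size=4 * m)
        out.extend(x[u < christoffel(x, n) / n**2].tolist())
    return np.array(out[:m])

rng = np.random.default_rng(0)
samples = sample_optimal(5, 200, rng)  # 200 draws for a degree-4 space
```

A quick sanity check on this construction: since each φ_j has unit norm, K integrates to n against ρ, so K/n is a genuine probability density; it is largest near the endpoints, which matches the intuition that polynomial least squares needs denser sampling near ±1.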
Related papers
- Optimal sampling for least-squares approximation
We introduce the Christoffel function as a key quantity in the analysis of (weighted) least-squares approximation from random samples.
We show how it can be used to construct sampling strategies that possess near-optimal sample complexity.
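In the same simplified linear setting, the Christoffel function also supplies the weights for the weighted least-squares estimator: each sample x_i is weighted by w(x_i) = n / K(x_i), which compensates for the non-uniform sampling density. The following toy sketch (illustrative names; a hedged instance of this general strategy, not code from either paper) fits a function by Christoffel-weighted least squares in a Legendre basis, here with uniformly drawn points for simplicity:

```python
import numpy as np
from numpy.polynomial.legendre import legvander

def weighted_lsq_fit(f, n, m, rng):
    # Weighted least-squares fit of f in span{P_0, ..., P_{n-1}} on [-1, 1].
    # Each point gets weight w(x) = n / K(x), where
    # K(x) = sum_{j<n} (2j+1) P_j(x)^2 is the Christoffel-type quantity.
    x = rng.uniform(-1, 1, m)
    V = legvander(x, n - 1) * np.sqrt(2 * np.arange(n) + 1)  # orthonormal basis
    K = (V**2).sum(axis=1)
    sw = np.sqrt(n / K)  # square roots of the weights, applied row-wise
    c, *_ = np.linalg.lstsq(V * sw[:, None], sw * f(x), rcond=None)
    return c

rng = np.random.default_rng(0)
c = weighted_lsq_fit(lambda x: 3 * x**2, 3, 200, rng)
# 3x^2 = P_0 + 2*P_2, so in the orthonormal basis the coefficients are
# c = [1, 0, 2/sqrt(5)]
```

Since the target lies in the approximation space, the fit recovers it exactly; the weights matter for the error and stability analysis when samples are drawn from the optimized measure.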
arXiv Detail & Related papers (2024-09-04T00:06:23Z)
- Approximate Gibbs Sampler for Efficient Inference of Hierarchical Bayesian Models for Grouped Count Data
This research develops an approximate Gibbs sampler (AGS) to efficiently learn HBPRMs while maintaining inference accuracy.
Numerical experiments using real and synthetic datasets with small and large counts demonstrate the superior performance of AGS.
arXiv Detail & Related papers (2022-11-28T21:00:55Z)
- FaDIn: Fast Discretized Inference for Hawkes Processes with General Parametric Kernels
This work offers an efficient solution to temporal point processes inference using general parametric kernels with finite support.
The method's effectiveness is evaluated by modeling the occurrence of stimuli-induced patterns from brain signals recorded with magnetoencephalography (MEG).
Results show that the proposed approach yields improved estimation of pattern latency compared to the state-of-the-art.
arXiv Detail & Related papers (2022-10-10T12:35:02Z)
- Learning from aggregated data with a maximum entropy model
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performance comparable to that of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- Multimodal Data Fusion in High-Dimensional Heterogeneous Datasets via Generative Models
We are interested in learning probabilistic generative models from high-dimensional heterogeneous data in an unsupervised fashion.
We propose a general framework that combines disparate data types through the exponential family of distributions.
The proposed algorithm is presented in detail for the commonly encountered heterogeneous datasets with real-valued (Gaussian) and categorical (multinomial) features.
arXiv Detail & Related papers (2021-08-27T18:10:31Z)
- Efficient Multidimensional Functional Data Analysis Using Marginal Product Basis Systems
We propose a framework for learning continuous representations from a sample of multidimensional functional data.
We show that the resulting estimation problem can be solved efficiently by tensor decomposition.
We conclude with a real data application in neuroimaging.
arXiv Detail & Related papers (2021-07-30T16:02:15Z)
- Learning summary features of time series for likelihood free inference
We present a data-driven strategy for automatically learning summary features from time series data.
Our results indicate that learning summary features from data can compete with and even outperform LFI methods based on hand-crafted values.
arXiv Detail & Related papers (2020-12-04T19:21:37Z)
- Generalized Matrix Factorization: efficient algorithms for fitting generalized linear latent variable models to large data arrays
Generalized Linear Latent Variable models (GLLVMs) generalize such factor models to non-Gaussian responses.
Current algorithms for estimating model parameters in GLLVMs require intensive computation and do not scale to large datasets.
We propose a new approach for fitting GLLVMs to high-dimensional datasets, based on approximating the model using penalized quasi-likelihood.
arXiv Detail & Related papers (2020-10-06T04:28:19Z)
- Distributed Learning via Filtered Hyperinterpolation on Manifolds
This paper studies the problem of learning real-valued functions on manifolds.
Motivated by the problem of handling large data sets, it presents a parallel data processing approach.
We prove quantitative guarantees on the approximation quality of the learned function over the entire manifold.
arXiv Detail & Related papers (2020-07-18T10:05:18Z)
- Modeling Shared Responses in Neuroimaging Studies through MultiView ICA
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization.
We propose a novel MultiView Independent Component Analysis model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise.
We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects.
arXiv Detail & Related papers (2020-06-11T17:29:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.