A Principled Method for the Creation of Synthetic Multi-fidelity Data Sets
- URL: http://arxiv.org/abs/2208.05667v1
- Date: Thu, 11 Aug 2022 06:55:14 GMT
- Title: A Principled Method for the Creation of Synthetic Multi-fidelity Data Sets
- Authors: Clyde Fare, Peter Fenner, Edward O. Pyzer-Knapp
- Abstract summary: Multifidelity and multioutput optimisation algorithms allow experimental and computational proxies to be used intelligently in the search for optimal species.
Characterisation of these algorithms involves benchmarks that typically use either analytic functions or existing multifidelity datasets.
We present a methodology for systematic generation of synthetic fidelities derived from a reference ground truth function with a controllable degree of correlation.
- Score: 3.512854793379827
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multifidelity and multioutput optimisation algorithms are of current
interest in many areas of computational design, as they allow experimental and
computational proxies to be used intelligently in the search for optimal
species. Characterisation of these algorithms involves benchmarks that
typically use either analytic functions or existing multifidelity datasets.
Unfortunately, existing analytic functions are often not representative of
relevant problems, while many existing datasets are not constructed to allow
easy, systematic investigation of the influence of the characteristics of the
contained proxy functions. To fulfil this need, we present a methodology for
systematic generation of synthetic fidelities derived from a reference ground
truth function with a controllable degree of correlation.
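The abstract does not spell out how the synthetic fidelities are built. As a minimal sketch of what a "controllable degree of correlation" can mean, and not necessarily the authors' method, the Python below mixes a standardised ground truth with noise that has been made exactly orthogonal to it, so the synthetic fidelity has a chosen Pearson correlation `rho` with the reference. Using the Forrester test function as the ground truth, and the helper names `standardise`, `pearson` and `synthetic_fidelity`, are illustrative assumptions.

```python
import math
import random


def forrester(x: float) -> float:
    """Forrester test function on [0, 1], used here only as an example ground truth."""
    return (6.0 * x - 2.0) ** 2 * math.sin(12.0 * x - 4.0)


def standardise(values: list[float]) -> list[float]:
    """Shift to zero mean and scale to unit population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]


def pearson(a: list[float], b: list[float]) -> float:
    """Sample Pearson correlation coefficient of two equal-length lists."""
    za, zb = standardise(a), standardise(b)
    return sum(x * y for x, y in zip(za, zb)) / len(a)


def synthetic_fidelity(truth: list[float], rho: float, seed: int = 0) -> list[float]:
    """Hypothetical helper: return a synthetic fidelity whose sample Pearson
    correlation with `truth` equals `rho` (up to floating-point error).

    The result is rescaled to the mean and spread of `truth`, so it lives on
    the same scale as the reference; rescaling does not change correlation.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    n = len(truth)
    z = standardise(truth)
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Gram-Schmidt step: centre the noise, then subtract its projection onto z
    # so that the residual is exactly uncorrelated with the ground truth.
    mean_noise = sum(noise) / n
    centred = [e - mean_noise for e in noise]
    coef = sum(c * zi for c, zi in zip(centred, z)) / n  # sum(z*z) == n
    resid = standardise([c - coef * zi for c, zi in zip(centred, z)])
    # Mix signal and orthogonal noise; these weights keep unit variance.
    w = math.sqrt(1.0 - rho * rho)
    mixed = [rho * zi + w * ri for zi, ri in zip(z, resid)]
    # Map back onto the scale of the reference function.
    mean = sum(truth) / n
    std = math.sqrt(sum((t - mean) ** 2 for t in truth) / n)
    return [mean + std * m for m in mixed]


if __name__ == "__main__":
    xs = [i / 99 for i in range(100)]
    truth = [forrester(x) for x in xs]
    for rho in (0.9, 0.5, 0.1):
        low = synthetic_fidelity(truth, rho, seed=42)
        # Measured correlation should match the target to many decimal places.
        print(f"target rho={rho:.2f}  measured={pearson(truth, low):.4f}")
```

Drawing fresh Gaussian noise and mixing it in directly would hit the target correlation only in expectation; projecting the noise out first makes the sample correlation exact, which is convenient when correlation is the variable a benchmark is meant to control. Sweeping `rho` then yields a family of fidelities of graded quality from a single reference function.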
Related papers
- Low-dimensional Functions are Efficiently Learnable under Randomly Biased Distributions [12.410304632874531]
We prove that introducing a small random perturbation to the data distribution--via a random shift in the first moment--renders any Gaussian single index model as easy to learn as a linear function.
We extend this result to a class of multi index models, namely sparse Boolean functions, also known as Juntas.
(2025-02-10T13:19:30Z)
- Generating Diverse Synthetic Datasets for Evaluation of Real-life Recommender Systems [0.0]
Synthetic datasets are important for evaluating and testing machine learning models.
We develop a novel framework for generating synthetic datasets that are diverse and statistically coherent.
The framework is available as a free open Python package to facilitate research with minimal friction.
(2024-11-27T09:53:14Z)
- Towards stable real-world equation discovery with assessing differentiating quality influence [52.2980614912553]
We propose alternatives to the commonly used finite differences-based method.
We evaluate these methods in terms of their applicability to problems similar to real ones and their ability to ensure the convergence of equation discovery algorithms.
(2023-11-09T23:32:06Z)
- Causal Feature Selection via Transfer Entropy [59.999594949050596]
Causal discovery aims to identify causal relationships between features with observational data.
We introduce a new causal feature selection approach that relies on the forward and backward feature selection procedures.
We provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases.
(2023-10-17T08:04:45Z)
- Efficient Model-Free Exploration in Low-Rank MDPs [76.87340323826945]
Low-Rank Markov Decision Processes offer a simple, yet expressive framework for RL with function approximation.
Existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions.
We propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs.
(2023-07-08T15:41:48Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
(2022-06-15T19:10:35Z)
- A geometric perspective on functional outlier detection [0.0]
We develop a conceptualization of functional outlier detection that is more widely applicable and realistic than previously proposed.
We show that simple manifold learning methods can be used to reliably infer and visualize the geometric structure of functional data sets.
Our experiments on synthetic and real data sets demonstrate that this approach yields outlier detection performance at least on par with existing functional data-specific methods.
(2021-09-14T17:42:57Z)
- Multimodal Data Fusion in High-Dimensional Heterogeneous Datasets via Generative Models [16.436293069942312]
We are interested in learning probabilistic generative models from high-dimensional heterogeneous data in an unsupervised fashion.
We propose a general framework that combines disparate data types through the exponential family of distributions.
The proposed algorithm is presented in detail for the commonly encountered heterogeneous datasets with real-valued (Gaussian) and categorical (multinomial) features.
(2021-08-27T18:10:31Z)
- Efficient Multidimensional Functional Data Analysis Using Marginal Product Basis Systems [2.4554686192257424]
We propose a framework for learning continuous representations from a sample of multidimensional functional data.
We show that the resulting estimation problem can be solved efficiently by the tensor decomposition.
We conclude with a real data application in neuroimaging.
(2021-07-30T16:02:15Z)
- Top-$k$ Regularization for Supervised Feature Selection [11.927046591097623]
We introduce a novel, simple yet effective regularization approach, named top-$k$ regularization, to supervised feature selection.
We show that the top-$k$ regularization is effective and stable for supervised feature selection.
(2021-06-04T01:12:47Z)
- Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
(2020-06-10T20:20:10Z)
- Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments.
We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data.
Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
(2020-02-20T15:00:54Z)
- Machine Learning to Tackle the Challenges of Transient and Soft Errors in Complex Circuits [0.16311150636417257]
Machine learning models are used to predict accurate per-instance Functional De-Rating data for the full list of circuit instances.
The presented methodology is applied to a practical example, and various machine learning models are evaluated and compared.
(2020-02-18T18:38:54Z)