Latent Variable Method Demonstrator -- Software for Understanding
Multivariate Data Analytics Algorithms
- URL: http://arxiv.org/abs/2205.08132v2
- Date: Mon, 26 Sep 2022 15:56:52 GMT
- Title: Latent Variable Method Demonstrator -- Software for Understanding
Multivariate Data Analytics Algorithms
- Authors: Joachim Schaeffer and Richard Braatz
- Abstract summary: This article describes interactive software - the Latent Variable Demonstrator (LAVADE) - for teaching, learning, and understanding latent variable methods.
Users can interactively compare latent variable methods such as Partial Least Squares (PLS) and Principal Component Regression (PCR) with other regression methods.
The software contains a data generation method and three chemical process datasets, so that results can be compared across datasets of differing complexity.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The ever-increasing quantity of multivariate process data is driving a need
for skilled engineers to analyze, interpret, and build models from such data.
Multivariate data analytics relies heavily on linear algebra, optimization, and
statistics and can be challenging for students to understand, given that most
curricula do not cover these three topics in depth. This article
describes interactive software - the Latent Variable Demonstrator (LAVADE) -
for teaching, learning, and understanding latent variable methods. In this
software, users can interactively compare latent variable methods such as
Partial Least Squares (PLS) and Principal Component Regression (PCR) with
other regression methods such as Least Absolute Shrinkage and Selection
Operator (lasso), Ridge Regression (RR), and Elastic Net (EN). LAVADE helps to
build intuition on choosing appropriate methods, hyperparameter tuning, and
model coefficient interpretation, fostering a conceptual understanding of the
algorithms' differences. The software contains a data generation method and
three chemical process datasets, so that results can be compared across datasets
of differing complexity. LAVADE is released as open-source software
so that others can apply and advance the tool for use in teaching or research.
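
The kind of comparison the abstract describes can be illustrated with a minimal sketch. This is not LAVADE's code: the function names (`make_latent_data`, `fit_pcr`, `fit_pls1`, `fit_ridge`), the synthetic data model, and all parameter values are assumptions chosen for illustration, and only NumPy is used. Lasso and Elastic Net are omitted because they need an iterative solver.

```python
import numpy as np

def make_latent_data(n=200, p=20, k=2, noise=0.1, seed=0):
    """Hypothetical generator: p correlated predictors driven by k latent variables."""
    rng = np.random.default_rng(seed)
    T = rng.normal(size=(n, k))                    # latent scores
    P = rng.normal(size=(k, p))                    # loadings
    X = T @ P + noise * rng.normal(size=(n, p))
    y = T @ np.arange(1.0, k + 1) + noise * rng.normal(size=n)
    return X, y

def fit_ridge(X, y, alpha):
    """Ridge coefficients from the regularized normal equations."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

def fit_pcr(X, y, n_components):
    """PCR: regress y on the leading principal component scores, map back to X space."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:n_components].T                        # (p, k) principal directions
    gamma = np.linalg.lstsq(X @ V, y, rcond=None)[0]
    return V @ gamma

def fit_pls1(X, y, n_components):
    """PLS with a single response (PLS1), using NIPALS-style deflation."""
    Xk, yk = X.copy(), y.copy()
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                              # direction of maximal covariance with y
        w = w / np.linalg.norm(w)
        t = Xk @ w                                 # scores
        tt = t @ t
        p = Xk.T @ t / tt                          # X loadings
        c = yk @ t / tt                            # y loading
        Xk = Xk - np.outer(t, p)                   # deflate X
        yk = yk - c * t                            # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)         # coefficients in X space

def rmse(X, y, beta):
    return float(np.sqrt(np.mean((y - X @ beta) ** 2)))

X, y = make_latent_data()
X_train, X_test, y_train, y_test = X[:150], X[150:], y[:150], y[150:]

# Center with training statistics only, then apply the same shift to the test set.
x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
Xc, yc = X_train - x_mean, y_train - y_mean
Xt, yt = X_test - x_mean, y_test - y_mean

results = {
    "PCR (2 components)": rmse(Xt, yt, fit_pcr(Xc, yc, 2)),
    "PLS (2 components)": rmse(Xt, yt, fit_pls1(Xc, yc, 2)),
    "Ridge (alpha=1.0)": rmse(Xt, yt, fit_ridge(Xc, yc, 1.0)),
}
for name, err in results.items():
    print(f"{name}: test RMSE = {err:.3f}")
```

Varying `n_components`, `alpha`, or `noise` in this sketch gives a rough feel for the hyperparameter tuning and coefficient interpretation questions the abstract mentions; the coefficient vectors returned by each `fit_*` function can be compared directly because all three live in the original predictor space.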
Related papers
- Multi-Task Learning with Summary Statistics [4.871473117968554]
We propose a flexible multi-task learning framework utilizing summary statistics from various sources.
We also present an adaptive parameter selection approach based on a variant of Lepski's method.
This work offers a more flexible tool for training related models across various domains, with practical implications in genetic risk prediction.
arXiv Detail & Related papers (2023-07-05T15:55:23Z)
- Constructing Effective Machine Learning Models for the Sciences: A Multidisciplinary Perspective [77.53142165205281]
We show how flexible non-linear solutions will not always improve upon manually adding transforms and interactions between variables to linear regression models.
We discuss how to recognize this before constructing a data-driven model and how such analysis can help us move to intrinsically interpretable regression models.
arXiv Detail & Related papers (2022-11-21T17:48:44Z)
- Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z)
- Automatic Data Augmentation via Invariance-Constrained Learning [94.27081585149836]
Underlying data structures are often exploited to improve the solution of learning tasks.
Data augmentation induces these symmetries during training by applying multiple transformations to the input data.
This work tackles these issues by automatically adapting the data augmentation while solving the learning task.
arXiv Detail & Related papers (2022-09-29T18:11:01Z)
- A Survey of Learning on Small Data: Generalization, Optimization, and Challenge [101.27154181792567]
Learning on small data that approximates the generalization ability of big data is one of the ultimate purposes of AI.
This survey follows the active sampling theory under a PAC framework to analyze the generalization error and label complexity of learning on small data.
Multiple data applications that may benefit from efficient small data representation are surveyed.
arXiv Detail & Related papers (2022-07-29T02:34:19Z)
- Safe Active Learning for Multi-Output Gaussian Processes [6.0803541683577444]
We propose a safe active learning approach for multi-output Gaussian process regression.
This approach queries the most informative data or output taking the relatedness between the regressors and safety constraints into account.
arXiv Detail & Related papers (2022-03-28T15:41:48Z)
- Deep invariant networks with differentiable augmentation layers [87.22033101185201]
Methods for learning data augmentation policies require held-out data and are based on bilevel optimization problems.
We show that our approach is easier and faster to train than modern automatic data augmentation techniques.
arXiv Detail & Related papers (2022-02-04T14:12:31Z)
- Learning Time-Varying Graphs from Online Data [39.21234914444073]
This work proposes an algorithmic framework to learn time-varying graphs from online data.
The framework is model-independent, i.e., it can be analyzed theoretically in its abstract formulation.
We specialize the framework to three well-known graph learning models, namely, the Gaussian graphical model (GGM), the structural equation model (SEM), and the smoothness-based model (SBM).
arXiv Detail & Related papers (2021-10-21T09:46:44Z)
- Scalable Gaussian Processes for Data-Driven Design using Big Data with Categorical Factors [14.337297795182181]
Gaussian processes (GP) have difficulties in accommodating big datasets, categorical inputs, and multiple responses.
We propose a GP model that utilizes latent variables and functions obtained through variational inference to address the aforementioned challenges simultaneously.
Our approach is demonstrated for machine learning of ternary oxide materials and topology optimization of a multiscale compliant mechanism.
arXiv Detail & Related papers (2021-06-26T02:17:23Z)
- Enhancing ensemble learning and transfer learning in multimodal data analysis by adaptive dimensionality reduction [10.646114896709717]
In multimodal data analysis, not all observations would show the same level of reliability or information quality.
We propose an adaptive approach for dimensionality reduction to overcome this issue.
We test our approach on multimodal datasets acquired in diverse research fields.
arXiv Detail & Related papers (2021-05-08T11:53:12Z)
- Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments.
We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data.
Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
arXiv Detail & Related papers (2020-02-20T15:00:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.