Related papers: Latent Variable Method Demonstrator -- Software for Understanding Multivariate Data Analytics Algorithms

Latent Variable Method Demonstrator -- Software for Understanding Multivariate Data Analytics Algorithms

URL: http://arxiv.org/abs/2205.08132v2
Date: Mon, 26 Sep 2022 15:56:52 GMT
Title: Latent Variable Method Demonstrator -- Software for Understanding Multivariate Data Analytics Algorithms
Authors: Joachim Schaeffer and Richard Braatz
Abstract summary: This article describes interactive software - the Latent Variable Demonstrator (LAVADE) - for teaching, learning, and understanding latent variable methods. Users can interactively compare latent variable methods such as Partial Least Squares (PLS), and Principal Component Regression (PCR) with other regression methods. The software contains a data generation method and three chemical process datasets, allowing for comparing results of datasets with different levels of complexity.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The ever-increasing quantity of multivariate process data is driving a need for skilled engineers to analyze, interpret, and build models from such data. Multivariate data analytics relies heavily on linear algebra, optimization, and statistics and can be challenging for students to understand given that most curricula do not have strong coverage in the latter three topics. This article describes interactive software - the Latent Variable Demonstrator (LAVADE) - for teaching, learning, and understanding latent variable methods. In this software, users can interactively compare latent variable methods such as Partial Least Squares (PLS), and Principal Component Regression (PCR) with other regression methods such as Least Absolute Shrinkage and Selection Operator (lasso), Ridge Regression (RR), and Elastic Net (EN). LAVADE helps to build intuition on choosing appropriate methods, hyperparameter tuning, and model coefficient interpretation, fostering a conceptual understanding of the algorithms' differences. The software contains a data generation method and three chemical process datasets, allowing for comparing results of datasets with different levels of complexity. LAVADE is released as open-source software so that others can apply and advance the tool for use in teaching or research.

Related papers

Multivariate Temporal Regression at Scale: A Three-Pillar Framework Combining ML, XAI, and NLP [1.331812695405053]
This paper dives into the hurdles of analyzing high-dimensional data, especially when it gets too complex. Traditional methods in data analysis often look at direct connections between input variables, which can miss out on the more complicated relationships within the data. We consider the role of synthetic data and how information can sometimes be redundant across different sensors.
arXiv Detail & Related papers (2025-04-02T21:53:03Z)
Empowering Large Language Models in Wireless Communication: A Novel Dataset and Fine-Tuning Framework [81.29965270493238]
We develop a specialized dataset aimed at enhancing the evaluation and fine-tuning of large language models (LLMs) for wireless communication applications. The dataset includes a diverse set of multi-hop questions, including true/false and multiple-choice types, spanning varying difficulty levels from easy to hard. We introduce a Pointwise V-Information (PVI) based fine-tuning method, providing a detailed theoretical analysis and justification for its use in quantifying the information content of training data.
arXiv Detail & Related papers (2025-01-16T16:19:53Z)
Multi-Task Learning with Summary Statistics [4.871473117968554]
We propose a flexible multi-task learning framework utilizing summary statistics from various sources. We also present an adaptive parameter selection approach based on a variant of Lepski's method. This work offers a more flexible tool for training related models across various domains, with practical implications in genetic risk prediction.
arXiv Detail & Related papers (2023-07-05T15:55:23Z)
Constructing Effective Machine Learning Models for the Sciences: A Multidisciplinary Perspective [77.53142165205281]
We show how flexible non-linear solutions will not always improve upon manually adding transforms and interactions between variables to linear regression models. We discuss how to recognize this before constructing a data-driven model and how such analysis can help us move to intrinsically interpretable regression models.
arXiv Detail & Related papers (2022-11-21T17:48:44Z)
Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data. Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z)
Automatic Data Augmentation via Invariance-Constrained Learning [94.27081585149836]
Underlying data structures are often exploited to improve the solution of learning tasks. Data augmentation induces these symmetries during training by applying multiple transformations to the input data. This work tackles these issues by automatically adapting the data augmentation while solving the learning task.
arXiv Detail & Related papers (2022-09-29T18:11:01Z)
A Survey of Learning on Small Data: Generalization, Optimization, and Challenge [101.27154181792567]
Learning on small data that approximates the generalization ability of big data is one of the ultimate purposes of AI. This survey follows the active sampling theory under a PAC framework to analyze the generalization error and label complexity of learning on small data. Multiple data applications that may benefit from efficient small data representation are surveyed.
arXiv Detail & Related papers (2022-07-29T02:34:19Z)
Safe Active Learning for Multi-Output Gaussian Processes [6.0803541683577444]
We propose a safe active learning approach for multi-output Gaussian process regression. This approach queries the most informative data or output taking the relatedness between the regressors and safety constraints into account.
arXiv Detail & Related papers (2022-03-28T15:41:48Z)
Deep invariant networks with differentiable augmentation layers [87.22033101185201]
Methods for learning data augmentation policies require held-out data and are based on bilevel optimization problems. We show that our approach is easier and faster to train than modern automatic data augmentation techniques.
arXiv Detail & Related papers (2022-02-04T14:12:31Z)
Learning Time-Varying Graphs from Online Data [39.21234914444073]
This work proposes an algorithmic framework to learn time-varying graphs from online data. It renders it model-independent, i.e., it can be theoretically analyzed in its abstract formulation. We specialize the framework to three well-known graph learning models, namely, the Gaussian graphical model (GGM), the structural equation model (SEM), and the smoothness-based model (SBM)
arXiv Detail & Related papers (2021-10-21T09:46:44Z)
Scalable Gaussian Processes for Data-Driven Design using Big Data with Categorical Factors [14.337297795182181]
Gaussian processes (GP) have difficulties in accommodating big datasets, categorical inputs, and multiple responses. We propose a GP model that utilizes latent variables and functions obtained through variational inference to address the aforementioned challenges simultaneously. Our approach is demonstrated for machine learning of ternary oxide materials and topology optimization of a multiscale compliant mechanism.
arXiv Detail & Related papers (2021-06-26T02:17:23Z)
Enhancing ensemble learning and transfer learning in multimodal data analysis by adaptive dimensionality reduction [10.646114896709717]
In multimodal data analysis, not all observations would show the same level of reliability or information quality. We propose an adaptive approach for dimensionality reduction to overcome this issue. We test our approach on multimodal datasets acquired in diverse research fields.
arXiv Detail & Related papers (2021-05-08T11:53:12Z)
Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments. We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data. Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
arXiv Detail & Related papers (2020-02-20T15:00:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.