Multivariate Temporal Regression at Scale: A Three-Pillar Framework Combining ML, XAI, and NLP
- URL: http://arxiv.org/abs/2504.02151v1
- Date: Wed, 02 Apr 2025 21:53:03 GMT
- Title: Multivariate Temporal Regression at Scale: A Three-Pillar Framework Combining ML, XAI, and NLP
- Authors: Jiztom Kavalakkatt Francis, Matthew J Darr
- Abstract summary: This paper dives into the hurdles of analyzing high-dimensional data, especially when it gets too complex. Traditional methods in data analysis often look at direct connections between input variables, which can miss out on the more complicated relationships within the data. We consider the role of synthetic data and how information can sometimes be redundant across different sensors.
- Score: 1.331812695405053
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid adoption of artificial intelligence (AI) in processes such as coding, image processing, and data prediction means it is crucial to fully understand and validate the data we are working with. This paper examines the hurdles of analyzing high-dimensional data, especially as it grows more complex. Traditional methods in data analysis often look only at direct connections between input variables, which can miss more complicated relationships within the data. To address these issues, we explore several established techniques, such as removing specific variables to see their impact and using statistical analysis to find connections between multiple variables. We also consider the role of synthetic data and how information can sometimes be redundant across different sensors. These analyses are typically computationally demanding and often require considerable human effort to interpret the results. A common approach is to treat the entire dataset as one unit and apply advanced models to handle it; however, this can become problematic with larger, noisier datasets and more complex models. We therefore suggest methods to identify overall patterns that can support tasks like classification or regression, based on the idea that more straightforward approaches tend to be more understandable. Our research looks at two datasets: a real-world dataset and a synthetic one. The goal is to create a methodology that highlights, on a global scale, the key features that lead to predictions, making it easier to validate or quantify the dataset. By reducing the dimensionality with this method, we can simplify the models used and thus clarify the insights we gain. Furthermore, our method can reveal unexplored relationships between specific inputs and outcomes, providing a way to validate these new connections further.
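To make the abstract's described analyses concrete, the following is a minimal Python sketch (not the authors' implementation) of two of the ideas mentioned: drop-column ablation to estimate each input variable's global contribution to a regression target, and a correlation screen for redundant sensor channels. It assumes scikit-learn, NumPy, and pandas are available; the `sensor_*` column names and the synthetic data are hypothetical.

```python
# Illustrative sketch only: drop-column ablation + redundancy screen,
# in the spirit of the variable-removal and correlation analyses the abstract describes.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score


def drop_column_importance(X: pd.DataFrame, y: pd.Series, cv: int = 5) -> pd.Series:
    """Drop in cross-validated R^2 when each column is removed (larger = more important)."""
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    baseline = cross_val_score(model, X, y, cv=cv, scoring="r2").mean()
    drops = {}
    for col in X.columns:
        score = cross_val_score(model, X.drop(columns=[col]), y, cv=cv, scoring="r2").mean()
        drops[col] = baseline - score
    return pd.Series(drops).sort_values(ascending=False)


def redundant_pairs(X: pd.DataFrame, threshold: float = 0.95):
    """Flag channel pairs whose absolute Pearson correlation exceeds the threshold."""
    corr = X.corr().abs()
    cols = list(X.columns)
    return [(a, b, float(corr.loc[a, b]))
            for i, a in enumerate(cols)
            for b in cols[i + 1:]
            if corr.loc[a, b] > threshold]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical multivariate sensor features; sensor_5 closely mimics sensor_0 (redundant).
    X = pd.DataFrame(rng.normal(size=(500, 6)),
                     columns=[f"sensor_{i}" for i in range(6)])
    X["sensor_5"] = 0.98 * X["sensor_0"] + rng.normal(scale=0.05, size=500)
    y = 3.0 * X["sensor_0"] - 2.0 * X["sensor_2"] + rng.normal(scale=0.5, size=500)
    print(drop_column_importance(X, y))
    print(redundant_pairs(X))
```

Drop-column ablation refits the model once per removed variable, so on large datasets a permutation-importance variant is usually the cheaper choice; the sketch keeps the refit version because it matches the "remove a variable and observe the impact" framing most directly.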
Related papers
- Explaining Categorical Feature Interactions Using Graph Covariance and LLMs [18.44675735926458]
This paper focuses on the global synthetic dataset from the Counter Trafficking Data Collaborative. It contains over 200,000 anonymized records spanning from 2002 to 2022 with numerous categorical features for each record. We propose a fast and scalable method for analyzing and extracting significant categorical feature interactions.
arXiv Detail & Related papers (2025-01-24T21:41:26Z) - D3A-TS: Denoising-Driven Data Augmentation in Time Series [0.0]
This work focuses on studying and analyzing the use of different techniques for data augmentation in time series for classification and regression problems.
The proposed approach involves the use of diffusion probabilistic models, which have recently achieved successful results in the field of Image Processing.
The results highlight the high utility of this methodology in creating synthetic data to train classification and regression models.
arXiv Detail & Related papers (2023-12-09T11:37:07Z) - On Inductive Biases for Machine Learning in Data Constrained Settings [0.0]
This thesis explores a different answer to the problem of learning expressive models in data constrained settings.
Instead of relying on big datasets to learn neural networks, we replace some modules with known functions reflecting the structure of the data.
Our approach falls under the umbrella of "inductive biases", which can be defined as hypotheses about the data at hand that restrict the space of models to explore.
arXiv Detail & Related papers (2023-02-21T14:22:01Z) - Detection of Interacting Variables for Generalized Linear Models via Neural Networks [0.0]
We present an approach to automating the detection of interactions that should be added to generalized linear models (GLMs).
Our approach relies on neural networks and a model-specific interaction detection method, which is computationally faster than the traditionally used methods like Friedman H-Statistic or SHAP values.
In numerical studies, we provide the results of our approach on artificially generated data as well as open-source data.
arXiv Detail & Related papers (2022-09-16T16:16:45Z) - A Survey of Learning on Small Data: Generalization, Optimization, and Challenge [101.27154181792567]
Learning on small data that approximates the generalization ability of big data is one of the ultimate purposes of AI.
This survey follows the active sampling theory under a PAC framework to analyze the generalization error and label complexity of learning on small data.
Multiple data applications that may benefit from efficient small data representation are surveyed.
arXiv Detail & Related papers (2022-07-29T02:34:19Z) - Deep Learning and Handheld Augmented Reality Based System for Optimal Data Collection in Fault Diagnostics Domain [0.0]
This paper presents a novel human-machine interaction framework to perform fault diagnostics with minimal data.
Minimizing the required data will increase the practicability of data-driven models in diagnosing faults.
The proposed framework has provided 100% precision and recall on a novel dataset with only one instance of each fault condition.
arXiv Detail & Related papers (2022-06-15T19:15:26Z) - HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z) - Unsupervised Domain Adaptive Learning via Synthetic Data for Person Re-identification [101.1886788396803]
Person re-identification (re-ID) has gained more and more attention due to its widespread applications in video surveillance.
Unfortunately, the mainstream deep learning methods still need a large quantity of labeled data to train models.
In this paper, we develop a data collector to automatically generate synthetic re-ID samples in a computer game, and construct a data labeler to simultaneously annotate them.
arXiv Detail & Related papers (2021-09-12T15:51:41Z) - Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z) - Category-Learning with Context-Augmented Autoencoder [63.05016513788047]
Finding an interpretable non-redundant representation of real-world data is one of the key problems in Machine Learning.
We propose a novel method of using data augmentations when training autoencoders.
We train a Variational Autoencoder in such a way that the transformation outcome is predictable by an auxiliary network.
arXiv Detail & Related papers (2020-10-10T14:04:44Z) - Visual Neural Decomposition to Explain Multivariate Data Sets [13.117139248511783]
Investigating relationships between variables in multi-dimensional data sets is a common task for data analysts and engineers.
We propose a novel approach, scaling to hundreds of variables, for visualizing correlations between input variables and a target output variable (a minimal correlation-screening sketch in this spirit appears after this list).
arXiv Detail & Related papers (2020-09-11T15:53:37Z) - Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework well preserves the relations between samples.
By seeking to embed samples into a subspace, we show that our method can address the large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
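Related to the correlation-screening theme above (and to the interaction-detection and correlation-visualization entries in this list), the following is a minimal sketch, not taken from any of the listed papers, that ranks hundreds of candidate input variables by absolute Pearson correlation with a target. It assumes NumPy and pandas; the `x*` column names and synthetic data are illustrative only.

```python
# Illustrative sketch only: screen many input variables by correlation with a target.
import numpy as np
import pandas as pd


def rank_by_target_correlation(X: pd.DataFrame, y: pd.Series, top_k: int = 20) -> pd.Series:
    """Top-k inputs by absolute Pearson correlation with the target."""
    corrs = X.apply(lambda col: col.corr(y))  # one Pearson correlation per column
    return corrs.abs().sort_values(ascending=False).head(top_k)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, p = 1000, 300  # many candidate inputs, as in high-dimensional sensor data
    X = pd.DataFrame(rng.normal(size=(n, p)), columns=[f"x{j}" for j in range(p)])
    y = 2.0 * X["x7"] - 1.5 * X["x42"] + rng.normal(scale=0.5, size=n)
    print(rank_by_target_correlation(X, y, top_k=5))
```

Linear correlation screening of this kind only surfaces direct, pairwise relationships; it complements, rather than replaces, the ablation and interaction-detection approaches discussed above.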