Time Series of Non-Additive Metrics: Identification and Interpretation
of Contributing Factors of Variance by Linear Decomposition
- URL: http://arxiv.org/abs/2204.06688v1
- Date: Thu, 14 Apr 2022 01:15:28 GMT
- Title: Time Series of Non-Additive Metrics: Identification and Interpretation
of Contributing Factors of Variance by Linear Decomposition
- Authors: Alex Glushkovsky
- Abstract summary: The research paper addresses linear decomposition of time series of non-additive metrics.
Non-additive metrics, such as ratios, are widely used in a variety of domains.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The research paper addresses linear decomposition of time series of
non-additive metrics that allows for the identification and interpretation of
contributing factors (input features) of variance. Non-additive metrics, such
as ratios, are widely used in a variety of domains. Computing such a metric
commonly requires preceding aggregations of the underlying variables from which
it is calculated. The latter poses a dimensionality challenge when the input
features and underlying variables are formed as two-dimensional arrays along
elements, such as account or customer identifications, and time points. It
rules out direct modeling of the time series of a non-additive metric as a
function of input features. The article discusses a five-step approach: (1)
segmentations of input features and the underlying variables of the metric that
are supported by unsupervised autoencoders, (2) univariate or joint fittings of
the metric by the aggregated input features on the segmented domains, (3)
transformations of pre-screened input features according to the fitted models,
(4) aggregation of the transformed features as time series, and (5) modeling
of the metric time series as a sum of constrained linear effects of the
aggregated features. Alternatively, approximation by numerical differentiation
has been considered to linearize the metric, allowing for element-level
univariate or joint modeling in step (2). Together, these analytical steps
yield a backward-looking explanatory decomposition of the metric as a sum
of time series of the surviving input features. The paper includes a synthetic
example that studies loss-to-balance monthly rates of a hypothetical retail
credit portfolio. To validate that no latent factors other than the surviving
input features have significant impacts on the metric, Statistical Process
Control has been introduced for the residual time series.
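The linearization by numerical differentiation and the Statistical Process Control check on residuals can be illustrated with a toy example. The sketch below is not the paper's implementation; the portfolio size, noise scales, and account-level series are invented for illustration. It decomposes month-over-month changes of a loss-to-balance rate R = L/B into first-order contributions of losses and balances, then applies a simple 3-sigma control check to the linearization residual.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic monthly data for a hypothetical 3-account retail portfolio:
# element-level losses L[t, i] and balances B[t, i] over 24 months.
T, n = 24, 3
B = 1000.0 + 20.0 * rng.standard_normal((T, n)).cumsum(axis=0)
L = 10.0 + 0.5 * rng.standard_normal((T, n)).cumsum(axis=0)

# Non-additive metric: portfolio loss-to-balance rate per month.
Lt, Bt = L.sum(axis=1), B.sum(axis=1)
R = Lt / Bt

# Linearization by first-order numerical differentiation of R = L/B:
#   dR ~ (1/B) dL - (L/B^2) dB
dL, dB = np.diff(Lt), np.diff(Bt)
contrib_L = dL / Bt[:-1]                  # effect of loss changes
contrib_B = -Lt[:-1] / Bt[:-1] ** 2 * dB  # effect of balance changes

# Additive decomposition of the metric's month-over-month change,
# exact up to a second-order residual.
dR = np.diff(R)
residual = dR - (contrib_L + contrib_B)

# Statistical Process Control on the residual series: flag points
# outside mean +/- 3 sigma as evidence of unexplained latent factors.
mu, sigma = residual.mean(), residual.std(ddof=1)
out_of_control = np.abs(residual - mu) > 3 * sigma
print("out-of-control months:", np.flatnonzero(out_of_control))
```

Because the decomposition is first-order, the residual is dominated by second-order terms in the changes of losses and balances; an in-control residual series supports the claim that the identified contributions explain the metric's variance.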
Related papers
- Unsupervised Representation Learning from Sparse Transformation Analysis [79.94858534887801]
We propose to learn representations from sequence data by factorizing the transformations of the latent variables into sparse components.
Input data are first encoded as distributions of latent activations and subsequently transformed using a probability flow model.
arXiv Detail & Related papers (2024-10-07T23:53:25Z)
- Nonlinear Feature Aggregation: Two Algorithms driven by Theory [45.3190496371625]
Real-world machine learning applications are characterized by a huge number of features, leading to computational and memory issues.
We propose a dimensionality reduction algorithm (NonLinCFA) which aggregates non-linear transformations of features with a generic aggregation function.
We also test the algorithms on synthetic and real-world datasets, performing regression and classification tasks, showing competitive performances.
arXiv Detail & Related papers (2023-06-19T19:57:33Z)
- Enriching Disentanglement: From Logical Definitions to Quantitative Metrics [59.12308034729482]
Disentangling the explanatory factors in complex data is a promising approach for data-efficient representation learning.
We establish relationships between logical definitions and quantitative metrics to derive theoretically grounded disentanglement metrics.
We empirically demonstrate the effectiveness of the proposed metrics by isolating different aspects of disentangled representations.
arXiv Detail & Related papers (2023-05-19T08:22:23Z)
- The Counterfactual-Shapley Value: Attributing Change in System Metrics [10.804568364995982]
A key component of an attribution question is estimating counterfactual: the (hypothetical) change in the system metric due to a specified change in a single input.
We propose a method to estimate counterfactuals using time-series predictive models and construct an attribution score, CF-Shapley.
As a real-world application, we analyze a query-ad matching system with the goal of attributing observed change in a metric for ad matching density.
arXiv Detail & Related papers (2022-08-17T16:48:20Z)
- Concentration of Random Feature Matrices in High-Dimensions [7.1171757928258135]
The spectra of random feature matrices provide information on the conditioning of the linear system used in random feature regression problems.
We consider two settings for the two input variables, either both are random variables or one is a random variable and the other is well-separated.
arXiv Detail & Related papers (2022-04-14T13:01:27Z)
- TACTiS: Transformer-Attentional Copulas for Time Series [76.71406465526454]
Estimation of time-varying quantities is a fundamental component of decision making in fields such as healthcare and finance.
We propose a versatile method that estimates joint distributions using an attention-based decoder.
We show that our model produces state-of-the-art predictions on several real-world datasets.
arXiv Detail & Related papers (2022-02-07T21:37:29Z)
- Learning Conditional Invariance through Cycle Consistency [60.85059977904014]
We propose a novel approach to identify meaningful and independent factors of variation in a dataset.
Our method involves two separate latent subspaces for the target property and the remaining input information.
We demonstrate on synthetic and molecular data that our approach identifies more meaningful factors which lead to sparser and more interpretable models.
arXiv Detail & Related papers (2021-11-25T17:33:12Z)
- The role of feature space in atomistic learning [62.997667081978825]
Physically-inspired descriptors play a key role in the application of machine-learning techniques to atomistic simulations.
We introduce a framework to compare different sets of descriptors, and different ways of transforming them by means of metrics and kernels.
We compare representations built in terms of n-body correlations of the atom density, quantitatively assessing the information loss associated with the use of low-order features.
arXiv Detail & Related papers (2020-09-06T14:12:09Z)
- Assignment Flows for Data Labeling on Graphs: Convergence and Stability [69.68068088508505]
This paper establishes conditions on the weight parameters that guarantee convergence of the continuous-time assignment flow to integral assignments (labelings).
Several counter-examples illustrate that violating the conditions may entail unfavorable behavior of the assignment flow regarding contextual data classification.
arXiv Detail & Related papers (2020-02-26T15:45:38Z)
- TCMI: a non-parametric mutual-dependence estimator for multivariate continuous distributions [0.0]
Total cumulative mutual information (TCMI) is a measure of the relevance of mutual dependences.
TCMI is a non-parametric, robust, and deterministic measure that facilitates comparisons and rankings between feature sets.
arXiv Detail & Related papers (2020-01-30T08:42:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.