Normalization in Proportional Feature Spaces
- URL: http://arxiv.org/abs/2409.11389v1
- Date: Tue, 17 Sep 2024 17:46:27 GMT
- Title: Normalization in Proportional Feature Spaces
- Authors: Alexandre Benatti, Luciano da F. Costa
- Abstract summary: Feature normalization plays a central role in data representation, characterization, visualization, analysis, comparison, classification, and modeling.
The selection of an appropriate normalization method needs to take into account the type and characteristics of the involved features.
- Score: 49.48516314472825
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Feature normalization plays a central role in data representation, characterization, visualization, analysis, comparison, classification, and modeling, as it can substantially influence, and be influenced by, all of these activities and their respective aspects. The selection of an appropriate normalization method needs to take into account the type and characteristics of the involved features, the methods to be used subsequently for the aforementioned data processing tasks, as well as the specific questions being considered. After briefly considering how normalization constitutes one of the many interrelated parts typically involved in data analysis and modeling, the present work addresses the issue of feature normalization from the perspective of uniform and proportional (right-skewed) features and comparison operations. More general right-skewed features are also considered in an approximate manner. Several concepts, properties, and results are described and discussed, including a duality relationship between uniform and proportional feature spaces and their respective comparisons, with conditions specified for consistency between comparisons in each of the two domains. Two normalization possibilities based on non-centralized measures of feature dispersion are presented, as well as a modified version of the Jaccard similarity index that intrinsically incorporates normalization. Preliminary experiments illustrate the developed concepts and methods.
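To make the abstract's main ingredients concrete, the sketch below (Python with NumPy) illustrates one plausible reading of two of them: normalizing features by a non-centralized dispersion measure (here the root-mean-square, an illustrative assumption) and comparing vectors with a real-valued Jaccard-style similarity (the sum-of-minima over sum-of-maxima form, also an assumption; the paper's modified index may differ in its exact definition).

```python
import numpy as np

def normalize_by_rms(X):
    """Divide each feature column by its root-mean-square value, a
    dispersion measure taken around zero rather than around the mean
    (one illustrative choice of 'non-centralized dispersion')."""
    rms = np.sqrt(np.mean(X ** 2, axis=0))
    return X / rms

def real_valued_jaccard(x, y):
    """Jaccard-like similarity for non-negative real vectors:
    sum of element-wise minima over sum of element-wise maxima.
    The value is invariant to a common positive rescaling of both
    vectors, illustrating how normalization can be built into the
    comparison itself."""
    return np.minimum(x, y).sum() / np.maximum(x, y).sum()

# Example with proportional (right-skewed) features drawn from a lognormal.
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(100, 3))
Xn = normalize_by_rms(X)
print(real_valued_jaccard(Xn[0], Xn[1]))
```

This is only a minimal sketch under the stated assumptions, not the paper's exact formulation; its purpose is to show how a dispersion-based rescaling and a similarity index with built-in normalization fit together in practice.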
Related papers
- Relative Representations: Topological and Geometric Perspectives [53.88896255693922]
Relative representations are an established approach to zero-shot model stitching.
First, we introduce a normalization procedure in the relative transformation, resulting in invariance to non-isotropic rescalings and permutations.
Second, we propose to deploy topological densification when fine-tuning relative representations, a topological regularization loss encouraging clustering within classes.
arXiv Detail & Related papers (2024-09-17T08:09:22Z)
- Supervised Pattern Recognition Involving Skewed Feature Densities [49.48516314472825]
The classification potential of the Euclidean distance and a dissimilarity index based on the coincidence similarity index are compared.
The accuracy of classifying the intersection point between the densities of two adjacent groups is taken into account.
arXiv Detail & Related papers (2024-09-02T12:45:18Z)
- Relational Local Explanations [11.679389861042]
We develop a novel model-agnostic and permutation-based feature attribution algorithm based on relational analysis between input variables.
We are able to gain a broader insight into machine learning model decisions and data.
arXiv Detail & Related papers (2022-12-23T14:46:23Z)
- On the Strong Correlation Between Model Invariance and Generalization [54.812786542023325]
Generalization captures a model's ability to classify unseen data.
Invariance measures consistency of model predictions on transformations of the data.
From a dataset-centric view, we find that a given model's accuracy and invariance are linearly correlated across different test sets.
arXiv Detail & Related papers (2022-07-14T17:08:25Z)
- Interaction Models and Generalized Score Matching for Compositional Data [9.797319790710713]
We propose a class of exponential family models that accommodate general patterns of pairwise interaction while being supported on the probability simplex.
Special cases include the family of Dirichlet distributions as well as Aitchison's additive logistic normal distributions.
A high-dimensional analysis of our estimation methods shows that the simplex domain is handled as efficiently as previously studied full-dimensional domains.
arXiv Detail & Related papers (2021-09-10T05:29:41Z)
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)
- The role of feature space in atomistic learning [62.997667081978825]
Physically-inspired descriptors play a key role in the application of machine-learning techniques to atomistic simulations.
We introduce a framework to compare different sets of descriptors, and different ways of transforming them by means of metrics and kernels.
We compare representations built in terms of n-body correlations of the atom density, quantitatively assessing the information loss associated with the use of low-order features.
arXiv Detail & Related papers (2020-09-06T14:12:09Z)
- TCMI: a non-parametric mutual-dependence estimator for multivariate continuous distributions [0.0]
Total cumulative mutual information (TCMI) is a measure of the relevance of mutual dependences.
TCMI is a non-parametric, robust, and deterministic measure that facilitates comparisons and rankings between feature sets.
arXiv Detail & Related papers (2020-01-30T08:42:25Z)