Feature Subset Weighting for Distance-based Supervised Learning through Choquet Integration
- URL: http://arxiv.org/abs/2504.00624v1
- Date: Tue, 01 Apr 2025 10:23:01 GMT
- Title: Feature Subset Weighting for Distance-based Supervised Learning through Choquet Integration
- Authors: Adnan Theerens, Yvan Saeys, Chris Cornelis
- Abstract summary: This paper introduces feature subset weighting using monotone measures for distance-based supervised learning. The Choquet integral is used to define a distance metric that incorporates these weights. We show how this approach ensures that the distances remain unaffected by the addition of duplicate and strongly correlated features.
- Score: 2.1943338072179444
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper introduces feature subset weighting using monotone measures for distance-based supervised learning. The Choquet integral is used to define a distance metric that incorporates these weights. This integration enables the proposed distances to effectively capture non-linear relationships and account for interactions both between conditional and decision attributes and among conditional attributes themselves, resulting in a more flexible distance measure. In particular, we show how this approach ensures that the distances remain unaffected by the addition of duplicate and strongly correlated features. Another key point of this approach is that it makes feature subset weighting computationally feasible, since only $m$ feature subset weights need to be computed for each evaluation instead of all $2^m$ of them, where $m$ is the number of attributes. We also examine how using the Choquet integral to measure similarity leads to a non-equivalent definition of distance. The relationship between distance and similarity is further explored through dual measures. Additionally, symmetric Choquet distances and similarities are proposed, preserving the classical symmetry between similarity and distance. Finally, we introduce a concrete feature subset weighting distance, evaluate its performance in a $k$-nearest neighbors (KNN) classification setting, and compare it against Mahalanobis distances and weighted distance methods.
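For intuition, the following is a minimal sketch in Python of a Choquet-integral distance, not the paper's concrete construction. It evaluates the standard discrete Choquet integral $C_\mu(d) = \sum_{i=1}^{m} (d_{(i)} - d_{(i-1)})\,\mu(A_{(i)})$ of the sorted per-feature distances $d_{(1)} \le \dots \le d_{(m)}$ (with $d_{(0)} = 0$), where $A_{(i)}$ is the set of features whose distance is at least $d_{(i)}$; only these $m$ nested subsets are ever passed to the measure. The grouping-based toy measure `group_measure` is an illustrative assumption, chosen so that a duplicated feature adds no weight, which makes the duplicate-invariance property from the abstract visible.

```python
# A minimal sketch of a Choquet-integral distance (assumption-laden demo,
# not the paper's construction). The measure mu can be any monotone set
# function on feature subsets with mu(all features) = 1.
import numpy as np

def choquet_distance(x, y, mu):
    """Choquet integral of the per-feature absolute differences w.r.t. mu.

    Only the m nested subsets induced by sorting the differences are
    queried, so m subset weights are evaluated instead of all 2^m.
    """
    d = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    order = np.argsort(d)                       # feature indices, ascending distance
    total, prev = 0.0, 0.0
    for i, idx in enumerate(order):
        subset = frozenset(order[i:].tolist())  # features with distance >= d[idx]
        total += (d[idx] - prev) * mu(subset)
        prev = d[idx]
    return total

# Toy monotone measure: the weight of a subset is the fraction of distinct
# feature "groups" it covers. Duplicated features share a group, so adding
# a duplicate never increases the measure (an assumption for this demo).
def group_measure(groups):
    n_groups = len(set(groups))
    return lambda subset: len({groups[i] for i in subset}) / n_groups

if __name__ == "__main__":
    mu = group_measure(["g0", "g1"])
    print(choquet_distance([1, 3], [2, 5], mu))            # 1.5
    # Duplicating feature 0 (same group "g0") leaves the distance unchanged:
    mu_dup = group_measure(["g0", "g1", "g0"])
    print(choquet_distance([1, 3, 1], [2, 5, 2], mu_dup))  # 1.5
```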
Related papers
- Learning Temporal Distances: Contrastive Successor Features Can Provide a Metric Structure for Decision-Making [66.27188304203217]
Temporal distances lie at the heart of many algorithms for planning, control, and reinforcement learning. Prior attempts to define such temporal distances have been stymied by an important limitation. We show how successor features learned by contrastive learning form a temporal distance that does satisfy the triangle inequality.
arXiv Detail & Related papers (2024-06-24T19:36:45Z) - Fuzzy Rough Choquet Distances for Classification [0.6445605125467574]
This paper introduces a novel Choquet distance using fuzzy rough set based measures.
The proposed measure combines the attribute information received from fuzzy rough set theory with the flexibility of the Choquet integral.
arXiv Detail & Related papers (2024-03-18T14:53:48Z) - Computing the Distance between unbalanced Distributions -- The flat Metric [0.0]
The flat metric generalizes the well-known Wasserstein distance W1 to the case that the distributions are of unequal total mass.
The core of the method is based on a neural network that determines an optimal test function realizing the distance between two measures.
arXiv Detail & Related papers (2023-08-02T09:30:22Z) - Robust Ellipsoid Fitting Using Axial Distance and Combination [15.39157287924673]
In random sample consensus (RANSAC), the problem of ellipsoid fitting can be formulated as a problem of minimization of point-to-model distance.
We propose a novel distance metric called the axial distance, which is converted from the algebraic distance.
A novel sample-consensus-based ellipsoid fitting method is proposed by combining the axial distance and the Sampson distance.
arXiv Detail & Related papers (2023-04-02T11:52:33Z) - Counting Like Human: Anthropoid Crowd Counting on Modeling the Similarity of Objects [92.80955339180119]
Mainstream crowd counting methods regress a density map and integrate it to obtain counting results.
Inspired by this, we propose a rational and anthropoid crowd counting framework.
arXiv Detail & Related papers (2022-12-02T07:00:53Z) - Concrete Score Matching: Generalized Score Matching for Discrete Data [109.12439278055213]
"Concrete score" is a generalization of the (Stein) score for discrete settings.
"Concrete Score Matching" is a framework to learn such scores from samples.
arXiv Detail & Related papers (2022-11-02T00:41:37Z) - Generalized quantum similarity learning [0.0]
We propose using quantum networks (GQSim) for learning task-dependent (a)symmetric similarity between data that need not have the same dimensionality.
We demonstrate that the similarity measure derived using this technique is $(\epsilon,\gamma,\tau)$-good, resulting in theoretically guaranteed performance.
arXiv Detail & Related papers (2022-01-07T03:28:19Z) - Tangent Space and Dimension Estimation with the Wasserstein Distance [10.118241139691952]
Consider a set of points sampled independently near a smooth compact submanifold of Euclidean space.
We provide mathematically rigorous bounds on the number of sample points required to estimate both the dimension and the tangent spaces of that manifold.
arXiv Detail & Related papers (2021-10-12T21:02:06Z) - Kernel distance measures for time series, random fields and other structured data [71.61147615789537]
kdiff is a novel kernel-based measure for estimating distances between instances of structured data.
It accounts for both self and cross similarities across the instances and is defined using a lower quantile of the distance distribution (a rough sketch of this quantile idea appears after this list).
Some theoretical results are provided for separability conditions using kdiff as a distance measure for clustering and classification problems.
arXiv Detail & Related papers (2021-09-29T22:54:17Z) - Towards Certified Robustness of Distance Metric Learning [53.96113074344632]
We advocate imposing an adversarial margin in the input space so as to improve the generalization and robustness of metric learning algorithms.
We show that the enlarged margin is beneficial to the generalization ability by using the theoretical technique of algorithmic robustness.
arXiv Detail & Related papers (2020-06-10T16:51:53Z) - Machine learning for causal inference: on the use of cross-fit estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators for the average causal effect (ACE).
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
arXiv Detail & Related papers (2020-04-21T23:09:55Z)
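As a closing illustration, the kdiff entry above describes a distance built from a lower quantile of a distance distribution that accounts for self and cross similarities. The sketch below only captures that general idea: the Euclidean base distance, the quantile level `alpha`, and the self-distance centering rule are assumptions for illustration, not kdiff's actual kernel-based definition.

```python
# Illustrative quantile-based distance between two structured instances,
# loosely following the kdiff idea above. The Euclidean base distance,
# the quantile level alpha, and the self-distance centering are
# assumptions for this sketch, not the paper's kernel-based definition.
import numpy as np

def pairwise(A, B):
    """All Euclidean distances between rows of A and rows of B."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)

def quantile_distance(X, Y, alpha=0.1):
    X, Y = np.atleast_2d(X), np.atleast_2d(Y)
    cross = pairwise(X, Y).ravel()
    # Off-diagonal self distances, excluding the trivial zero self-matches.
    self_x = pairwise(X, X)[~np.eye(len(X), dtype=bool)]
    self_y = pairwise(Y, Y)[~np.eye(len(Y), dtype=bool)]
    q = lambda v: np.quantile(v, alpha)  # lower quantile of a distance distribution
    # Cross-distance quantile, centered by the average of the two self quantiles.
    return q(cross) - 0.5 * (q(self_x) + q(self_y))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(0.0, 1.0, size=(50, 3))
    Y = rng.normal(2.0, 1.0, size=(50, 3))
    print(quantile_distance(X, X))  # near zero for identical instances
    print(quantile_distance(X, Y))  # larger for well-separated instances
```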