Normalized mutual information is a biased measure for classification and community detection
- URL: http://arxiv.org/abs/2307.01282v2
- Date: Thu, 29 Aug 2024 17:13:20 GMT
- Title: Normalized mutual information is a biased measure for classification and community detection
- Authors: Maximilian Jerdee, Alec Kirkley, M. E. J. Newman,
- Abstract summary: We argue that results returned by the normalized mutual information are biased for two reasons.
We show that one's conclusions about which algorithm is best are significantly affected by the biases in the traditional mutual information.
- Score: 0.4779196219827508
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Normalized mutual information is widely used as a similarity measure for evaluating the performance of clustering and classification algorithms. In this paper, we argue that results returned by the normalized mutual information are biased for two reasons: first, because they ignore the information content of the contingency table and, second, because their symmetric normalization introduces spurious dependence on algorithm output. We introduce a modified version of the mutual information that remedies both of these shortcomings. As a practical demonstration of the importance of using an unbiased measure, we perform extensive numerical tests on a basket of popular algorithms for network community detection and show that one's conclusions about which algorithm is best are significantly affected by the biases in the traditional mutual information.
Related papers
- Mutual information and the encoding of contingency tables [0.4779196219827508]
Mutual information is commonly used as a measure of similarity between labelings of a given set of objects.
As argued recently, the mutual information as conventionally defined can return biased results because it neglects the information cost of the so-called contingency table.
In this paper we describe an improved method for encoding contingency tables that gives a substantially better bound in typical use cases.
arXiv Detail & Related papers (2024-05-08T19:49:39Z) - Correcting Underrepresentation and Intersectional Bias for Classification [49.1574468325115]
We consider the problem of learning from data corrupted by underrepresentation bias.
We show that with a small amount of unbiased data, we can efficiently estimate the group-wise drop-out rates.
We show that our algorithm permits efficient learning for model classes of finite VC dimension.
arXiv Detail & Related papers (2023-06-19T18:25:44Z) - Beyond Normal: On the Evaluation of Mutual Information Estimators [52.85079110699378]
We show how to construct a diverse family of distributions with known ground-truth mutual information.
We provide guidelines for practitioners on how to select appropriate estimator adapted to the difficulty of problem considered.
arXiv Detail & Related papers (2023-06-19T17:26:34Z) - Gacs-Korner Common Information Variational Autoencoder [102.89011295243334]
We propose a notion of common information that allows one to quantify and separate the information that is shared between two random variables.
We demonstrate that our formulation allows us to learn semantically meaningful common and unique factors of variation even on high-dimensional data such as images and videos.
arXiv Detail & Related papers (2022-05-24T17:47:26Z) - Information-Theoretic Bias Reduction via Causal View of Spurious
Correlation [71.9123886505321]
We propose an information-theoretic bias measurement technique through a causal interpretation of spurious correlation.
We present a novel debiasing framework against the algorithmic bias, which incorporates a bias regularization loss.
The proposed bias measurement and debiasing approaches are validated in diverse realistic scenarios.
arXiv Detail & Related papers (2022-01-10T01:19:31Z) - Riemannian classification of EEG signals with missing values [67.90148548467762]
This paper proposes two strategies to handle missing data for the classification of electroencephalograms.
The first approach estimates the covariance from imputed data with the $k$-nearest neighbors algorithm; the second relies on the observed data by leveraging the observed-data likelihood within an expectation-maximization algorithm.
As results show, the proposed strategies perform better than the classification based on observed data and allow to keep a high accuracy even when the missing data ratio increases.
arXiv Detail & Related papers (2021-10-19T14:24:50Z) - Learning Bias-Invariant Representation by Cross-Sample Mutual
Information Minimization [77.8735802150511]
We propose a cross-sample adversarial debiasing (CSAD) method to remove the bias information misused by the target task.
The correlation measurement plays a critical role in adversarial debiasing and is conducted by a cross-sample neural mutual information estimator.
We conduct thorough experiments on publicly available datasets to validate the advantages of the proposed method over state-of-the-art approaches.
arXiv Detail & Related papers (2021-08-11T21:17:02Z) - One-vs.-One Mitigation of Intersectional Bias: A General Method to
Extend Fairness-Aware Binary Classification [0.48733623015338234]
One-vs.-One Mitigation is a process of comparison between each pair of subgroups related to sensitive attributes to the fairness-aware machine learning for binary classification.
Our method mitigates the intersectional bias much better than conventional methods in all the settings.
arXiv Detail & Related papers (2020-10-26T11:35:39Z) - Learning Unbiased Representations via Mutual Information Backpropagation [36.383338079229695]
In particular, we face the case where some attributes (bias) of the data, if learned by the model, can severely compromise its generalization properties.
We propose a novel end-to-end optimization strategy, which simultaneously estimates and minimizes the mutual information between the learned representation and the data attributes.
arXiv Detail & Related papers (2020-03-13T18:06:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.