Correcting Flaws in Common Disentanglement Metrics
- URL: http://arxiv.org/abs/2304.02335v1
- Date: Wed, 5 Apr 2023 09:43:58 GMT
- Title: Correcting Flaws in Common Disentanglement Metrics
- Authors: Louis Mahon, Lei Sha, Thomas Lukasiewicz
- Abstract summary: In this paper, we identify two failings of existing metrics, which mean they can assign a high score to a model which is still entangled.
We then consider the task of compositional generalization.
Unlike prior works, we treat this as a classification problem, which allows us to use it to measure the disentanglement ability of the encoder.
We show that performance on this task is (a) generally quite poor, (b) correlated with most disentanglement metrics, and (c) most strongly correlated with our newly proposed metrics.
- Score: 44.937838134027714
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent years have seen growing interest in learning disentangled
representations, in which distinct features, such as size or shape, are
represented by distinct neurons. Quantifying the extent to which a given
representation is disentangled is not straightforward; multiple metrics have
been proposed. In this paper, we identify two failings of existing metrics,
which mean they can assign a high score to a model which is still entangled,
and we propose two new metrics, which redress these problems. We then consider
the task of compositional generalization. Unlike prior works, we treat this as
a classification problem, which allows us to use it to measure the
disentanglement ability of the encoder, without depending on the decoder. We
show that performance on this task is (a) generally quite poor, (b) correlated
with most disentanglement metrics, and (c) most strongly correlated with our
newly proposed metrics.
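Since the abstract frames compositional generalization as a classification problem over the encoder's outputs, a minimal sketch of that recipe may help: probe a frozen encoder with one classifier per generative factor, then test on factor combinations never seen in training. This is an illustration under assumptions (linear probes, placeholder arrays), not the authors' exact protocol.

```python
# Rough sketch, not the paper's code: per-factor probes on a frozen encoder,
# evaluated on novel combinations of factor values.
import numpy as np
from sklearn.linear_model import LogisticRegression

def compositional_generalization_score(z_train, y_train, z_test, y_test):
    """z_*: (n, d) encoder outputs; y_*: (n, k) integer factor labels.
    The test split should contain only unseen combinations of factor values."""
    accs = []
    for f in range(y_train.shape[1]):
        clf = LogisticRegression(max_iter=1000).fit(z_train, y_train[:, f])
        accs.append(clf.score(z_test, y_test[:, f]))  # per-factor accuracy
    return float(np.mean(accs))  # average accuracy over factors
```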
Related papers
- $F_\beta$-plot -- a visual tool for evaluating imbalanced data classifiers [0.0]
The paper proposes a simple approach to analyzing the popular parametric metric $F_\beta$.
It makes it possible to indicate, for a given pool of analyzed classifiers, when a given model should be preferred, depending on user requirements.
arXiv Detail & Related papers (2024-04-11T18:07:57Z)
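For reference, the parametric $F_\beta$ metric analyzed above is the standard weighted harmonic mean of precision and recall; the snippet below is the textbook definition, not code from the paper.

```python
# Standard F_beta: beta > 1 weights recall more heavily, beta < 1 weights
# precision. Textbook definition, not the paper's implementation.
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```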
- Cobra Effect in Reference-Free Image Captioning Metrics [58.438648377314436]
A proliferation of reference-free methods, leveraging visual-language pre-trained models (VLMs), has emerged.
In this paper, we study if there are any deficiencies in reference-free metrics.
We employ GPT-4V as an evaluative tool to assess generated sentences, and the results reveal that our approach achieves state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2024-02-18T12:36:23Z)
- Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
arXiv Detail & Related papers (2022-11-21T18:47:11Z)
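The entropy regularisation mentioned above commonly takes the form of maximising the entropy of the batch-averaged prediction, so the classifier does not collapse onto a few classes. The sketch below assumes that form; it is illustrative, not the authors' implementation.

```python
# Assumed form of entropy regularisation: maximise the entropy of the
# batch-averaged class prediction to discourage collapse. Illustrative only.
import torch
import torch.nn.functional as F

def mean_entropy_regulariser(logits: torch.Tensor) -> torch.Tensor:
    """logits: (batch, num_classes). Returns a term to *minimise*."""
    p_mean = F.softmax(logits, dim=1).mean(dim=0)        # average prediction
    entropy = -(p_mean * torch.log(p_mean + 1e-8)).sum()  # its entropy
    return -entropy  # minimising the negative entropy maximises it
```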
- Goodness of Fit Metrics for Multi-class Predictor [0.0]
Several metrics are commonly used to measure goodness of fit.
A leading constraint, at least in real-world multi-class problems, is imbalanced data.
We suggest generalizing the Matthews correlation coefficient to the multi-class setting.
arXiv Detail & Related papers (2022-08-11T06:07:29Z)
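A multi-class generalization of the Matthews correlation coefficient (the $R_K$ statistic) is already exposed by scikit-learn, which makes the idea easy to try; the snippet below is a generic illustration, not the paper's proposed variant.

```python
# Generic illustration: scikit-learn's matthews_corrcoef handles
# multi-class labels via the R_K statistic. Not the paper's proposal.
from sklearn.metrics import matthews_corrcoef

y_true = [0, 1, 2, 2, 1, 0, 2]
y_pred = [0, 1, 2, 1, 1, 0, 2]
print(matthews_corrcoef(y_true, y_pred))  # in [-1, 1]; 1 = perfect agreement
```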
- On Quantitative Evaluations of Counterfactuals [88.42660013773647]
This paper consolidates work on evaluating visual counterfactual examples through an analysis and experiments.
We find that while most metrics behave as intended for sufficiently simple datasets, some fail to tell the difference between good and bad counterfactuals when the complexity increases.
We propose two new metrics, the Label Variation Score and the Oracle score, which are both more robust to these failure modes.
arXiv Detail & Related papers (2021-10-30T05:00:36Z)
- Measure Twice, Cut Once: Quantifying Bias and Fairness in Deep Neural Networks [7.763173131630868]
We propose two metrics to quantitatively evaluate the class-wise bias of two models in comparison to one another.
By evaluating the performance of these new metrics and by demonstrating their practical application, we show that they can be used to measure fairness as well as bias.
arXiv Detail & Related papers (2021-10-08T22:35:34Z)
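The summary does not define the two proposed metrics, so the sketch below only illustrates the general shape of a class-wise, two-model comparison: the per-class error-rate gap. Every name here is hypothetical, not the paper's metrics.

```python
# Hypothetical illustration of class-wise bias comparison between two
# models: the per-class error-rate gap. Not the paper's metric definitions.
import numpy as np

def per_class_error_gap(y_true, pred_a, pred_b, num_classes):
    """Returns, for each class c, err_A(c) - err_B(c).
    Assumes every class appears at least once in y_true."""
    y_true, pred_a, pred_b = map(np.asarray, (y_true, pred_a, pred_b))
    gaps = []
    for c in range(num_classes):
        mask = y_true == c
        err_a = np.mean(pred_a[mask] != c)  # model A's error rate on class c
        err_b = np.mean(pred_b[mask] != c)  # model B's error rate on class c
        gaps.append(err_a - err_b)
    return np.array(gaps)
```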
- Dimension Free Generalization Bounds for Non Linear Metric Learning [61.193693608166114]
We provide uniform generalization bounds for two regimes -- the sparse regime and a non-sparse regime.
We show that by relying on a different, new property of the solutions, it is still possible to provide dimension-free generalization guarantees.
arXiv Detail & Related papers (2021-02-07T14:47:00Z)
- A Novel Random Forest Dissimilarity Measure for Multi-View Learning [8.185807285320553]
Two methods are proposed that modify the Random Forest proximity measure to adapt it to High Dimension Low Sample Size (HDLSS) multi-view classification problems.
The second method, based on an Instance Hardness measurement, is significantly more accurate than other state-of-the-art measurements.
arXiv Detail & Related papers (2020-07-06T07:54:52Z)
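As background, the classic Random Forest proximity the paper modifies is the fraction of trees in which two samples fall in the same leaf; a baseline sketch using scikit-learn's `apply` follows. The paper's HDLSS-adapted variants are not reproduced here.

```python
# Classic RF dissimilarity: 1 minus the fraction of trees in which two
# samples share a leaf. Baseline only, not the paper's modified measures.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rf_dissimilarity(forest: RandomForestClassifier, X: np.ndarray) -> np.ndarray:
    """`forest` must already be fitted. Returns an (n, n) dissimilarity matrix."""
    leaves = forest.apply(X)                            # (n_samples, n_trees) leaf ids
    same = leaves[:, None, :] == leaves[None, :, :]     # pairwise leaf agreement
    proximity = same.mean(axis=2)                       # fraction of agreeing trees
    return 1.0 - proximity
```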
- Project and Forget: Solving Large-Scale Metric Constrained Problems [7.381113319198104]
Given a set of dissimilarity measurements amongst data points, determining what metric representation is most "consistent" with the input measurements is a key step in many machine learning algorithms.
Existing methods are restricted to specific kinds of metrics or small problem sizes because of the large number of metric constraints in such problems.
In this paper, we provide an active-set method, Project and Forget, that uses Bregman projections to solve metric-constrained problems with many (possibly exponentially many) inequality constraints.
arXiv Detail & Related papers (2020-05-08T04:50:54Z)
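A minimal sketch of the core idea only: cyclically project a dissimilarity matrix onto triangle-inequality half-spaces, using plain Euclidean projections (a special case of Bregman projections). The authors' full method adds the active-set "forget" step and scales far beyond this naive per-sweep loop.

```python
# Sketch of cyclic projection onto triangle-inequality constraints.
# The paper's method adds an active-set "forget" step and general
# Bregman projections; this is the naive core idea for small n.
import itertools
import numpy as np

def triangle_fix(D: np.ndarray, sweeps: int = 50) -> np.ndarray:
    """Push a symmetric dissimilarity matrix D toward the metric cone."""
    D = D.copy()
    n = D.shape[0]
    for _ in range(sweeps):
        for i, j, k in itertools.permutations(range(n), 3):
            v = D[i, j] - D[i, k] - D[k, j]  # violation of d_ij <= d_ik + d_kj
            if v > 0:  # Euclidean projection onto the half-space a·x <= 0
                D[i, j] -= v / 3; D[j, i] = D[i, j]
                D[i, k] += v / 3; D[k, i] = D[i, k]
                D[k, j] += v / 3; D[j, k] = D[k, j]
    return D
```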
- KPQA: A Metric for Generative Question Answering Using Keyphrase Weights [64.54593491919248]
The KPQA metric is a new metric for evaluating the correctness of generative question answering systems.
Our new metric assigns different weights to each token via keyphrase prediction.
We show that our proposed metric has a significantly higher correlation with human judgments than existing metrics.
arXiv Detail & Related papers (2020-05-01T03:24:36Z)
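The summary above describes per-token weights obtained from keyphrase prediction; the sketch below shows a keyphrase-weighted token-overlap F1 with a stand-in `weights` dictionary in place of the paper's learned predictor. Hypothetical illustration only.

```python
# Hypothetical sketch: token-overlap F1 where each token's contribution is
# scaled by an importance weight. `weights` stands in for KPQA's learned
# keyphrase predictor; the 0.1 default is an arbitrary low weight.
def weighted_f1(pred_tokens, ref_tokens, weights):
    if not pred_tokens or not ref_tokens:
        return 0.0
    w = lambda toks: sum(weights.get(t, 0.1) for t in toks)
    common = set(pred_tokens) & set(ref_tokens)
    if not common:
        return 0.0
    precision = w(common) / w(pred_tokens)
    recall = w(common) / w(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(weighted_f1(["paris", "is", "the", "capital"],
                  ["the", "capital", "is", "paris"],
                  {"paris": 1.0, "capital": 0.8}))  # keyphrases dominate
```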
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.