Towards an Improved Metric for Evaluating Disentangled Representations
- URL: http://arxiv.org/abs/2410.03056v1
- Date: Fri, 4 Oct 2024 00:32:59 GMT
- Title: Towards an Improved Metric for Evaluating Disentangled Representations
- Authors: Sahib Julka, Yashu Wang, Michael Granitzer,
- Abstract summary: Disentangled representation learning plays a pivotal role in making representations controllable, interpretable and transferable.
Despite its significance in the domain, the quest for reliable and consistent quantitative disentanglement metric remains a major challenge.
We propose a new framework for quantifying disentanglement, introducing a metric entitled emphEDI, that leverages the intuitive concept of emphexclusivity and improved factor-code relationship.
- Score: 0.6946415403594184
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Disentangled representation learning plays a pivotal role in making representations controllable, interpretable and transferable. Despite its significance in the domain, the quest for reliable and consistent quantitative disentanglement metric remains a major challenge. This stems from the utilisation of diverse metrics measuring different properties and the potential bias introduced by their design. Our work undertakes a comprehensive examination of existing popular disentanglement evaluation metrics, comparing them in terms of measuring aspects of disentanglement (viz. Modularity, Compactness, and Explicitness), detecting the factor-code relationship, and describing the degree of disentanglement. We propose a new framework for quantifying disentanglement, introducing a metric entitled \emph{EDI}, that leverages the intuitive concept of \emph{exclusivity} and improved factor-code relationship to minimize ad-hoc decisions. An in-depth analysis reveals that EDI measures essential properties while offering more stability than existing metrics, advocating for its adoption as a standardised approach.
Related papers
- Building Trust in Black-box Optimization: A Comprehensive Framework for Explainability [1.3812010983144802]
Surrogate Optimization (SO) is a common resolution, yet its proprietary nature leads to a lack of explainability and transparency.
We propose emphInclusive Explainability Metrics for Surrogate Optimization (IEMSO)
These metrics enhance the transparency, trustworthiness, and explainability of the SO approaches.
arXiv Detail & Related papers (2024-10-18T16:20:17Z) - Measuring Orthogonality in Representations of Generative Models [81.13466637365553]
In unsupervised representation learning, models aim to distill essential features from high-dimensional data into lower-dimensional learned representations.
Disentanglement of independent generative processes has long been credited with producing high-quality representations.
We propose two novel metrics: Importance-Weighted Orthogonality (IWO) and Importance-Weighted Rank (IWR)
arXiv Detail & Related papers (2024-07-04T08:21:54Z) - On the stability, correctness and plausibility of visual explanation
methods based on feature importance [0.0]
We study the articulation between the stability, correctness and plausibility of explanations based on feature importance for image classifiers.
We show that the existing metrics for evaluating these properties do not always agree, raising the issue of what constitutes a good evaluation metric for explanations.
arXiv Detail & Related papers (2023-10-25T08:59:21Z) - Goodhart's Law Applies to NLP's Explanation Benchmarks [57.26445915212884]
We critically examine two sets of metrics: the ERASER metrics (comprehensiveness and sufficiency) and the EVAL-X metrics.
We show that we can inflate a model's comprehensiveness and sufficiency scores dramatically without altering its predictions or explanations on in-distribution test inputs.
Our results raise doubts about the ability of current metrics to guide explainability research, underscoring the need for a broader reassessment of what precisely these metrics are intended to capture.
arXiv Detail & Related papers (2023-08-28T03:03:03Z) - Enriching Disentanglement: From Logical Definitions to Quantitative Metrics [59.12308034729482]
Disentangling the explanatory factors in complex data is a promising approach for data-efficient representation learning.
We establish relationships between logical definitions and quantitative metrics to derive theoretically grounded disentanglement metrics.
We empirically demonstrate the effectiveness of the proposed metrics by isolating different aspects of disentangled representations.
arXiv Detail & Related papers (2023-05-19T08:22:23Z) - QAFactEval: Improved QA-Based Factual Consistency Evaluation for
Summarization [116.56171113972944]
We show that carefully choosing the components of a QA-based metric is critical to performance.
Our solution improves upon the best-performing entailment-based metric and achieves state-of-the-art performance.
arXiv Detail & Related papers (2021-12-16T00:38:35Z) - REAM$\sharp$: An Enhancement Approach to Reference-based Evaluation
Metrics for Open-domain Dialog Generation [63.46331073232526]
We present an enhancement approach to Reference-based EvAluation Metrics for open-domain dialogue systems.
A prediction model is designed to estimate the reliability of the given reference set.
We show how its predicted results can be helpful to augment the reference set, and thus improve the reliability of the metric.
arXiv Detail & Related papers (2021-05-30T10:04:13Z) - GO FIGURE: A Meta Evaluation of Factuality in Summarization [131.1087461486504]
We introduce GO FIGURE, a meta-evaluation framework for evaluating factuality evaluation metrics.
Our benchmark analysis on ten factuality metrics reveals that our framework provides a robust and efficient evaluation.
It also reveals that while QA metrics generally improve over standard metrics that measure factuality across domains, performance is highly dependent on the way in which questions are generated.
arXiv Detail & Related papers (2020-10-24T08:30:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.