A Unified View of Evaluation Metrics for Structured Prediction
- URL: http://arxiv.org/abs/2310.13793v1
- Date: Fri, 20 Oct 2023 20:02:02 GMT
- Title: A Unified View of Evaluation Metrics for Structured Prediction
- Authors: Yunmo Chen, William Gantt, Tongfei Chen, Aaron Steven White, Benjamin Van Durme
- Abstract summary: We present a conceptual framework that unifies evaluation metrics for different structured prediction tasks.
Our framework requires representing the outputs of these tasks as objects of certain data types.
We show that new metrics can be naturally derived in a bottom-up way based on an output structure.
- Score: 41.29492827464339
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a conceptual framework that unifies a variety of evaluation
metrics for different structured prediction tasks (e.g. event and relation
extraction, syntactic and semantic parsing). Our framework requires
representing the outputs of these tasks as objects of certain data types, and
derives metrics through matching of common substructures, possibly followed by
normalization. We demonstrate how commonly used metrics for a number of tasks
can be succinctly expressed by this framework, and show that new metrics can be
naturally derived in a bottom-up way based on an output structure. We release a
library that enables this derivation to create new metrics. Finally, we
consider how specific characteristics of tasks motivate metric design
decisions, and suggest possible modifications to existing metrics in line with
those motivations.
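As a rough illustration of the idea in the abstract (match common substructures, then normalize), here is a minimal Python sketch. It is not the authors' released library. Every name in it (`exact`, `triple_kernel`, `best_matching`, `precision_recall_f1`) and the toy relation triples are hypothetical, chosen only for this example. It assumes each kernel returns 1.0 when an item is compared with itself and that items within a set are distinct, so normalizing by set size is equivalent to normalizing by self-similarity.

```python
from itertools import permutations
from typing import Callable, Sequence, Tuple

Triple = Tuple[str, str, str]  # hypothetical (head, relation, tail) record


def exact(a: object, b: object) -> float:
    """Kernel for atomic values: 1.0 on equality, 0.0 otherwise."""
    return 1.0 if a == b else 0.0


def triple_kernel(a: Triple, b: Triple) -> float:
    """Kernel for a product type: multiply the kernels of the fields."""
    return exact(a[0], b[0]) * exact(a[1], b[1]) * exact(a[2], b[2])


def best_matching(pred: Sequence, gold: Sequence,
                  kernel: Callable[[object, object], float]) -> float:
    """Unnormalized set similarity: total kernel score of the best
    one-to-one matching. Brute force, so only suitable for tiny sets."""
    best = 0.0
    if len(pred) <= len(gold):
        for perm in permutations(gold, len(pred)):
            best = max(best, sum(kernel(p, g) for p, g in zip(pred, perm)))
    else:
        for perm in permutations(pred, len(gold)):
            best = max(best, sum(kernel(p, g) for p, g in zip(perm, gold)))
    return best


def precision_recall_f1(pred: Sequence, gold: Sequence,
                        kernel: Callable[[object, object], float]):
    """Normalize the matching score into precision, recall, and F1."""
    overlap = best_matching(pred, gold, kernel)
    p = overlap / len(pred) if pred else 0.0
    r = overlap / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Toy data (made up): one of three predicted triples matches the gold set.
pred = [("Alice", "works_for", "Acme"),
        ("Bob", "born_in", "Paris"),
        ("Bob", "works_for", "Acme")]
gold = [("Alice", "works_for", "Acme"),
        ("Bob", "born_in", "Lyon")]

print(precision_recall_f1(pred, gold, triple_kernel))
```

With an exact-match kernel the matching score reduces to the size of the set intersection, so this recovers ordinary triple-level precision, recall, and F1. Replacing `exact` on a field with a graded kernel would give partial credit, which is one way a new metric could be derived bottom-up from the output structure.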
Related papers
- Towards an Improved Metric for Evaluating Disentangled Representations [0.6946415403594184]
Disentangled representation learning plays a pivotal role in making representations controllable, interpretable and transferable.
Despite its significance in the domain, the quest for a reliable and consistent quantitative disentanglement metric remains a major challenge.
We propose a new framework for quantifying disentanglement, introducing a metric entitled EDI, that leverages the intuitive concept of exclusivity and an improved factor-code relationship.
arXiv Detail & Related papers (2024-10-04T00:32:59Z)
- Learning to Extract Structured Entities Using Language Models [52.281701191329]
Recent advances in machine learning have significantly impacted the field of information extraction.
We reformulate the task to be entity-centric, enabling the use of diverse metrics.
We contribute to the field by introducing Structured Entity Extraction and proposing the Approximate Entity Set OverlaP metric.
arXiv Detail & Related papers (2024-02-06T22:15:09Z)
- Promptly Predicting Structures: The Return of Inference [31.442123334313035]
We present a framework for constructing zero- and few-shot linguistic structure predictors.
Our results show that enforcing consistency not only constructs structurally valid outputs, but also improves performance.
arXiv Detail & Related papers (2024-01-12T20:08:39Z)
- MetricPrompt: Prompting Model as a Relevance Metric for Few-shot Text Classification [65.51149771074944]
MetricPrompt eases the difficulty of verbalizer design by reformulating the few-shot text classification task as a text pair relevance estimation task.
We conduct experiments on three widely used text classification datasets across four few-shot settings.
Results show that MetricPrompt outperforms manual verbalizers and other automatic verbalizer design methods across all few-shot settings.
arXiv Detail & Related papers (2023-06-15T06:51:35Z)
- Enriching Disentanglement: From Logical Definitions to Quantitative Metrics [59.12308034729482]
Disentangling the explanatory factors in complex data is a promising approach for data-efficient representation learning.
We establish relationships between logical definitions and quantitative metrics to derive theoretically grounded disentanglement metrics.
We empirically demonstrate the effectiveness of the proposed metrics by isolating different aspects of disentangled representations.
arXiv Detail & Related papers (2023-05-19T08:22:23Z)
- Variable Importance Matching for Causal Inference [73.25504313552516]
We describe a general framework called Model-to-Match that achieves these goals.
Model-to-Match uses variable importance measurements to construct a distance metric.
We operationalize the Model-to-Match framework with LASSO.
arXiv Detail & Related papers (2023-02-23T00:43:03Z)
- Analyzing Text Representations under Tight Annotation Budgets: Measuring Structural Alignment [2.198430261120653]
Under tight annotation budgets the choice of data representation is key.
We propose a metric that measures the extent to which a given representation is structurally aligned with a task.
arXiv Detail & Related papers (2022-10-11T18:28:19Z)
- Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks [95.06087720086133]
Natural-Instructions v2 is a collection of 1,600+ diverse language tasks and their expert-written instructions.
The benchmark covers 70+ distinct task types, such as tagging, in-filling, and rewriting.
This benchmark enables large-scale evaluation of cross-task generalization of the models.
arXiv Detail & Related papers (2022-04-16T03:12:30Z)
- A Unified Framework for Rank-based Evaluation Metrics for Link Prediction in Knowledge Graphs [19.822126244784133]
The link prediction task on knowledge graphs without explicit negative triples motivates the use of rank-based metrics.
We introduce a simple theoretical framework for rank-based metrics, upon which we investigate two avenues for improving existing metrics: alternative aggregation functions and concepts from probability theory.
We propose several new rank-based metrics that are more easily interpreted and compared, accompanied by a demonstration of their use in benchmarking knowledge graph embedding models.
arXiv Detail & Related papers (2022-03-14T23:09:46Z)
- Leveraging Class Hierarchies with Metric-Guided Prototype Learning [5.070542698701158]
In many classification tasks, the set of target classes can be organized into a hierarchy.
This structure induces a semantic distance between classes, and can be summarised in the form of a cost matrix.
We propose to model the hierarchical class structure by integrating this metric in the supervision of a prototypical network.
arXiv Detail & Related papers (2020-07-06T20:22:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.