A Unified View of Evaluation Metrics for Structured Prediction
- URL: http://arxiv.org/abs/2310.13793v1
- Date: Fri, 20 Oct 2023 20:02:02 GMT
- Title: A Unified View of Evaluation Metrics for Structured Prediction
- Authors: Yunmo Chen, William Gantt, Tongfei Chen, Aaron Steven White, Benjamin
Van Durme
- Abstract summary: We present a conceptual framework that unifies evaluation metrics for different structured prediction tasks.
Our framework requires representing the outputs of these tasks as objects of certain data types.
We show that new metrics can be naturally derived in a bottom-up way based on an output structure.
- Score: 41.29492827464339
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a conceptual framework that unifies a variety of evaluation
metrics for different structured prediction tasks (e.g. event and relation
extraction, syntactic and semantic parsing). Our framework requires
representing the outputs of these tasks as objects of certain data types, and
derives metrics through matching of common substructures, possibly followed by
normalization. We demonstrate how commonly used metrics for a number of tasks
can be succinctly expressed by this framework, and show that new metrics can be
naturally derived in a bottom-up way based on an output structure. We release a
library that enables this derivation to create new metrics. Finally, we
consider how specific characteristics of tasks motivate metric design
decisions, and suggest possible modifications to existing metrics in line with
those motivations.
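As a toy illustration of the framework's core idea (this is our own sketch, not the authors' released library), a metric such as relation-extraction F1 can be derived by counting matched substructures between a predicted and a reference structure, then normalizing by the sizes of each:

```python
# Hypothetical sketch of deriving a set-matching F1 in the spirit of the
# paper's framework; all names here are illustrative, not the library's API.

def match_count(pred, gold):
    """Count common substructures; for sets of tuples, the intersection size."""
    return len(set(pred) & set(gold))

def f1(pred, gold):
    """Match substructures, then normalize by prediction/reference sizes."""
    if not pred or not gold:
        return 0.0
    matched = match_count(pred, gold)
    precision = matched / len(pred)
    recall = matched / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Relation triples represented as (head, relation, tail) tuples.
pred = [("Alice", "works_at", "Acme"), ("Bob", "born_in", "Paris")]
gold = [("Alice", "works_at", "Acme"), ("Bob", "born_in", "Lyon")]
print(f1(pred, gold))  # 0.5
```

Swapping the matching function (e.g. exact tuple equality vs. partial overlap) or the normalization yields different metrics over the same output type, which is the bottom-up derivation the abstract describes.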
Related papers
- Learning to Extract Structured Entities Using Language Models [52.281701191329]
Recent advances in machine learning have significantly impacted the field of information extraction.
We reformulate the task to be entity-centric, enabling the use of diverse metrics that can provide more insights.
We introduce a new model that harnesses the power of Language Models (LMs) for enhanced effectiveness and efficiency.
arXiv Detail & Related papers (2024-02-06T22:15:09Z)
- Promptly Predicting Structures: The Return of Inference [31.442123334313035]
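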
We present a framework for constructing zero- and few-shot linguistic structure predictors.
Our results show that enforcing consistency not only constructs structurally valid outputs, but also improves performance.
arXiv Detail & Related papers (2024-01-12T20:08:39Z)
- MetricPrompt: Prompting Model as a Relevance Metric for Few-shot Text Classification [65.51149771074944]
MetricPrompt eases verbalizer design difficulty by reformulating the few-shot text classification task as a text pair relevance estimation task.
We conduct experiments on three widely used text classification datasets across four few-shot settings.
Results show that MetricPrompt outperforms manual verbalizer and other automatic verbalizer design methods across all few-shot settings.
arXiv Detail & Related papers (2023-06-15T06:51:35Z)
- Variable Importance Matching for Causal Inference [73.25504313552516]
We describe a general framework called Model-to-Match that achieves these goals.
Model-to-Match uses variable importance measurements to construct a distance metric.
We operationalize the Model-to-Match framework with LASSO.
arXiv Detail & Related papers (2023-02-23T00:43:03Z)
- Analyzing Text Representations under Tight Annotation Budgets: Measuring Structural Alignment [2.198430261120653]
Under tight annotation budgets, the choice of data representation is key.
We propose a metric that measures the extent to which a given representation is structurally aligned with a task.
arXiv Detail & Related papers (2022-10-11T18:28:19Z)
- The Counterfactual-Shapley Value: Attributing Change in System Metrics [10.804568364995982]
A key component of an attribution question is estimating the counterfactual: the (hypothetical) change in the system metric due to a specified change in a single input.
We propose a method to estimate counterfactuals using time-series predictive models and construct an attribution score, CF-Shapley.
As a real-world application, we analyze a query-ad matching system with the goal of attributing observed change in a metric for ad matching density.
arXiv Detail & Related papers (2022-08-17T16:48:20Z)
- Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks [95.06087720086133]
Natural-Instructions v2 is a collection of 1,600+ diverse language tasks and their expert-written instructions.
The benchmark covers 70+ distinct task types, such as tagging, in-filling, and rewriting.
This benchmark enables large-scale evaluation of cross-task generalization of the models.
arXiv Detail & Related papers (2022-04-16T03:12:30Z)
- A Unified Framework for Rank-based Evaluation Metrics for Link Prediction in Knowledge Graphs [19.822126244784133]
The link prediction task on knowledge graphs lacks explicit negative triples, which motivates the use of rank-based metrics.
We introduce a simple theoretical framework for rank-based metrics, upon which we investigate two avenues for improving existing metrics via alternative aggregation functions and concepts from probability theory.
We propose several new rank-based metrics that are more easily interpreted and compared, accompanied by a demonstration of their usage in benchmarking knowledge graph embedding models.
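For concreteness, below is a minimal sketch of two standard rank-based metrics that frameworks like this one generalize (the function names are ours, not the paper's):

```python
# Illustrative implementations of mean reciprocal rank and Hits@k,
# two common rank-based metrics for knowledge graph link prediction.

def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test triples (ranks are 1-based)."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_k(ranks, k):
    """Fraction of test triples whose true entity is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

ranks = [1, 2, 10, 4]
print(mean_reciprocal_rank(ranks))  # (1 + 1/2 + 1/10 + 1/4) / 4 = 0.4625
print(hits_at_k(ranks, 3))          # 2/4 = 0.5
```

Both metrics aggregate the same list of ranks with different functions, which is the kind of design choice the paper's alternative aggregation functions address.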
arXiv Detail & Related papers (2022-03-14T23:09:46Z)
- Leveraging Class Hierarchies with Metric-Guided Prototype Learning [5.070542698701158]
In many classification tasks, the set of target classes can be organized into a hierarchy.
This structure induces a semantic distance between classes, which can be summarised in the form of a cost matrix.
We propose to model the hierarchical class structure by integrating this metric in the supervision of a prototypical network.
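As a toy illustration of how a hierarchy induces such a cost matrix (our own example, assuming edge-count distance through the lowest common ancestor; the paper may use a different distance), one can be built from a small class tree:

```python
# Hypothetical sketch: derive a cost matrix from a class hierarchy, where the
# distance between two classes is the edge count of the path through their
# lowest common ancestor. The toy tree and helper names are illustrative.

parents = {            # child -> parent; "root" is the top of the hierarchy
    "animal": "root", "vehicle": "root",
    "cat": "animal", "dog": "animal", "car": "vehicle",
}

def ancestors(cls):
    """Path from a class up to the root, inclusive."""
    path = [cls]
    while path[-1] in parents:
        path.append(parents[path[-1]])
    return path

def tree_distance(a, b):
    """Edges on the path between a and b through their lowest common ancestor."""
    pa, pb = ancestors(a), ancestors(b)
    lca = min(set(pa) & set(pb), key=pa.index)  # nearest shared ancestor
    return pa.index(lca) + pb.index(lca)

classes = ["cat", "dog", "car"]
cost = [[tree_distance(a, b) for b in classes] for a in classes]
print(cost)  # [[0, 2, 4], [2, 0, 4], [4, 4, 0]]
```

Errors between sibling classes (cat vs. dog) are cheaper than errors across branches (cat vs. car), which is the semantic distance the supervision exploits.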
arXiv Detail & Related papers (2020-07-06T20:22:08Z)
- Evaluation Metrics for Conditional Image Generation [100.69766435176557]
We present two new metrics for evaluating generative models in the class-conditional image generation setting.
A theoretical analysis shows the motivation behind each proposed metric and links the novel metrics to their unconditional counterparts.
We provide an extensive empirical evaluation, comparing the metrics to their unconditional variants and to other metrics, and utilize them to analyze existing generative models.
arXiv Detail & Related papers (2020-04-26T12:15:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.