Estimating Agreement by Chance for Sequence Annotation
- URL: http://arxiv.org/abs/2407.11371v1
- Date: Tue, 16 Jul 2024 04:32:47 GMT
- Title: Estimating Agreement by Chance for Sequence Annotation
- Authors: Diya Li, Carolyn Rosé, Ao Yuan, Chunxiao Zhou
- Abstract summary: We introduce a novel model for generating random annotations, which serves as the foundation for estimating chance agreement in sequence annotation tasks.
We successfully derive the analytical form of the distribution, enabling the computation of the probable location of each annotated text segment and subsequent chance agreement estimation.
- Score: 3.039887427447867
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the field of natural language processing, correction of performance assessment for chance agreement plays a crucial role in evaluating the reliability of annotations. However, there is a notable dearth of research focusing on chance correction for assessing the reliability of sequence annotation tasks, despite their widespread prevalence in the field. To address this gap, this paper introduces a novel model for generating random annotations, which serves as the foundation for estimating chance agreement in sequence annotation tasks. Utilizing the proposed randomization model and a related comparison approach, we derive the analytical form of the distribution, enabling the computation of the probable location of each annotated text segment and subsequent chance agreement estimation. Through a combination of simulation and corpus-based evaluation, we assess the model's applicability and validate its accuracy and efficacy.
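The paper derives the chance agreement analytically; as an illustration of the underlying idea, the expected agreement between two independent random annotators can also be approximated by Monte Carlo simulation and then used for a kappa-style correction. The span-placement scheme below (independent, possibly overlapping spans with uniform lengths) is a hypothetical simplification, not the paper's randomization model:

```python
import random

def random_spans(seq_len, n_spans, max_len, rng):
    """Label tokens 0/1 by dropping n_spans random spans (length 1..max_len,
    possibly overlapping) onto a sequence of seq_len tokens."""
    labels = [0] * seq_len
    for _ in range(n_spans):
        length = rng.randint(1, max_len)
        start = rng.randint(0, seq_len - length)
        for i in range(start, start + length):
            labels[i] = 1
    return labels

def chance_agreement(seq_len, n_spans, max_len, trials=2000, seed=0):
    """Monte Carlo estimate of expected token-level agreement between
    two independent random annotators under the toy model above."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        a = random_spans(seq_len, n_spans, max_len, rng)
        b = random_spans(seq_len, n_spans, max_len, rng)
        total += sum(x == y for x, y in zip(a, b)) / seq_len
    return total / trials

p_e = chance_agreement(seq_len=100, n_spans=3, max_len=5)
p_o = 0.92  # observed agreement from real annotations (illustrative value)
kappa = (p_o - p_e) / (1 - p_e)  # chance-corrected agreement
```

Because most tokens are unlabeled under sparse random annotation, raw token-level agreement is high by chance alone, which is exactly why the correction matters.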
Related papers
- Probabilistic Conformal Prediction with Approximate Conditional Validity [81.30551968980143]
We develop a new method for generating prediction sets that combines the flexibility of conformal methods with an estimate of the conditional distribution.
Our method consistently outperforms existing approaches in terms of conditional coverage.
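For context, the baseline that conditional conformal methods refine is split conformal prediction, which calibrates a single interval width from held-out residuals and guarantees only marginal coverage. A minimal sketch, assuming a generic regression model whose calibration-set predictions are already computed:

```python
import math

def split_conformal_interval(cal_preds, cal_targets, alpha, new_pred):
    """Split-conformal prediction interval for one new prediction.
    cal_preds/cal_targets: model predictions and true values on a
    held-out calibration set; alpha: miscoverage level."""
    n = len(cal_preds)
    # Nonconformity scores: absolute residuals on the calibration set.
    scores = sorted(abs(p - y) for p, y in zip(cal_preds, cal_targets))
    # Conformal quantile index: ceil((n+1)(1-alpha)); capped at n here
    # for simplicity (the exact interval is infinite when it exceeds n).
    k = min(math.ceil((n + 1) * (1 - alpha)), n)
    q = scores[k - 1]
    return new_pred - q, new_pred + q

lo, hi = split_conformal_interval(
    cal_preds=[0] * 9, cal_targets=list(range(1, 10)),
    alpha=0.1, new_pred=0.0,
)
```

The interval width is the same for every test point, which is the conditional-coverage limitation that methods like the one summarized above aim to overcome.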
arXiv Detail & Related papers (2024-07-01T20:44:48Z)
- The Penalized Inverse Probability Measure for Conformal Classification [0.5172964916120902]
The work introduces the Penalized Inverse Probability (PIP) nonconformity score, and its regularized version RePIP, that allow the joint optimization of both efficiency and informativeness.
The work shows how PIP-based conformal classifiers exhibit precisely the desired behavior in comparison with other nonconformity measures and strike a good balance between informativeness and efficiency.
arXiv Detail & Related papers (2024-06-13T07:37:16Z)
- Conformal Approach To Gaussian Process Surrogate Evaluation With Coverage Guarantees [47.22930583160043]
We propose a method for building adaptive cross-conformal prediction intervals.
The resulting conformal prediction intervals exhibit a level of adaptivity akin to Bayesian credibility sets.
The potential applicability of the method is demonstrated in the context of surrogate modeling of an expensive-to-evaluate simulator of the clogging phenomenon in steam generators of nuclear reactors.
arXiv Detail & Related papers (2024-01-15T14:45:18Z)
- Enhanced Local Explainability and Trust Scores with Random Forest Proximities [0.9423257767158634]
We exploit the fact that any random forest (RF) regression and classification model can be mathematically formulated as an adaptive weighted K nearest-neighbors model.
We show that this linearity facilitates a local notion of explainability of RF predictions that generates attributions for any model prediction across observations in the training set.
We show how this proximity-based approach to explainability can be used in conjunction with SHAP to explain not just the model predictions, but also out-of-sample performance.
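The weighted k-nearest-neighbors view can be made concrete with the simplest notion of proximity: the fraction of trees in which two observations fall in the same leaf. The sketch below is illustrative only; the paper's attributions rely on specific proximity definitions and SHAP machinery not reproduced here, and the toy leaf indices stand in for what a real forest's `apply`-style method would return:

```python
def rf_proximity(leaves_a, leaves_b):
    """Fraction of trees in which two observations share a leaf.
    leaves_x[t] is the leaf index of observation x in tree t."""
    assert len(leaves_a) == len(leaves_b)
    return sum(a == b for a, b in zip(leaves_a, leaves_b)) / len(leaves_a)

def proximity_prediction(test_leaves, train_leaves, train_targets):
    """Reconstruct an RF regression prediction as a proximity-weighted
    average over the training set (the adaptive weighted kNN view)."""
    weights = [rf_proximity(test_leaves, tl) for tl in train_leaves]
    total = sum(weights)
    if total == 0:
        return sum(train_targets) / len(train_targets)
    return sum(w * y for w, y in zip(weights, train_targets)) / total

# Toy example: 2 trees, 3 training points with targets 1.0, 2.0, 10.0.
pred = proximity_prediction(
    test_leaves=[0, 1],
    train_leaves=[[0, 1], [0, 2], [3, 4]],
    train_targets=[1.0, 2.0, 10.0],
)
```

Because the weights are non-negative and sum to one after normalization, each prediction decomposes into per-training-point contributions, which is the local attribution the summary describes.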
arXiv Detail & Related papers (2023-10-19T02:42:20Z)
- Interpretable Automatic Fine-grained Inconsistency Detection in Text Summarization [56.94741578760294]
We propose the task of fine-grained inconsistency detection, the goal of which is to predict the fine-grained types of factual errors in a summary.
Motivated by how humans inspect factual inconsistency in summaries, we propose an interpretable fine-grained inconsistency detection model, FineGrainFact.
arXiv Detail & Related papers (2023-05-23T22:11:47Z)
- Regions of Reliability in the Evaluation of Multivariate Probabilistic Forecasts [73.33395097728128]
We provide the first systematic finite-sample study of proper scoring rules for time-series forecasting evaluation.
We carry out our analysis on a comprehensive synthetic benchmark, specifically designed to test several key discrepancies between ground-truth and forecast distributions.
arXiv Detail & Related papers (2023-04-19T17:38:42Z)
- SNaC: Coherence Error Detection for Narrative Summarization [73.48220043216087]
We introduce SNaC, a narrative coherence evaluation framework rooted in fine-grained annotations for long summaries.
We develop a taxonomy of coherence errors in generated narrative summaries and collect span-level annotations for 6.6k sentences across 150 book and movie screenplay summaries.
Our work provides the first characterization of coherence errors generated by state-of-the-art summarization models and a protocol for eliciting coherence judgments from crowd annotators.
arXiv Detail & Related papers (2022-05-19T16:01:47Z)
- Scalable and Interpretable Marked Point Processes [5.070542698701158]
We introduce a novel inferential framework for marked point processes that enjoys both scalability and interpretability.
The framework is based on variational inference and it aims to speed up inference for a flexible family of marked point processes.
arXiv Detail & Related papers (2021-05-30T15:37:57Z)
- JST-RR Model: Joint Modeling of Ratings and Reviews in Sentiment-Topic Prediction [2.3834926671238916]
We propose a probabilistic model to accommodate both textual reviews and overall ratings.
The proposed method can enhance the prediction accuracy of review data and achieve an effective detection of interpretable topics and sentiments.
arXiv Detail & Related papers (2021-02-18T15:47:34Z)
- Combining Task Predictors via Enhancing Joint Predictability [53.46348489300652]
We present a new predictor combination algorithm that improves the target predictor by i) measuring the relevance of references based on their capabilities in predicting the target, and ii) strengthening such estimated relevance.
Our algorithm jointly assesses the relevance of all references by adopting a Bayesian framework.
Based on experiments on seven real-world datasets from visual attribute ranking and multi-class classification scenarios, we demonstrate that our algorithm offers a significant performance gain and broadens the application range of existing predictor combination approaches.
arXiv Detail & Related papers (2020-07-15T21:58:39Z)
- GenDICE: Generalized Offline Estimation of Stationary Values [108.17309783125398]
We show that effective offline estimation of stationary values can still be achieved in important applications.
Our approach is based on estimating a ratio that corrects for the discrepancy between the stationary and empirical distributions.
The resulting algorithm, GenDICE, is straightforward and effective.
arXiv Detail & Related papers (2020-02-21T00:27:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.