The Change that Matters in Discourse Parsing: Estimating the Impact of
Domain Shift on Parser Error
- URL: http://arxiv.org/abs/2203.11317v1
- Date: Mon, 21 Mar 2022 20:04:23 GMT
- Title: The Change that Matters in Discourse Parsing: Estimating the Impact of
Domain Shift on Parser Error
- Authors: Katherine Atwell, Anthony Sicilia, Seong Jae Hwang, Malihe Alikhani
- Abstract summary: We use a statistic from the theoretical domain adaptation literature which can be directly tied to error-gap.
We study the bias of this statistic as an estimator of error-gap both theoretically and through a large-scale empirical study of over 2400 experiments on 6 discourse datasets.
- Score: 14.566990078034241
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Discourse analysis allows us to attain inferences of a text document that
extend beyond the sentence-level. The current performance of discourse models
is very low on texts outside of the training distribution's coverage,
diminishing the practical utility of existing models. There is a need for a
measure that can tell us to what extent our model generalizes from the
training to the test sample when these samples may be drawn from distinct
distributions. While this can be estimated via distribution shift, we argue
that this does not directly correlate with the change in the observed error of a
classifier (i.e., the error-gap). Thus, we propose to use a statistic from the
theoretical domain adaptation literature which can be directly tied to
error-gap. We study the bias of this statistic as an estimator of error-gap
both theoretically and through a large-scale empirical study of over 2400
experiments on 6 discourse datasets from domains including, but not limited to:
news, biomedical texts, TED talks, Reddit posts, and fiction. Our results not
only motivate our proposal and help us to understand its limitations, but also
provide insight on the properties of discourse models and datasets which
improve performance in domain adaptation. For instance, we find that non-news
datasets are slightly easier to transfer to than news datasets when the
training and test sets are very different. Our code and an associated Python
package are available to allow practitioners to make more informed model and
dataset choices.
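The paper's own statistic and package are not reproduced in this listing. As an illustrative sketch only, a well-known estimator from the same domain adaptation literature is the proxy A-distance of Ben-David et al., which turns the held-out error of a source-vs-target domain classifier into a divergence score; the nearest-centroid classifier below is a hypothetical stand-in for whatever classifier one prefers:

```python
import numpy as np

def proxy_a_distance(source, target, seed=0):
    """Proxy A-distance sketch: train a classifier to tell source
    samples from target samples, then map its held-out error eps
    to 2 * (1 - 2 * eps). Near 0 when domains are indistinguishable,
    near 2 when they are easily separated."""
    rng = np.random.default_rng(seed)
    X = np.vstack([source, target])
    y = np.concatenate([np.zeros(len(source)), np.ones(len(target))])
    idx = rng.permutation(len(X))
    X, y = X[idx], y[idx]
    split = len(X) // 2
    # Nearest-centroid domain classifier (a stand-in for any classifier).
    c0 = X[:split][y[:split] == 0].mean(axis=0)
    c1 = X[:split][y[:split] == 1].mean(axis=0)
    pred = (np.linalg.norm(X[split:] - c1, axis=1)
            < np.linalg.norm(X[split:] - c0, axis=1)).astype(float)
    eps = np.mean(pred != y[split:])
    return 2.0 * (1.0 - 2.0 * eps)

rng = np.random.default_rng(1)
# Identical domains: domain classifier is near chance, distance near 0.
same = proxy_a_distance(rng.normal(0, 1, (200, 5)), rng.normal(0, 1, (200, 5)))
# Well-separated domains: classifier is near perfect, distance near 2.
far = proxy_a_distance(rng.normal(0, 1, (200, 5)), rng.normal(6, 1, (200, 5)))
```

The abstract's point is that such divergence scores do not by themselves equal the error-gap; the paper studies a statistic chosen precisely because it can be tied to it.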
Related papers
- Fairness Hub Technical Briefs: Definition and Detection of Distribution Shift [0.5825410941577593]
Distribution shift is a common situation in machine learning tasks, where the data used for training a model is different from the data the model is applied to in the real world.
This brief focuses on the definition and detection of distribution shifts in educational settings.
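The brief's own detection methods are not quoted here. As a minimal generic illustration, shift in a one-dimensional feature between training and deployment data can be flagged with a two-sample Kolmogorov-Smirnov statistic, implemented from scratch below (the feature and thresholds are assumptions for the sketch):

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the empirical CDFs of the two samples."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))

rng = np.random.default_rng(0)
train = rng.normal(0, 1, 1000)
# Same distribution: statistic stays small.
no_shift = ks_statistic(train, rng.normal(0, 1, 1000))
# Shifted mean: statistic grows large, signalling distribution shift.
shift = ks_statistic(train, rng.normal(2, 1, 1000))
```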
arXiv Detail & Related papers (2024-05-23T05:29:36Z)
- The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes [30.30769701138665]
We introduce and explore the Mirrored Influence Hypothesis, highlighting a reciprocal nature of influence between training and test data.
Specifically, it suggests that evaluating the influence of training data on test predictions can be reformulated as an equivalent, yet inverse problem.
We introduce a new method for estimating the influence of training data, which requires calculating gradients for specific test samples, paired with a forward pass for each training point.
arXiv Detail & Related papers (2024-02-14T03:43:05Z)
- Stubborn Lexical Bias in Data and Models [50.79738900885665]
We use a new statistical method to examine whether spurious patterns in data appear in models trained on the data.
We apply an optimization approach to *reweight* the training data, reducing thousands of spurious correlations.
Surprisingly, though this method can successfully reduce lexical biases in the training data, we still find strong evidence of corresponding bias in the trained models.
arXiv Detail & Related papers (2023-06-03T20:12:27Z)
- Explanation Shift: How Did the Distribution Shift Impact the Model? [23.403838118256907]
We study how explanation characteristics shift when affected by distribution shifts.
We analyze different types of distribution shifts using synthetic examples and real-world data sets.
We release our methods in an open-source Python package, as well as the code used to reproduce our experiments.
arXiv Detail & Related papers (2023-03-14T17:13:01Z)
- Towards Fine-Grained Information: Identifying the Type and Location of Translation Errors [80.22825549235556]
Existing approaches cannot consider error position and type simultaneously.
We build an FG-TED model to predict the addition and omission errors.
Experiments show that our model can identify both error type and position concurrently, and gives state-of-the-art results.
arXiv Detail & Related papers (2023-02-17T16:20:33Z)
- Detect Hate Speech in Unseen Domains using Multi-Task Learning: A Case Study of Political Public Figures [7.52579126252489]
We propose a new Multi-task Learning pipeline that utilizes MTL to train simultaneously across multiple hate speech datasets.
We show strong results when examining generalization error in train-test splits and substantial improvements when predicting on previously unseen datasets.
We also assemble a novel dataset, dubbed PubFigs, focusing on the problematic speech of American Public Political Figures.
arXiv Detail & Related papers (2022-08-22T21:13:38Z)
- Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how bringing recent results on equivariant representation learning instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z)
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflating evaluation caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
- Learning Neural Models for Natural Language Processing in the Face of Distributional Shift [10.990447273771592]
The dominating NLP paradigm of training a strong neural predictor to perform one task on a specific dataset has led to state-of-the-art performance in a variety of applications.
It builds upon the assumption that the data distribution is stationary, i.e., that the data is sampled from a fixed distribution both at training and test time.
This way of training is inconsistent with how we as humans are able to learn from and operate within a constantly changing stream of information.
It is ill-adapted to real-world use cases where the data distribution is expected to shift over the course of a model's lifetime.
arXiv Detail & Related papers (2021-09-03T14:29:20Z)
- Examining and Combating Spurious Features under Distribution Shift [94.31956965507085]
We define and analyze robust and spurious representations using the information-theoretic concept of minimal sufficient statistics.
We prove that even when there is only bias of the input distribution, models can still pick up spurious features from their training data.
Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations.
arXiv Detail & Related papers (2021-06-14T05:39:09Z)
- Competency Problems: On Finding and Removing Artifacts in Language Data [50.09608320112584]
We argue that for complex language understanding tasks, all simple feature correlations are spurious.
We theoretically analyze the difficulty of creating data for competency problems when human bias is taken into account.
arXiv Detail & Related papers (2021-04-17T21:34:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.