Replicating and Extending "Because Their Treebanks Leak": Graph
Isomorphism, Covariants, and Parser Performance
- URL: http://arxiv.org/abs/2106.00352v2
- Date: Wed, 2 Jun 2021 07:18:18 GMT
- Title: Replicating and Extending "Because Their Treebanks Leak": Graph
Isomorphism, Covariants, and Parser Performance
- Authors: Mark Anderson, Anders Søgaard, and Carlos Gómez-Rodríguez
- Abstract summary: As with other statistical analyses in NLP, the original results were based on evaluating linear regressions.
We present a replication study in which we also bin sentences by length and find that only a small subset of sentences varies in performance with respect to graph isomorphism.
We suggest that conclusions drawn from statistical analyses like this need to be tempered and that controlled experiments can complement them by more readily teasing factors apart.
- Score: 0.32228025627337864
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Søgaard (2020) obtained results suggesting that the fraction of
trees occurring in the test data isomorphic to trees in the training set
accounts for a non-trivial share of the variation in parser performance. As
with other statistical analyses in NLP, the results were based on evaluating
linear regressions. However, the study had methodological issues and used a
small sample size, leading to unreliable results. We present a replication
study in
which we also bin sentences by length and find that only a small subset of
sentences varies in performance with respect to graph isomorphism. Further, the
correlation observed between parser performance and graph isomorphism in the
wild disappears when controlling for covariants. However, in a controlled
experiment, where covariants are kept fixed, we do observe a strong
correlation. We suggest that conclusions drawn from statistical analyses like
this need to be tempered and that controlled experiments can complement them by
more readily teasing factors apart.
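To make the quantities above concrete, here is a minimal sketch, assuming an
undirected, unlabelled reading of tree isomorphism and entirely illustrative
per-treebank numbers and covariates. This is not the authors' code; the
`tree_graph` and `isomorphic_fraction` helpers and all column names are
hypothetical.

```python
import networkx as nx
import pandas as pd
import statsmodels.api as sm

def tree_graph(heads):
    """Build an undirected graph from a dependency head list.
    heads[i] is the 1-based head of token i+1; 0 denotes the root."""
    g = nx.Graph()
    g.add_nodes_from(range(len(heads) + 1))  # node 0 is the artificial root
    g.add_edges_from((head, i + 1) for i, head in enumerate(heads))
    return g

def isomorphic_fraction(train_trees, test_trees):
    """Fraction of test trees isomorphic to at least one training tree."""
    train_graphs = [tree_graph(t) for t in train_trees]
    hits = 0
    for t in test_trees:
        g = tree_graph(t)
        # Compare node counts first; full isomorphism checks are expensive.
        if any(len(tg) == len(g) and nx.is_isomorphic(tg, g)
               for tg in train_graphs):
            hits += 1
    return hits / len(test_trees)

# Hypothetical per-treebank table: parser accuracy (LAS), the isomorphism
# fraction, and covariates such as mean sentence length and training size.
df = pd.DataFrame({
    "las":        [85.2, 78.9, 90.1, 82.4, 88.0, 76.5],
    "iso_frac":   [0.41, 0.18, 0.63, 0.35, 0.52, 0.11],
    "mean_len":   [17.3, 22.8, 14.1, 19.0, 15.6, 24.2],
    "train_size": [12543, 4027, 21890, 9850, 17320, 3260],
})

# "In the wild": regress accuracy on the isomorphism fraction alone.
naive = sm.OLS(df["las"], sm.add_constant(df[["iso_frac"]])).fit()

# Controlling for covariates: the isomorphism coefficient can shrink or
# vanish once sentence length and training-set size enter the model.
controlled = sm.OLS(
    df["las"],
    sm.add_constant(df[["iso_frac", "mean_len", "train_size"]]),
).fit()
print(naive.params)
print(controlled.params)
```

Binning sentences by length, as in the replication, amounts to computing the
same fraction separately within each length bucket before regressing.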
Related papers
- Multiply-Robust Causal Change Attribution [15.501106533308798]
We develop a new estimation strategy that combines regression and re-weighting methods to quantify the contribution of each causal mechanism.
Our method demonstrates excellent performance in Monte Carlo simulations, and we show its usefulness in an empirical application.
arXiv Detail & Related papers (2024-04-12T22:57:01Z)
- Logistic Regression Equivalence: A Framework for Comparing Logistic Regression Models Across Populations [4.518012967046983]
We argue that equivalence testing for a prespecified tolerance level on population differences incentivizes accuracy in the inference.
For diagnosis data, we show examples for equivalent and non-equivalent models.
arXiv Detail & Related papers (2023-03-23T15:12:52Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- On the Strong Correlation Between Model Invariance and Generalization [54.812786542023325]
Generalization captures a model's ability to classify unseen data.
Invariance measures consistency of model predictions on transformations of the data.
From a dataset-centric view, we find that a given model's accuracy and invariance are linearly correlated across different test sets.
arXiv Detail & Related papers (2022-07-14T17:08:25Z)
- Energy Trees: Regression and Classification With Structured and Mixed-Type Covariates [0.0]
Energy trees leverage energy statistics to extend the capabilities of conditional inference trees.
We show the model's competitive performance in terms of variable selection and robustness to overfitting.
We also assess the model's predictive ability through two empirical analyses involving human biological data.
arXiv Detail & Related papers (2022-07-10T10:41:51Z)
- Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how combining recent results on equivariant representation learning over structured spaces with classical results from causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z)
- Prototypical Graph Contrastive Learning [141.30842113683775]
We propose a Prototypical Graph Contrastive Learning (PGCL) approach to mitigate the critical sampling bias issue.
Specifically, PGCL models the underlying semantic structure of the graph data via clustering semantically similar graphs into the same group, and simultaneously encourages the clustering consistency for different augmentations of the same graph.
For a query, PGCL further reweights its negative samples based on the distance between their prototypes (cluster centroids) and the query prototype; a generic sketch of this reweighting appears after this list.
arXiv Detail & Related papers (2021-06-17T16:45:31Z)
- Counterfactual Invariance to Spurious Correlations: Why and How to Pass Stress Tests [87.60900567941428]
A 'spurious correlation' is the dependence of a model on some aspect of the input data that an analyst thinks shouldn't matter.
In machine learning, these have a know-it-when-you-see-it character.
We study stress testing using the tools of causal inference.
arXiv Detail & Related papers (2021-05-31T14:39:38Z)
- CausalVAE: Structured Causal Disentanglement in Variational Autoencoder [52.139696854386976]
The framework of variational autoencoder (VAE) is commonly used to disentangle independent factors from observations.
We propose a new VAE based framework named CausalVAE, which includes a Causal Layer to transform independent factors into causal endogenous ones.
Results show that the causal representations learned by CausalVAE are semantically interpretable, and their causal relationship as a Directed Acyclic Graph (DAG) is identified with good accuracy.
arXiv Detail & Related papers (2020-04-18T20:09:34Z)
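The prototype-based reweighting described in the Prototypical Graph
Contrastive Learning entry above can be illustrated generically. This is a
hedged sketch under assumed details (k-means prototypes, Euclidean distances,
a made-up `prototype_weights` helper), not the paper's implementation:

```python
import numpy as np
from sklearn.cluster import KMeans

def prototype_weights(query_emb, negative_embs, all_embs, k=10):
    """Weight each negative sample by the distance between its prototype
    (cluster centroid) and the query's prototype; hypothetical helper."""
    km = KMeans(n_clusters=k, n_init=10).fit(all_embs)
    centroids = km.cluster_centers_
    q_proto = centroids[km.predict(query_emb[None, :])[0]]
    neg_protos = centroids[km.predict(negative_embs)]
    dists = np.linalg.norm(neg_protos - q_proto, axis=1)
    # Normalize so weights average to one (one simple choice among many).
    return dists / (dists.mean() + 1e-12)
```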
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.