Systematicity, Compositionality and Transitivity of Deep NLP Models: a
Metamorphic Testing Perspective
- URL: http://arxiv.org/abs/2204.12316v1
- Date: Tue, 26 Apr 2022 13:50:07 GMT
- Title: Systematicity, Compositionality and Transitivity of Deep NLP Models: a
Metamorphic Testing Perspective
- Authors: Edoardo Manino, Julia Rozanova, Danilo Carvalho, Andre Freitas, Lucas
Cordeiro
- Abstract summary: We propose three new classes of metamorphic relations, which address the properties of systematicity, compositionality and transitivity.
With them, we test the internal consistency of state-of-the-art NLP models, and show that they do not always behave according to their expected linguistic properties.
Lastly, we introduce a novel graphical notation that efficiently summarises the inner structure of metamorphic relations.
- Score: 1.6799377888527685
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Metamorphic testing has recently been used to check the safety of neural NLP
models. Its main advantage is that it does not rely on a ground truth to
generate test cases. However, existing studies are mostly concerned with
robustness-like metamorphic relations, limiting the scope of linguistic
properties they can test. We propose three new classes of metamorphic
relations, which address the properties of systematicity, compositionality and
transitivity. Unlike robustness, our relations are defined over multiple source
inputs, thus increasing the number of test cases that we can produce by a
polynomial factor. With them, we test the internal consistency of
state-of-the-art NLP models, and show that they do not always behave according
to their expected linguistic properties. Lastly, we introduce a novel graphical
notation that efficiently summarises the inner structure of metamorphic
relations.
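The idea of a metamorphic relation defined over multiple source inputs can be sketched in a few lines. Below is a minimal, hypothetical illustration of a transitivity-style relation for NLI: `toy_nli` is a trivial rule-based stand-in (not the paper's models), and the check counts triples where two entailment judgments should compose into a third.

```python
def toy_nli(premise: str, hypothesis: str) -> str:
    """Trivial stand-in for an NLI model: 'entailment' iff every word of
    the hypothesis appears in the premise (illustration only)."""
    p, h = set(premise.lower().split()), set(hypothesis.lower().split())
    return "entailment" if h <= p else "neutral"


def transitivity_violations(nli, triples):
    """Count triples (a, b, c) where the model says a => b and b => c
    but not a => c. An internally consistent model yields zero."""
    violations = 0
    for a, b, c in triples:
        if (nli(a, b) == "entailment"
                and nli(b, c) == "entailment"
                and nli(a, c) != "entailment"):
            violations += 1
    return violations


triples = [
    ("a small brown dog runs fast", "a brown dog runs", "a dog runs"),
    ("two cats sleep on the mat", "cats sleep", "animals sleep"),
]
print(transitivity_violations(toy_nli, triples))  # → 0
```

Because each test case is built from a pair or triple of source inputs, the number of candidate cases grows polynomially in the size of the seed set, which is the scaling advantage the abstract refers to.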
Related papers
- Explaining Datasets in Words: Statistical Models with Natural Language Parameters [66.69456696878842]
We introduce a family of statistical models -- including clustering, time series, and classification models -- parameterized by natural language predicates.
We apply our framework to a wide range of problems: taxonomizing user chat dialogues, characterizing how they evolve across time, finding categories where one language model is better than the other.
arXiv Detail & Related papers (2024-09-13T01:40:20Z) - A Causal Framework to Quantify the Robustness of Mathematical Reasoning
with Language Models [81.15974174627785]
We study the behavior of language models in terms of robustness and sensitivity to direct interventions in the input space.
Our analysis shows that robustness does not appear to continuously improve as a function of size, but the GPT-3 Davinci models (175B) achieve a dramatic improvement in both robustness and sensitivity compared to all other GPT variants.
arXiv Detail & Related papers (2022-10-21T15:12:37Z) - Benchmarking Compositionality with Formal Languages [64.09083307778951]
We investigate whether large neural models in NLP can acquire the ability to combine primitive concepts into larger novel combinations while learning from data.
By randomly sampling over many transducers, we explore which of their properties contribute to learnability of a compositional relation by a neural network.
We find that the models either learn the relations completely or not at all. The key factor is transition coverage, which sets a soft learnability limit of about 400 examples per transition.
arXiv Detail & Related papers (2022-08-17T10:03:18Z) - Amortized Inference for Causal Structure Learning [72.84105256353801]
Learning causal structure poses a search problem that typically involves evaluating structures using a score or independence test.
We train a variational inference model to predict the causal structure from observational/interventional data.
Our models exhibit robust generalization capabilities under substantial distribution shift.
arXiv Detail & Related papers (2022-05-25T17:37:08Z) - Stretching Sentence-pair NLI Models to Reason over Long Documents and
Clusters [35.103851212995046]
Natural Language Inference (NLI) has been extensively studied by the NLP community as a framework for estimating the semantic relation between sentence pairs.
We explore the direct zero-shot applicability of NLI models to real applications, beyond the sentence-pair setting they were trained on.
We develop new aggregation methods to allow operating over full documents, reaching state-of-the-art performance on the ContractNLI dataset.
arXiv Detail & Related papers (2022-04-15T12:56:39Z) - Artificial Text Detection via Examining the Topology of Attention Maps [58.46367297712477]
We propose three novel types of interpretable topological features for this task, based on Topological Data Analysis (TDA).
We empirically show that the features derived from the BERT model outperform count- and neural-based baselines by up to 10% on three common datasets.
The probing analysis of the features reveals their sensitivity to the surface and syntactic properties.
arXiv Detail & Related papers (2021-09-10T12:13:45Z) - Replicating and Extending "Because Their Treebanks Leak": Graph
Isomorphism, Covariants, and Parser Performance [0.32228025627337864]
As with other statistical analyses in NLP, the original results were based on evaluating linear regressions.
We present a replication study in which we also bin sentences by length and find that only a small subset of sentences vary in performance with respect to graph isomorphism.
We suggest that conclusions drawn from statistical analyses like this need to be tempered and that controlled experiments can complement them by more readily teasing factors apart.
arXiv Detail & Related papers (2021-06-01T10:00:46Z) - Exploring Transitivity in Neural NLI Models through Veridicality [39.845425535943534]
We focus on the transitivity of inference relations, a fundamental property for systematically drawing inferences.
A model capturing transitivity can compose basic inference patterns and draw new inferences.
We find that current NLI models do not perform consistently well on transitivity inference tasks.
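The consistency being tested can be made concrete as a label-composition table. The sketch below assumes the standard three-way NLI label set; the two listed rules are the classical ones (if A entails B and B entails C, then A entails C; if A entails B and B contradicts C, then A contradicts C), and `"unknown"` marks combinations that transitivity alone does not determine.

```python
# Composition of NLI labels under transitivity (textbook rules only).
COMPOSE = {
    ("entailment", "entailment"): "entailment",
    ("entailment", "contradiction"): "contradiction",
    # All other combinations are not fixed by transitivity alone.
}


def expected_label(first: str, second: str) -> str:
    """Label the chain A-?->C implied by A-first->B and B-second->C."""
    return COMPOSE.get((first, second), "unknown")


print(expected_label("entailment", "entailment"))  # → entailment
print(expected_label("neutral", "entailment"))     # → unknown
```

A model's predictions on chained pairs can then be compared against this table, flagging any chain whose composed prediction disagrees with a determined entry.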
arXiv Detail & Related papers (2021-01-26T11:18:35Z) - ABNIRML: Analyzing the Behavior of Neural IR Models [45.74073795558624]
Pretrained language models such as BERT and T5 have established a new state-of-the-art for ad-hoc search.
We present a new comprehensive framework for Analyzing the Behavior of Neural IR ModeLs (ABNIRML).
We conduct an empirical study that yields insights into the factors that contribute to the neural model's gains.
arXiv Detail & Related papers (2020-11-02T03:07:38Z) - High-order Semantic Role Labeling [86.29371274587146]
This paper introduces a high-order graph structure for the neural semantic role labeling model.
It enables the model to explicitly consider not only isolated predicate-argument pairs but also the interactions between them.
Experimental results on 7 languages of the CoNLL-2009 benchmark show that the high-order structural learning techniques are beneficial to the strong performing SRL models.
arXiv Detail & Related papers (2020-10-09T15:33:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.