The King is Naked: on the Notion of Robustness for Natural Language Processing
- URL: http://arxiv.org/abs/2112.07605v1
- Date: Mon, 13 Dec 2021 16:19:48 GMT
- Title: The King is Naked: on the Notion of Robustness for Natural Language Processing
- Authors: Emanuele La Malfa and Marta Kwiatkowska
- Abstract summary: We argue for semantic robustness, which is better aligned with the human concept of linguistic fidelity.
We study semantic robustness of a range of vanilla and robustly trained architectures using a template-based generative test bed.
- Score: 18.973116252065278
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is growing evidence that the classical notion of adversarial robustness
originally introduced for images has been adopted as a de facto standard by a
large part of the NLP research community. We show that this notion is
problematic in the context of NLP as it considers a narrow spectrum of
linguistic phenomena. In this paper, we argue for semantic robustness, which is
better aligned with the human concept of linguistic fidelity. We characterize
semantic robustness in terms of biases that it is expected to induce in a
model. We study semantic robustness of a range of vanilla and robustly trained
architectures using a template-based generative test bed. We complement the
analysis with empirical evidence that, despite being harder to implement,
semantic robustness can improve performance on complex linguistic phenomena
where models robust in the classical sense fail.
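To make the paper's proposal concrete, below is a minimal sketch of a template-based consistency check in the spirit of the generative test bed described above: a model is semantically robust to the extent that it assigns the same label to meaning-preserving rewrites of the same sentence. The template pairs, fillers, and toy classifier are illustrative assumptions, not the authors' actual test bed.

```python
from itertools import product

# Hypothetical templates: each filled pair is meant to be semantically
# equivalent, so a semantically robust model should label both alike.
TEMPLATE_PAIRS = [
    ("the {noun} was {adj}", "the {noun} turned out to be {adj}"),
    ("{name} thought the movie was {adj}", "in {name}'s opinion, the movie was {adj}"),
]
FILLERS = {
    "noun": ["film", "plot"],
    "adj": ["great", "terrible"],
    "name": ["Alice", "Bob"],
}

def fill(template):
    """Expand a template with every combination of the fillers it mentions."""
    slots = [s for s in FILLERS if "{" + s + "}" in template]
    return [
        template.format(**dict(zip(slots, combo)))
        for combo in product(*(FILLERS[s] for s in slots))
    ]

def semantic_robustness_rate(predict):
    """Fraction of paraphrase pairs on which `predict` gives the same label."""
    agree = total = 0
    for tmpl_a, tmpl_b in TEMPLATE_PAIRS:
        for a, b in zip(fill(tmpl_a), fill(tmpl_b)):
            agree += predict(a) == predict(b)
            total += 1
    return agree / total

# Toy stand-in classifier; any callable mapping a sentence to a label works.
toy_predict = lambda s: "pos" if "great" in s else "neg"
print(semantic_robustness_rate(toy_predict))  # 1.0 for this trivially consistent toy
```

Classical adversarial robustness would instead perturb surface tokens (character swaps, synonym flips) around a fixed sentence; the check above holds the meaning fixed and varies the phrasing, which is the distinction the paper draws.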
Related papers
- Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models [16.894375498353092]
Sparse Autoencoders (SAEs) have emerged as a powerful framework for machine learning interpretability.
Existing SAEs exhibit severe instability, as identical models trained on similar datasets can produce sharply different dictionaries.
We present Archetypal SAEs, wherein dictionary atoms are constrained to the convex hull of the data (see the sketch after this list).
arXiv Detail & Related papers (2025-02-18T14:29:11Z)
- Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness [68.69369585600698]
Deep learning models often suffer from a lack of interpretability due to polysemanticity.
Recent advances in monosemanticity, where neurons correspond to consistent and distinct semantics, have significantly improved interpretability.
We show that monosemantic features not only enhance interpretability but also bring concrete gains in model performance.
arXiv Detail & Related papers (2024-10-27T18:03:20Z)
- Trustworthy Alignment of Retrieval-Augmented Large Language Models via Reinforcement Learning [84.94709351266557]
We focus on the trustworthiness of language models with respect to retrieval augmentation.
We observe that retrieval-augmented language models have the inherent capability of supplying responses according to both contextual and parametric knowledge.
Inspired by aligning language models with human preferences, we take a first step towards aligning retrieval-augmented language models so that they respond relying solely on the external evidence.
arXiv Detail & Related papers (2024-10-22T09:25:21Z)
- Enhancing adversarial robustness in Natural Language Inference using explanations [41.46494686136601]
We cast the spotlight on the underexplored task of Natural Language Inference (NLI).
We validate the usage of natural language explanation as a model-agnostic defence strategy through extensive experimentation.
We study how well widely used language generation metrics correlate with human perception, so that they can serve as a proxy for robust NLI models.
arXiv Detail & Related papers (2024-09-11T17:09:49Z)
- Explaining Language Models' Predictions with High-Impact Concepts [11.47612457613113]
We propose a complete framework for extending concept-based interpretability methods to NLP.
We optimize for features whose existence causes the output predictions to change substantially.
Our method achieves superior results on predictive impact, usability, and faithfulness compared to the baselines.
arXiv Detail & Related papers (2023-05-03T14:48:27Z)
- On Robustness of Prompt-based Semantic Parsing with Large Pre-trained Language Model: An Empirical Study on Codex [48.588772371355816]
This paper presents the first empirical study on the adversarial robustness of Codex, a large prompt-based language model of code.
Our results demonstrate that the state-of-the-art (SOTA) code-language models are vulnerable to carefully crafted adversarial examples.
arXiv Detail & Related papers (2023-01-30T13:21:00Z)
- Transparency Helps Reveal When Language Models Learn Meaning [71.96920839263457]
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not represent natural language semantics well.
arXiv Detail & Related papers (2022-10-14T02:35:19Z)
- A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
- Towards Robustness Against Natural Language Word Substitutions [87.56898475512703]
Robustness against word substitutions has a well-defined and widely accepted form, using semantically similar words as substitutions.
Previous defense methods capture word substitutions in vector space by using either an $l_2$-ball or a hyper-rectangle.
arXiv Detail & Related papers (2021-07-28T17:55:08Z)
- Multi-sense embeddings through a word sense disambiguation process [2.2344764434954256]
Most Suitable Sense Annotation (MSSA) disambiguates and annotates each word by its specific sense, considering the semantic effects of its context.
We test our approach on six different benchmarks for the word similarity task, showing that our approach can produce state-of-the-art results.
arXiv Detail & Related papers (2021-01-21T16:22:34Z)
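Returning to the Archetypal SAE entry above, which constrains dictionary atoms to the convex hull of the data: a minimal NumPy sketch of one way to enforce such a constraint is to parameterize each atom as a softmax-weighted convex combination of data points. The shapes and the softmax parameterization here are illustrative assumptions, not the paper's training code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: n data points in d dimensions, k dictionary atoms.
n, d, k = 500, 64, 16
X = rng.normal(size=(n, d))   # data matrix (e.g., vision-model activations)

# Free parameters: one logit vector per atom over the n data points.
logits = rng.normal(size=(k, n))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Rows of W are nonnegative and sum to 1, so every atom in A = W @ X is a
# convex combination of data points, i.e., it lies in the convex hull of X.
W = softmax(logits, axis=1)   # (k, n) convex-combination weights
A = W @ X                     # (k, d) dictionary atoms

assert np.allclose(W.sum(axis=1), 1.0) and (W >= 0).all()
```

Because the constraint holds by construction for any value of `logits`, the atoms remain inside the data's convex hull throughout training, which is one way to obtain the stability the entry describes.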