Assessing Phrasal Representation and Composition in Transformers
- URL: http://arxiv.org/abs/2010.03763v2
- Date: Wed, 14 Oct 2020 02:25:52 GMT
- Title: Assessing Phrasal Representation and Composition in Transformers
- Authors: Lang Yu and Allyson Ettinger
- Abstract summary: Deep transformer models have pushed performance on NLP tasks to new limits.
We present systematic analysis of phrasal representations in state-of-the-art pre-trained transformers.
We find that phrase representation in these models relies heavily on word content, with little evidence of nuanced composition.
- Score: 13.460125148455143
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep transformer models have pushed performance on NLP tasks to new limits,
suggesting sophisticated treatment of complex linguistic inputs, such as
phrases. However, we have limited understanding of how these models handle
representation of phrases, and whether this reflects sophisticated composition
of phrase meaning like that done by humans. In this paper, we present
systematic analysis of phrasal representations in state-of-the-art pre-trained
transformers. We use tests leveraging human judgments of phrase similarity and
meaning shift, and compare results before and after control of word overlap, to
tease apart lexical effects versus composition effects. We find that phrase
representation in these models relies heavily on word content, with little
evidence of nuanced composition. We also identify variations in phrase
representation quality across models, layers, and representation types, and
make corresponding recommendations for usage of representations from these
models.
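The analysis pipeline the abstract describes (phrase similarity scored against human judgments, compared with and without word-overlap control) can be sketched roughly as follows. This is a minimal illustration rather than the paper's actual code: the model choice, layer index, mean-pooling strategy, and the toy phrase pairs with made-up ratings are assumptions for the sake of the example; the paper evaluates multiple models, layers, and representation types on established human-annotated datasets.

```python
# Minimal sketch: extract phrase representations from a pre-trained transformer,
# score phrase-pair similarity by cosine similarity, and correlate those scores
# with human judgments. Model, layer, pooling, and the toy data are assumptions.
import torch
from scipy.stats import spearmanr
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"   # any pre-trained transformer could be probed
LAYER = 8                          # hidden layer to probe (illustrative choice)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def phrase_vector(phrase: str) -> torch.Tensor:
    """Mean-pool the chosen layer's token vectors, excluding [CLS]/[SEP]."""
    inputs = tokenizer(phrase, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).hidden_states[LAYER]  # (1, seq_len, dim)
    return hidden[0, 1:-1].mean(dim=0)

# Hypothetical phrase pairs with invented human similarity ratings
# (higher = more similar). The last pair has full word overlap but a
# shifted meaning, which is the kind of control the paper relies on.
pairs = [
    ("heavy rain", "torrential downpour", 5.0),
    ("heavy rain", "light drizzle",       2.0),
    ("law school", "school law",          1.5),  # same words, different meaning
]

model_scores, human_scores = [], []
for a, b, rating in pairs:
    sim = torch.nn.functional.cosine_similarity(
        phrase_vector(a), phrase_vector(b), dim=0
    ).item()
    model_scores.append(sim)
    human_scores.append(rating)

rho, _ = spearmanr(model_scores, human_scores)
print(f"Spearman correlation with human judgments: {rho:.2f}")
```

Overlap-controlled pairs such as "law school" vs. "school law" are what separate composition effects from word-content effects: a representation that genuinely composes meaning should rate such a pair as dissimilar despite identical word content, whereas a representation driven by lexical content will score it as highly similar.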
Related papers
- Intervention Lens: from Representation Surgery to String Counterfactuals [106.98481791980367]
Interventions targeting the representation space of language models (LMs) have emerged as an effective means to influence model behavior.
We give a method to convert representation counterfactuals into string counterfactuals.
The resulting counterfactuals can be used to mitigate bias in classification through data augmentation.
arXiv Detail & Related papers (2024-02-17T18:12:02Z)
- Semantics of Multiword Expressions in Transformer-Based Models: A Survey [8.372465442144048]
Multiword expressions (MWEs) are composed of multiple words and exhibit variable degrees of compositionality.
We provide the first in-depth survey of MWE processing with transformer models.
We find that they capture MWE semantics inconsistently, as shown by reliance on surface patterns and memorized information.
arXiv Detail & Related papers (2024-01-27T11:51:11Z)
- Probabilistic Transformer: A Probabilistic Dependency Model for Contextual Word Representation [52.270712965271656]
We propose a new model of contextual word representation, not from a neural perspective, but from a purely syntactic and probabilistic perspective.
We find that the graph of our model resembles transformers, with correspondences between dependencies and self-attention.
Experiments show that our model performs competitively with transformers on small to medium-sized datasets.
arXiv Detail & Related papers (2023-11-26T06:56:02Z)
- Word-Level Explanations for Analyzing Bias in Text-to-Image Models [72.71184730702086]
Text-to-image (T2I) models can generate images that underrepresent minorities based on race and sex.
This paper investigates which word in the input prompt is responsible for bias in generated images.
arXiv Detail & Related papers (2023-06-03T21:39:07Z)
- Explaining How Transformers Use Context to Build Predictions [0.1749935196721634]
Language Generation Models produce words based on the previous context.
It is still unclear how prior words affect the model's decision throughout the layers.
We leverage recent advances in explainability of the Transformer and present a procedure to analyze models for language generation.
arXiv Detail & Related papers (2023-05-21T18:29:10Z)
- Are Representations Built from the Ground Up? An Empirical Examination of Local Composition in Language Models [91.3755431537592]
Representing compositional and non-compositional phrases is critical for language understanding.
We first formulate a problem of predicting the LM-internal representations of longer phrases given those of their constituents.
While we would expect the predictive accuracy to correlate with human judgments of semantic compositionality, we find this is largely not the case.
arXiv Detail & Related papers (2022-10-07T14:21:30Z)
- On the Interplay Between Fine-tuning and Composition in Transformers [7.513100214864645]
We investigate the impact of fine-tuning on the capacity of contextualized embeddings to capture phrase meaning information.
Specifically, we fine-tune models on an adversarial paraphrase classification task with high lexical overlap, and on a sentiment classification task.
We find that fine-tuning largely fails to benefit compositionality in these representations, though training on sentiment yields a small, localized benefit for certain models.
arXiv Detail & Related papers (2021-05-31T01:49:56Z)
- Syntactic Perturbations Reveal Representational Correlates of Hierarchical Phrase Structure in Pretrained Language Models [22.43510769150502]
It is not entirely clear what aspects of sentence-level syntax are captured by vector-based language representations.
We show that Transformers build sensitivity to larger parts of the sentence along their layers, and that hierarchical phrase structure plays a role in this process.
arXiv Detail & Related papers (2021-04-15T16:30:31Z)
- Accurate Word Representations with Universal Visual Guidance [55.71425503859685]
This paper proposes a visual representation method that explicitly enhances conventional word embeddings with multi-aspect senses drawn from visual guidance.
We build a small-scale word-image dictionary from a multimodal seed dataset where each word corresponds to diverse related images.
Experiments on 12 natural language understanding and machine translation tasks further verify the effectiveness and the generalization capability of the proposed approach.
arXiv Detail & Related papers (2020-12-30T09:11:50Z)
- Unsupervised Distillation of Syntactic Information from Contextualized Word Representations [62.230491683411536]
We tackle the task of unsupervised disentanglement between semantics and structure in neural language representations.
To this end, we automatically generate groups of sentences which are structurally similar but semantically different.
We demonstrate that our transformation clusters vectors in space by structural properties, rather than by lexical semantics.
arXiv Detail & Related papers (2020-10-11T15:13:18Z)
- Analysing Lexical Semantic Change with Contextualised Word Representations [7.071298726856781]
We propose a novel method that exploits the BERT neural language model to obtain representations of word usages.
We create a new evaluation dataset and show that the model representations and the detected semantic shifts are positively correlated with human judgements.
arXiv Detail & Related papers (2020-04-29T12:18:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.