Representation biases in sentence transformers
- URL: http://arxiv.org/abs/2301.13039v1
- Date: Mon, 30 Jan 2023 16:35:23 GMT
- Title: Representation biases in sentence transformers
- Authors: Dmitry Nikolaev and Sebastian Padó
- Abstract summary: We show that SOTA sentence transformers have a strong nominal-participant-set bias.
Cosine similarities between pairs of sentences are more strongly determined by the overlap in the set of their noun participants than by shared predicates, nominal modifiers, or adjuncts.
- Score: 1.2129015549576372
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Variants of the BERT architecture specialised for producing full-sentence
representations often achieve better performance on downstream tasks than
sentence embeddings extracted from vanilla BERT. However, there is still little
understanding of what properties of inputs determine the properties of such
representations. In this study, we construct several sets of sentences with
pre-defined lexical and syntactic structures and show that SOTA sentence
transformers have a strong nominal-participant-set bias: cosine similarities
between pairs of sentences are more strongly determined by the overlap in the
set of their noun participants than by having the same predicates, lengthy
nominal modifiers, or adjuncts. At the same time, the precise
syntactic-thematic functions of the participants are largely irrelevant.
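As a rough illustration of the probing setup described in the abstract (not the authors' released code), the sketch below compares cosine similarities for a sentence pair that shares noun participants against a pair that shares only the predicate. It assumes the sentence-transformers library; the model name and the example sentences are illustrative choices, not taken from the paper.

```python
# Minimal sketch of the comparison described above; not the authors' code.
# Assumes the `sentence-transformers` package; "all-mpnet-base-v2" and the
# example sentences are illustrative choices.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-mpnet-base-v2")

anchor = "The journalist interviewed the senator."
shared_nouns = "The senator avoided the journalist."       # same noun participants, different predicate
shared_predicate = "The teacher interviewed the student."  # same predicate, different noun participants

embeddings = model.encode([anchor, shared_nouns, shared_predicate], convert_to_tensor=True)

sim_nouns = util.cos_sim(embeddings[0], embeddings[1]).item()
sim_pred = util.cos_sim(embeddings[0], embeddings[2]).item()

# The reported nominal-participant-set bias predicts sim_nouns > sim_pred.
print(f"shared noun participants: {sim_nouns:.3f}")
print(f"shared predicate:         {sim_pred:.3f}")
```

The study itself constructs many such controlled sentence sets with pre-defined lexical and syntactic structure and compares similarity distributions across conditions; the snippet only shows a single comparison of the kind the bias concerns.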
Related papers
- Strengthening Structural Inductive Biases by Pre-training to Perform Syntactic Transformations [75.14793516745374]
We propose to strengthen the structural inductive bias of a Transformer by intermediate pre-training.
Our experiments confirm that this helps with few-shot learning of syntactic tasks such as chunking.
Our analysis shows that the intermediate pre-training leads to attention heads that keep track of which syntactic transformation needs to be applied to which token.
arXiv Detail & Related papers (2024-07-05T14:29:44Z)
- Sub-Sentence Encoder: Contrastive Learning of Propositional Semantic Representations [102.05351905494277]
Sub-sentence encoder is a contrastively-learned contextual embedding model for fine-grained semantic representation of text.
We show that sub-sentence encoders have the same inference cost and space complexity as sentence encoders.
arXiv Detail & Related papers (2023-11-07T20:38:30Z)
- Bridging Continuous and Discrete Spaces: Interpretable Sentence Representation Learning via Compositional Operations [80.45474362071236]
It is unclear whether the compositional semantics of sentences can be directly reflected as compositional operations in the embedding space.
We propose InterSent, an end-to-end framework for learning interpretable sentence embeddings.
arXiv Detail & Related papers (2023-05-24T00:44:49Z)
- Sentence Representations via Gaussian Embedding [15.235687410343171]
GaussCSE is a contrastive learning framework for sentence embedding.
It can handle asymmetric relationships between sentences, along with a similarity measure for identifying inclusion relations.
Our experiments show that GaussCSE achieves the same performance as previous methods in natural language inference tasks.
arXiv Detail & Related papers (2023-05-22T12:51:38Z)
- Towards Structure-aware Paraphrase Identification with Phrase Alignment Using Sentence Encoders [4.254099382808598]
We propose to combine sentence encoders with an alignment component by representing each sentence as a list of predicate-argument spans.
Empirical results show that the alignment component brings in both improved performance and interpretability for various sentence encoders.
arXiv Detail & Related papers (2022-10-11T09:52:52Z)
- Interpreting BERT-based Text Similarity via Activation and Saliency Maps [26.279593839644836]
We present an unsupervised technique for explaining paragraph similarities inferred by pre-trained BERT models.
By looking at a pair of paragraphs, our technique identifies important words that dictate each paragraph's semantics, matches between the words in both paragraphs, and retrieves the most important pairs that explain the similarity between the two.
arXiv Detail & Related papers (2022-08-13T10:06:24Z)
- Does BERT really agree? Fine-grained Analysis of Lexical Dependence on a Syntactic Task [70.29624135819884]
We study the extent to which BERT is able to perform lexically-independent subject-verb number agreement (NA) on targeted syntactic templates.
Our results on nonce sentences suggest that the model generalizes well for simple templates, but fails to perform lexically-independent syntactic generalization when as little as one attractor is present.
arXiv Detail & Related papers (2022-04-14T11:33:15Z)
- Syntactic Perturbations Reveal Representational Correlates of Hierarchical Phrase Structure in Pretrained Language Models [22.43510769150502]
It is not entirely clear what aspects of sentence-level syntax are captured by vector-based language representations.
We show that Transformers build sensitivity to larger parts of the sentence along their layers, and that hierarchical phrase structure plays a role in this process.
arXiv Detail & Related papers (2021-04-15T16:30:31Z)
- Syntactic Structure Distillation Pretraining For Bidirectional Encoders [49.483357228441434]
We introduce a knowledge distillation strategy for injecting syntactic biases into BERT pretraining.
We distill the approximate marginal distribution over words in context from the syntactic LM.
Our findings demonstrate the benefits of syntactic biases, even in representation learners that exploit large amounts of data.
arXiv Detail & Related papers (2020-05-27T16:44:01Z)
- BURT: BERT-inspired Universal Representation from Twin Structure [89.82415322763475]
BURT (BERT-inspired Universal Representation from Twin Structure) generates universal, fixed-size representations for input sequences of any granularity.
BURT adopts a Siamese network, learning sentence-level representations from a natural language inference dataset and word/phrase-level representations from a paraphrasing dataset.
We evaluate BURT across different granularities of text similarity tasks, including STS tasks, SemEval2013 Task 5(a) and some commonly used word similarity tasks.
arXiv Detail & Related papers (2020-04-29T04:01:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.