On Affine Homotopy between Language Encoders
- URL: http://arxiv.org/abs/2406.02329v2
- Date: Wed, 18 Dec 2024 08:56:43 GMT
- Title: On Affine Homotopy between Language Encoders
- Authors: Robin SM Chan, Reda Boumasmoud, Anej Svete, Yuxin Ren, Qipeng Guo, Zhijing Jin, Shauli Ravfogel, Mrinmaya Sachan, Bernhard Schölkopf, Mennatallah El-Assady, Ryan Cotterell
- Abstract summary: We study the properties of \emph{affine} alignment of language encoders. We find that while affine alignment is fundamentally an asymmetric notion of similarity, it is still informative of extrinsic similarity.
- Score: 127.55969928213248
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained language encoders -- functions that represent text as vectors -- are an integral component of many NLP tasks. We tackle a natural question in language encoder analysis: What does it mean for two encoders to be similar? We contend that a faithful measure of similarity needs to be \emph{intrinsic}, that is, task-independent, yet still be informative of \emph{extrinsic} similarity -- the performance on downstream tasks. It is common to consider two encoders similar if they are \emph{homotopic}, i.e., if they can be aligned through some transformation. In this spirit, we study the properties of \emph{affine} alignment of language encoders and its implications on extrinsic similarity. We find that while affine alignment is fundamentally an asymmetric notion of similarity, it is still informative of extrinsic similarity. We confirm this on datasets of natural language representations. Beyond providing useful bounds on extrinsic similarity, affine intrinsic similarity also allows us to begin uncovering the structure of the space of pre-trained encoders by defining an order over them.
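The affine-alignment notion in the abstract can be made concrete with a small numerical sketch (illustrative only, not the paper's exact procedure; the toy data and names below are made up): given representations of the same texts from two encoders, fit an affine map by least squares and read the residual as an intrinsic dissimilarity. Fitting in the opposite direction need not give the same residual, which reflects the asymmetry the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for two encoders' outputs on the same n texts.
# Encoder 2 keeps only a low-dimensional affine image of encoder 1.
n, d1, d2 = 200, 16, 4
E1 = rng.normal(size=(n, d1))                        # encoder 1 representations
A_true = rng.normal(size=(d1, d2))
E2 = E1 @ A_true + 0.01 * rng.normal(size=(n, d2))   # encoder 2 representations

def affine_residual(src, tgt):
    """Fit tgt ~ src @ A + b by least squares; return the relative residual."""
    X = np.hstack([src, np.ones((len(src), 1))])     # bias column absorbs b
    coef, *_ = np.linalg.lstsq(X, tgt, rcond=None)
    return np.linalg.norm(X @ coef - tgt) / np.linalg.norm(tgt)

r_fwd = affine_residual(E1, E2)   # aligning encoder 1 onto encoder 2
r_bwd = affine_residual(E2, E1)   # the reverse direction
print(f"E1->E2 residual: {r_fwd:.4f}, E2->E1 residual: {r_bwd:.4f}")
```

Because encoder 2 discards most of encoder 1's dimensions, the forward fit is near-perfect while the reverse fit is not: affine alignability behaves like a directed, order-like relation rather than a symmetric distance.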
Related papers
- QUDsim: Quantifying Discourse Similarities in LLM-Generated Text [70.22275200293964]
We introduce an abstraction based on linguistic theories in Questions Under Discussion (QUD) and question semantics to help quantify differences in discourse progression.
We then use this framework to build QUDsim, a similarity metric that can detect discursive parallels between documents.
Using QUDsim, we find that LLMs often reuse discourse structures (more so than humans) across samples, even when content differs.
arXiv Detail & Related papers (2025-04-12T23:46:09Z) - Towards Universality: Studying Mechanistic Similarity Across Language Model Architectures [49.24097977047392]
We investigate two mainstream architectures for language modeling, namely Transformers and Mambas, to explore the extent of their mechanistic similarity.
We propose to use Sparse Autoencoders (SAEs) to isolate interpretable features from these models and show that most features are similar in these two models.
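As a rough illustration of what an SAE does in this kind of analysis (a forward-pass sketch only: the dictionary below is random and untrained, and all names are hypothetical), activations are encoded into an overcomplete, nonnegative feature vector and decoded back, with training in practice driven by a reconstruction-plus-L1-sparsity objective.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_dict = 32, 128                    # overcomplete feature dictionary
W_enc = 0.1 * rng.normal(size=(d_model, d_dict))
b_enc = np.zeros(d_dict)
W_dec = 0.1 * rng.normal(size=(d_dict, d_model))

def sae_forward(h):
    """One SAE pass: nonnegative feature activations and a reconstruction."""
    f = np.maximum(0.0, h @ W_enc + b_enc)   # ReLU keeps features nonnegative
    h_hat = f @ W_dec                        # reconstruct the activation
    return f, h_hat

h = rng.normal(size=(4, d_model))            # a batch of model activations
f, h_hat = sae_forward(h)
# The training objective: reconstruction error plus an L1 sparsity penalty.
loss = np.mean((h - h_hat) ** 2) + 1e-3 * np.abs(f).mean()
```

In interpretability work, individual columns of `W_dec` (feature directions) are then compared across models, which is how cross-architecture feature similarity can be probed.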
arXiv Detail & Related papers (2024-10-09T08:28:53Z) - Do Vision and Language Encoders Represent the World Similarly? [22.70701869402434]
Aligned text-image encoders such as CLIP have become the de facto model for vision-language tasks.
We find that the representation spaces of unaligned and aligned encoders are semantically similar.
Even absent the statistical similarity of aligned encoders like CLIP, we show that a matching of unaligned encoders exists without any training.
arXiv Detail & Related papers (2024-01-10T15:51:39Z) - PESTS: Persian_English Cross Lingual Corpus for Semantic Textual Similarity [5.439505575097552]
Cross-lingual semantic similarity models resort to machine translation because cross-lingual semantic similarity datasets are unavailable.
For Persian, a low-resource language, the need for a model that can understand the context of both languages is felt more than ever.
In this article, a corpus of semantic similarity between Persian and English sentences has been produced for the first time with the help of linguistic experts.
arXiv Detail & Related papers (2023-05-13T11:02:50Z) - How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding [56.222097640468306]
We provide a mechanistic understanding of how transformers learn "semantic structure".
We show, through a combination of mathematical analysis and experiments on Wikipedia data, that the embedding layer and the self-attention layer encode the topical structure.
arXiv Detail & Related papers (2023-03-07T21:42:17Z) - Attributable Visual Similarity Learning [90.69718495533144]
This paper proposes an attributable visual similarity learning (AVSL) framework for a more accurate and explainable similarity measure between images.
Motivated by the human semantic similarity cognition, we propose a generalized similarity learning paradigm to represent the similarity between two images with a graph.
Experiments on the CUB-200-2011, Cars196, and Stanford Online Products datasets demonstrate significant improvements over existing deep similarity learning methods.
arXiv Detail & Related papers (2022-03-28T17:35:31Z) - Semantics-aware Attention Improves Neural Machine Translation [35.32217580058933]
We propose two novel parameter-free methods for injecting semantic information into Transformers.
One method operates on the encoder, through a Scene-Aware Self-Attention (SASA) head.
The other operates on the decoder, through a Scene-Aware Cross-Attention (SACrA) head.
arXiv Detail & Related papers (2021-10-13T17:58:22Z) - Rethinking Positional Encoding in Language Pre-training [111.2320727291926]
We show that in absolute positional encoding, the addition operation applied on positional embeddings and word embeddings introduces mixed correlations between positional and word information.
We propose a new positional encoding method called Transformer with Untied Positional Encoding (TUPE).
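The "mixed correlations" point can be seen directly in the attention logits. In this minimal numpy sketch (random matrices, not the paper's trained model), the query-key product of (x + p) under standard absolute positional encoding expands into four terms, two of which cross words with positions, whereas an untied scheme computes word-word and position-position terms with separate projections and simply adds them.

```python
import numpy as np

rng = np.random.default_rng(0)
seq, d = 8, 16
x = rng.normal(size=(seq, d))          # word embeddings
p = rng.normal(size=(seq, d))          # absolute positional embeddings
Wq, Wk = rng.normal(size=(d, d)), rng.normal(size=(d, d))
Uq, Uk = rng.normal(size=(d, d)), rng.normal(size=(d, d))

# Standard absolute PE: add, then project. Expanding the product exposes
# four correlation terms, two of which mix words with positions.
mixed = ((x + p) @ Wq) @ (((x + p) @ Wk).T)
ww = (x @ Wq) @ (x @ Wk).T             # word-word
wp = (x @ Wq) @ (p @ Wk).T             # word-position (mixed)
pw = (p @ Wq) @ (x @ Wk).T             # position-word (mixed)
pp = (p @ Wq) @ (p @ Wk).T             # position-position
assert np.allclose(mixed, ww + wp + pw + pp)

# Untied sketch: separate projections for words and positions, no cross terms.
untied = (x @ Wq) @ (x @ Wk).T + (p @ Uq) @ (p @ Uk).T
```

The untied logits contain only the word-word and position-position terms, which is the separation the summary above refers to.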
arXiv Detail & Related papers (2020-06-28T13:11:02Z) - Word Rotator's Distance [50.67809662270474]
A key principle in assessing textual similarity is measuring the degree of semantic overlap between two texts while taking word alignment into account.
We show that the norm of word vectors is a good proxy for word importance, and their angle is a good proxy for word similarity.
We propose a method that first decouples word vectors into their norm and direction, and then computes alignment-based similarity.
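The decoupling described above can be sketched as follows (an illustrative reimplementation, not the authors' code): word-vector norms supply the transport masses, cosine distance between directions supplies the cost, and the distance is the resulting optimal-transport value, solved here as a small linear program.

```python
import numpy as np
from scipy.optimize import linprog

def wrd(X, Y):
    """Word Rotator's Distance sketch: norms -> mass, angles -> cost, EMD via LP."""
    nx = np.linalg.norm(X, axis=1)
    ny = np.linalg.norm(Y, axis=1)
    a = nx / nx.sum()                              # source masses from norms
    b = ny / ny.sum()                              # target masses from norms
    cos = (X / nx[:, None]) @ (Y / ny[:, None]).T
    C = 1.0 - cos                                  # cost from directions only
    m, n = C.shape
    # Equality constraints on the flattened transport plan:
    # row sums equal a, column sums equal b.
    A_eq = []
    for i in range(m):
        row = np.zeros((m, n)); row[i, :] = 1.0; A_eq.append(row.ravel())
    for j in range(n):
        col = np.zeros((m, n)); col[:, j] = 1.0; A_eq.append(col.ravel())
    res = linprog(C.ravel(), A_eq=np.array(A_eq), b_eq=np.concatenate([a, b]),
                  bounds=(0, None), method="highs")
    return res.fun

X = np.random.default_rng(0).normal(size=(5, 8))   # toy "sentence" of 5 word vectors
print(wrd(X, X))                                   # identical texts -> distance near 0
```

In practice a dedicated EMD solver (e.g. the POT library) is the usual choice; the linear program is used here only to keep the sketch dependency-light.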
arXiv Detail & Related papers (2020-04-30T17:48:42Z) - Contextual Lensing of Universal Sentence Representations [4.847980206213336]
We propose Contextual Lensing, a methodology for inducing context-oriented universal sentence vectors.
We show that it is possible to focus notions of language similarity into a small number of lens parameters given a core universal matrix representation.
arXiv Detail & Related papers (2020-02-20T17:06:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.