Similarity Analysis of Contextual Word Representation Models
- URL: http://arxiv.org/abs/2005.01172v1
- Date: Sun, 3 May 2020 19:48:15 GMT
- Title: Similarity Analysis of Contextual Word Representation Models
- Authors: John M. Wu, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, Fahim
Dalvi, James Glass
- Abstract summary: We use existing and novel similarity measures to gauge the level of localization of information in the deep models.
The analysis reveals that models within the same family are more similar to one another, as may be expected.
Surprisingly, different architectures have rather similar representations, but different individual neurons.
- Score: 39.12749165544309
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates contextual word representation models from the lens
of similarity analysis. Given a collection of trained models, we measure the
similarity of their internal representations and attention. Critically, these
models come from vastly different architectures. We use existing and novel
similarity measures that aim to gauge the level of localization of information
in the deep models, and facilitate the investigation of which design factors
affect model similarity, without requiring any external linguistic annotation.
The analysis reveals that models within the same family are more similar to one
another, as may be expected. Surprisingly, different architectures have rather
similar representations, but different individual neurons. We also observed
differences in information localization in lower and higher layers and found
that higher layers are more affected by fine-tuning on downstream tasks.
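For intuition, here is a minimal, hypothetical sketch of the kind of comparison the paper describes: a representation-level similarity measure (linear CKA, one existing measure often used for this purpose) next to a simple neuron-level matching score. The array shapes, helper names, and the synthetic example are illustrative assumptions, not the paper's actual measures or code; they only show how two models can appear very similar at the representation level while their individual neurons do not line up.

```python
# Illustrative sketch (not the paper's code): two ways to compare the
# activations of two models on the same inputs.
#   1. Linear CKA - a representation-level similarity, invariant to rotations
#      of the feature space, so it ignores neuron identity.
#   2. A neuron-level score - for each neuron in model A, the best absolute
#      Pearson correlation with any single neuron in model B.
# `reps_a` and `reps_b` are hypothetical (num_tokens, hidden_dim) activation
# matrices extracted from one layer of each model on the same sentences.

import numpy as np


def linear_cka(reps_a: np.ndarray, reps_b: np.ndarray) -> float:
    """Linear Centered Kernel Alignment between two activation matrices."""
    a = reps_a - reps_a.mean(axis=0, keepdims=True)
    b = reps_b - reps_b.mean(axis=0, keepdims=True)
    # ||A^T B||_F^2 / (||A^T A||_F * ||B^T B||_F)
    cross = np.linalg.norm(a.T @ b, "fro") ** 2
    norm_a = np.linalg.norm(a.T @ a, "fro")
    norm_b = np.linalg.norm(b.T @ b, "fro")
    return float(cross / (norm_a * norm_b))


def neuron_match_score(reps_a: np.ndarray, reps_b: np.ndarray) -> float:
    """Mean over neurons of A of the best |Pearson r| with any neuron of B."""
    a = (reps_a - reps_a.mean(0)) / (reps_a.std(0) + 1e-8)
    b = (reps_b - reps_b.mean(0)) / (reps_b.std(0) + 1e-8)
    corr = np.abs(a.T @ b) / reps_a.shape[0]  # (dim_a, dim_b) correlations
    return float(corr.max(axis=1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens, dim = 512, 64
    shared = rng.standard_normal((tokens, dim))
    # Model B sees the same signal through a random rotation plus noise:
    # representation-level similarity stays high, neuron identities do not.
    rotation, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    reps_a = shared
    reps_b = shared @ rotation + 0.1 * rng.standard_normal((tokens, dim))
    print("linear CKA:        ", round(linear_cka(reps_a, reps_b), 3))
    print("neuron match score:", round(neuron_match_score(reps_a, reps_b), 3))
```

In this synthetic example the second model's activations are a rotated, noisy copy of the first's, so linear CKA stays near 1 while the neuron-matching score drops, mirroring the abstract's observation of similar representations but different individual neurons.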
Related papers
- Towards Universality: Studying Mechanistic Similarity Across Language Model Architectures [49.24097977047392]
We investigate two mainstream architectures for language modeling, namely Transformers and Mambas, to explore the extent of their mechanistic similarity.
We propose to use Sparse Autoencoders (SAEs) to isolate interpretable features from these models and show that most features are similar in these two models.
arXiv Detail & Related papers (2024-10-09T08:28:53Z)
- The Scenario Refiner: Grounding subjects in images at the morphological level [2.401993998791928]
We ask whether Vision and Language (V&L) models capture such distinctions at the morphological level.
We compare the results from V&L models to human judgements and find that models' predictions differ from those of human participants.
arXiv Detail & Related papers (2023-09-20T12:23:06Z)
- Similarity of Neural Architectures using Adversarial Attack Transferability [47.66096554602005]
We design a quantitative and scalable similarity measure between neural architectures.
We conduct a large-scale analysis on 69 state-of-the-art ImageNet classifiers.
Our results provide insights into why developing diverse neural architectures with distinct components is necessary.
arXiv Detail & Related papers (2022-10-20T16:56:47Z)
- Temporal Relevance Analysis for Video Action Models [70.39411261685963]
We first propose a new approach to quantify the temporal relationships between frames captured by CNN-based action models.
We then conduct comprehensive experiments and in-depth analysis to provide a better understanding of how temporal modeling is affected.
arXiv Detail & Related papers (2022-04-25T19:06:48Z)
- Geometric and Topological Inference for Deep Representations of Complex Networks [13.173307471333619]
We present a class of statistics that emphasize the topology as well as the geometry of representations.
We evaluate these statistics in terms of the sensitivity and specificity that they afford when used for model selection.
These new methods enable brain and computer scientists to visualize the dynamic representational transformations learned by brains and models.
arXiv Detail & Related papers (2022-03-10T17:14:14Z)
- Contrastive Learning for Neural Topic Model [14.65513836956786]
Adversarial topic models (ATMs) can successfully capture semantic patterns of a document by differentiating it from a dissimilar sample.
We propose a novel approach that re-formulates the discriminative goal as an optimization problem, and design a novel sampling method.
Experimental results show that our framework outperforms other state-of-the-art neural topic models on three common benchmark datasets.
arXiv Detail & Related papers (2021-10-25T09:46:26Z)
- The Grammar-Learning Trajectories of Neural Language Models [42.32479280480742]
We show that neural language models acquire linguistic phenomena in a similar order, despite having different end performances over the data.
Results suggest that NLMs exhibit consistent "developmental" stages.
arXiv Detail & Related papers (2021-09-13T16:17:23Z)
- Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z)
- Few-shot Visual Reasoning with Meta-analogical Contrastive Learning [141.2562447971]
We propose to solve a few-shot (or low-shot) visual reasoning problem, by resorting to analogical reasoning.
We extract structural relationships between elements in both domains, and enforce them to be as similar as possible with analogical learning.
We validate our method on the RAVEN dataset, on which it outperforms state-of-the-art methods, with larger gains when the training data is scarce.
arXiv Detail & Related papers (2020-07-23T14:00:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.