Finding Experts in Transformer Models
- URL: http://arxiv.org/abs/2005.07647v1
- Date: Fri, 15 May 2020 17:07:02 GMT
- Title: Finding Experts in Transformer Models
- Authors: Xavier Suau, Luca Zappella, Nicholas Apostoloff
- Abstract summary: We study the presence of expert units in pre-trained Transformer Models (TM), and how they impact a model's performance.
We compile a dataset of 1641 concepts that allows diverse expert units in TM to be discovered.
We show how to self-condition off-the-shelf pre-trained language models to generate text with a given concept by forcing the top experts to be active.
- Score: 2.105564340986074
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this work we study the presence of expert units in pre-trained Transformer
Models (TM), and how they impact a model's performance. We define expert units
to be neurons that are able to classify a concept with a given average
precision, where a concept is represented by a binary set of sentences
containing the concept (or not). Leveraging the OneSec dataset (Scarlini et
al., 2019), we compile a dataset of 1641 concepts that allows diverse expert
units in TM to be discovered. We show that expert units are important in
several ways: (1) The presence of expert units is correlated ($r^2=0.833$) with
the generalization power of TM, which allows ranking TM without requiring
fine-tuning on suites of downstream tasks. We further propose an empirical
method to decide how accurate such experts should be to evaluate
generalization. (2) The overlap of top experts between concepts provides a
sensible way to quantify concept co-learning, which can be used for
explainability of unknown concepts. (3) We show how to self-condition
off-the-shelf pre-trained language models to generate text with a given concept
by forcing the top experts to be active, without requiring re-training the
model or using additional parameters.
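The abstract describes two concrete procedures: scoring each neuron by the average precision it achieves as a binary classifier for a concept, and self-conditioning generation by forcing the top-scoring experts to stay active. Below is a minimal sketch of both, assuming a Hugging Face GPT-2 model, max-pooled activations from an arbitrarily chosen block, toy concept sentences, and clamping experts to their mean activation on positive sentences; none of these specific choices come from the paper.

```python
# Minimal sketch, not the paper's implementation: the model ("gpt2"), the probed
# block, max-pooling over tokens, and the clamping value are all assumptions.
import torch
from sklearn.metrics import average_precision_score
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

# A concept is a binary set of sentences: 1 if the concept is present, 0 if not.
sentences = [
    "The dog barked at the mailman.",      # positive (toy "dog" concept)
    "Dogs are loyal and playful pets.",    # positive
    "The stock market fell sharply.",      # negative
    "She painted the old fence blue.",     # negative
]
labels = [1, 1, 0, 0]
layer = 6  # which transformer block to probe (assumption)

@torch.no_grad()
def neuron_responses(text):
    """Max-pool each hidden unit's activation over the sentence's tokens."""
    ids = tok(text, return_tensors="pt")
    hidden = model(**ids).hidden_states[layer + 1][0]  # output of block `layer`, (seq_len, d_model)
    return hidden.max(dim=0).values                    # (d_model,)

acts = torch.stack([neuron_responses(s) for s in sentences])  # (n_sentences, d_model)

# Expert units: neurons whose activation, used as a score for the positive class,
# achieves high average precision on the concept's sentences.
ap = torch.tensor([
    average_precision_score(labels, acts[:, j].numpy())
    for j in range(acts.shape[1])
])
top_experts = ap.topk(10).indices

# Self-conditioning: during generation, keep the top experts active by clamping
# them (here to their mean activation on the positive sentences).
target = acts[torch.tensor(labels) == 1][:, top_experts].mean(dim=0)

def force_experts(module, inputs, output):
    hidden = output[0] if isinstance(output, tuple) else output
    hidden[..., top_experts] = target.to(hidden.dtype)  # edit block output in place

handle = model.transformer.h[layer].register_forward_hook(force_experts)
prompt = tok("Yesterday I went outside and", return_tensors="pt")
out = model.generate(**prompt, max_new_tokens=20, do_sample=True)
print(tok.decode(out[0], skip_special_tokens=True))
handle.remove()
```

Under the same assumptions, the concept co-learning measure in point (2) would amount to comparing the `top_experts` index sets obtained for two different concepts, for example via the size of their intersection.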
Related papers
- MoIN: Mixture of Introvert Experts to Upcycle an LLM [15.182215869841789]
This paper aims to improve an existing large language model without continued pre-training of the full model.
The idea is to split the pre-training data into semantically relevant groups and train an expert on each subset.
During inference, an incoming query is first routed to the most relevant expert, which is then loaded onto the base model for the forward pass.
arXiv Detail & Related papers (2024-10-13T01:11:04Z)
- Benchmarks as Microscopes: A Call for Model Metrology [76.64402390208576]
Modern language models (LMs) pose a new challenge in capability assessment.
To be confident in our metrics, we need a new discipline of model metrology.
arXiv Detail & Related papers (2024-07-22T17:52:12Z)
- Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z)
- Uncovering Unique Concept Vectors through Latent Space Decomposition [0.0]
Concept-based explanations have emerged as a superior approach that is more interpretable than feature attribution estimates.
We propose a novel post-hoc unsupervised method that automatically uncovers the concepts learned by deep models during training.
Our experiments reveal that the majority of our concepts are readily understandable to humans, exhibit coherency, and bear relevance to the task at hand.
arXiv Detail & Related papers (2023-07-13T17:21:54Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- SuperCone: Modeling Heterogeneous Experts with Concept Meta-learning for Unified Predictive Segments System [8.917697023052257]
We present SuperCone, our unified predictive segments system.
It builds on top of a flat concept representation that summarizes each user's heterogeneous digital footprints.
It can outperform state-of-the-art recommendation and ranking algorithms on a wide range of predictive segment tasks.
arXiv Detail & Related papers (2022-03-09T04:11:39Z)
- Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer while using fewer parameters, and are transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
- A Minimalist Dataset for Systematic Generalization of Perception, Syntax, and Semantics [131.93113552146195]
We present a new dataset, Handwritten arithmetic with INTegers (HINT), to examine machines' capability of learning generalizable concepts.
In HINT, machines are tasked with learning how concepts are perceived from raw signals such as images.
We undertake extensive experiments with various sequence-to-sequence models, including RNNs, Transformers, and GPT-3.
arXiv Detail & Related papers (2021-03-02T01:32:54Z)
- Towards Unbiased and Accurate Deferral to Multiple Experts [19.24068936057053]
We propose a framework that simultaneously learns a classifier and a deferral system, with the deferral system choosing to defer to one or more human experts.
We test our framework on a synthetic dataset and a content moderation dataset with biased synthetic experts, and show that it significantly improves the accuracy and fairness of the final predictions.
arXiv Detail & Related papers (2021-02-25T17:08:39Z)
- Common Sense or World Knowledge? Investigating Adapter-Based Knowledge Injection into Pretrained Transformers [54.417299589288184]
We investigate models for complementing the distributional knowledge of BERT with conceptual knowledge from ConceptNet and its corresponding Open Mind Common Sense (OMCS) corpus.
Our adapter-based models substantially outperform BERT on inference tasks that require the type of conceptual knowledge explicitly present in ConceptNet and OMCS.
arXiv Detail & Related papers (2020-05-24T15:49:57Z)