Deriving Contextualised Semantic Features from BERT (and Other
Transformer Model) Embeddings
- URL: http://arxiv.org/abs/2012.15353v1
- Date: Wed, 30 Dec 2020 22:52:29 GMT
- Title: Deriving Contextualised Semantic Features from BERT (and Other
Transformer Model) Embeddings
- Authors: Jacob Turton, David Vinson, Robert Elliott Smith
- Abstract summary: This paper demonstrates that Binder features can be derived from the BERT embedding space.
It provides contextualised Binder embeddings, which can aid in understanding semantic differences between words in context.
It additionally provides insights into how semantic features are represented across the different layers of the BERT model.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Models based on the transformer architecture, such as BERT, have marked a
crucial step forward in the field of Natural Language Processing. Importantly,
they allow the creation of word embeddings that capture important semantic
information about words in context. However, as single entities, these
embeddings are difficult to interpret and the models used to create them have
been described as opaque. Binder and colleagues proposed an intuitive embedding
space where each dimension is based on one of 65 core semantic features.
Unfortunately, the space only exists for a small dataset of 535 words, limiting
its uses. Previous work (Utsumi, 2018, 2020; Turton, Vinson & Smith, 2020) has
shown that Binder features can be derived from static embeddings and
successfully extrapolated to a large new vocabulary. Taking the next step, this
paper demonstrates that Binder features can be derived from the BERT embedding
space. This provides contextualised Binder embeddings, which can aid in
understanding semantic differences between words in context. It additionally
provides insights into how semantic features are represented across the
different layers of the BERT model.
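As a rough illustration of the general idea (not the authors' exact pipeline), the sketch below fits a regression from BERT token embeddings to the 65 Binder feature dimensions using a small labelled word set, then applies the fitted mapping to the same word in two different contexts. The model name, the choice of layer, the use of ridge regression, and the placeholder word list, ratings and example sentences are all assumptions made purely for illustration.

    # Minimal sketch, assuming Hugging Face transformers and scikit-learn.
    # Placeholder data stands in for the 535 Binder words and their 65 ratings.
    import numpy as np
    import torch
    from sklearn.linear_model import Ridge
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
    model.eval()

    def word_embedding(sentence: str, word: str, layer: int = 8) -> np.ndarray:
        """Mean-pool the hidden states of the word-pieces of `word` at one BERT layer."""
        enc = tokenizer(sentence, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**enc).hidden_states[layer][0]          # (seq_len, 768)
        word_ids = tokenizer(word, add_special_tokens=False)["input_ids"]
        ids = enc["input_ids"][0].tolist()
        for i in range(len(ids) - len(word_ids) + 1):              # first match of the word-piece span
            if ids[i:i + len(word_ids)] == word_ids:
                return hidden[i:i + len(word_ids)].mean(dim=0).numpy()
        raise ValueError(f"'{word}' not found in sentence")

    # Fit: placeholder stand-ins for the Binder words and their 65 feature ratings.
    binder_words = ["apple", "justice", "run"]                     # placeholder subset
    binder_ratings = np.random.rand(len(binder_words), 65)         # placeholder ratings

    X = np.stack([word_embedding(f"The word {w} appears here.", w) for w in binder_words])
    reg = Ridge(alpha=1.0).fit(X, binder_ratings)

    # Apply: contextualised Binder features for one word in two contexts.
    bank_river = reg.predict(word_embedding("He sat on the bank of the river.", "bank")[None])
    bank_money = reg.predict(word_embedding("She deposited cash at the bank.", "bank")[None])
    print(np.abs(bank_river - bank_money).argsort()[0][::-1][:5])  # most divergent feature indices

In the paper's setting the placeholder arrays would be replaced by the actual Binder norms, and the layer index could be varied to examine how the semantic features are represented across BERT's layers.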
Related papers
- Making Pre-trained Language Models Great on Tabular Prediction [50.70574370855663]
Transfer learning with deep neural networks (DNNs) has driven significant progress in image and language processing.
We present TP-BERTa, a specifically pre-trained LM for tabular data prediction.
A novel relative magnitude tokenization converts scalar numerical feature values to finely discrete, high-dimensional tokens, and an intra-feature attention approach integrates feature values with the corresponding feature names.
arXiv Detail & Related papers (2024-03-04T08:38:56Z)
- Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity [68.8204255655161]
We present the semantic notion of agentivity as a case study for probing interactions at the syntax-semantics interface.
The results suggest that LMs may serve as useful tools for linguistic annotation, theory testing, and discovery.
arXiv Detail & Related papers (2023-05-29T16:24:01Z)
- MarkBERT: Marking Word Boundaries Improves Chinese BERT [67.53732128091747]
MarkBERT keeps the vocabulary as Chinese characters and inserts boundary markers between contiguous words.
Compared to previous word-based BERT models, MarkBERT achieves better accuracy on text classification, keyword recognition, and semantic similarity tasks.
arXiv Detail & Related papers (2022-03-12T08:43:06Z)
- Low-Resource Task-Oriented Semantic Parsing via Intrinsic Modeling [65.51280121472146]
We exploit what we intrinsically know about ontology labels to build efficient semantic parsing models.
Our model proves highly efficient on a low-resource benchmark derived from TOPv2.
arXiv Detail & Related papers (2021-04-15T04:01:02Z)
- SemGloVe: Semantic Co-occurrences for GloVe from BERT [55.420035541274444]
GloVe learns word embeddings by leveraging statistical information from word co-occurrence matrices.
We propose SemGloVe, which distills semantic co-occurrences from BERT into static GloVe word embeddings.
arXiv Detail & Related papers (2020-12-30T15:38:26Z)
- Improved Biomedical Word Embeddings in the Transformer Era [2.978663539080876]
We learn word and concept embeddings by first using the skip-gram method and further fine-tuning them with correlational information.
We evaluate these tuned static embeddings on multiple word-relatedness datasets developed in previous work.
arXiv Detail & Related papers (2020-12-22T03:03:50Z)
- Does BERT Understand Sentiment? Leveraging Comparisons Between Contextual and Non-Contextual Embeddings to Improve Aspect-Based Sentiment Models [0.0]
We show that training a model to compare a contextual embedding from BERT with a generic (non-contextual) word embedding can be used to infer sentiment.
We also show that fine-tuning a subset of the weights of this comparison model achieves state-of-the-art results for polarity detection on Aspect-Based Sentiment Classification datasets.
arXiv Detail & Related papers (2020-11-23T19:12:31Z)
- Semantic Labeling Using a Deep Contextualized Language Model [9.719972529205101]
We propose a context-aware semantic labeling method using both the column values and context.
Our method is based on a new setting for semantic labeling, where we sequentially predict labels for an input table with missing headers.
To our knowledge, we are the first to successfully apply BERT to solve the semantic labeling task.
arXiv Detail & Related papers (2020-10-30T03:04:22Z)
- CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters [14.956626084281638]
We propose a new variant of BERT that drops the wordpiece system altogether and uses a Character-CNN module instead to represent entire words by consulting their characters.
We show that this new model improves the performance of BERT on a variety of medical domain tasks while at the same time producing robust, word-level and open-vocabulary representations.
arXiv Detail & Related papers (2020-10-20T15:58:53Z)
- LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention [37.111204321059084]
We propose new pretrained contextualized representations of words and entities based on the bidirectional transformer.
Our model is trained using a new pretraining task based on the masked language model of BERT.
We also propose an entity-aware self-attention mechanism that is an extension of the self-attention mechanism of the transformer.
arXiv Detail & Related papers (2020-10-02T15:38:03Z)
- A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.