ALL Dolphins Are Intelligent and SOME Are Friendly: Probing BERT for
Nouns' Semantic Properties and their Prototypicality
- URL: http://arxiv.org/abs/2110.06376v1
- Date: Tue, 12 Oct 2021 21:43:37 GMT
- Title: ALL Dolphins Are Intelligent and SOME Are Friendly: Probing BERT for
Nouns' Semantic Properties and their Prototypicality
- Authors: Marianna Apidianaki and Aina Garí Soler
- Abstract summary: We probe BERT (Devlin et al.) for the properties of English nouns as expressed by adjectives that do not restrict the reference scope of the noun they modify.
We base our study on psycholinguistics datasets that capture the association strength between nouns and their semantic features.
We show that when tested in a fine-tuning setting addressing entailment, BERT successfully leverages the information needed for reasoning about the meaning of adjective-noun constructions.
- Score: 4.915907527975786
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large scale language models encode rich commonsense knowledge acquired
through exposure to massive data during pre-training, but their understanding
of entities and their semantic properties is unclear. We probe BERT (Devlin et
al., 2019) for the properties of English nouns as expressed by adjectives that
do not restrict the reference scope of the noun they modify (as in "red car"),
but instead emphasise some inherent aspect ("red strawberry"). We base our
study on psycholinguistics datasets that capture the association strength
between nouns and their semantic features. We probe BERT using cloze tasks and
in a classification setting, and show that the model has marginal knowledge of
these features and their prevalence as expressed in these datasets. We discuss
factors that make evaluation challenging and impede drawing general conclusions
about the models' knowledge of noun properties. Finally, we show that when
tested in a fine-tuning setting addressing entailment, BERT successfully
leverages the information needed for reasoning about the meaning of
adjective-noun constructions, outperforming previous methods.
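A minimal sketch of the cloze-style probing setup described above, assuming the HuggingFace transformers library and bert-base-uncased; the prompt template and the example noun are illustrative placeholders, not the paper's actual stimuli.

```python
# Cloze-style probing sketch: ask BERT which adjectives it predicts for a
# masked property slot. Assumes the HuggingFace transformers library; the
# prompt wording is a hypothetical template, not taken from the paper.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

prompt = "Everybody knows that all strawberries are [MASK]."
for prediction in fill_mask(prompt, top_k=5):
    print(f"{prediction['token_str']:>12}  {prediction['score']:.3f}")
```

In practice, the predicted adjectives and their probabilities can then be compared against the association strengths recorded in the psycholinguistic feature norms.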
Related papers
- Making Pre-trained Language Models Great on Tabular Prediction [50.70574370855663]
The transferability of deep neural networks (DNNs) has driven significant progress in image and language processing.
We present TP-BERTa, a specifically pre-trained LM for tabular data prediction.
A novel relative magnitude tokenization converts scalar numerical feature values to finely discrete, high-dimensional tokens, and an intra-feature attention approach integrates feature values with the corresponding feature names.
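As a loose illustration of what such magnitude tokenization could look like, the sketch below rank-normalises a numeric column and maps each value to a discrete bucket token paired with the feature name; the bucket count, normalisation, and token format are guesses for illustration, not TP-BERTa's actual scheme.

```python
# Hypothetical magnitude tokenization: discretise a numeric feature column
# into '[feature|bin_k]' tokens. This is an illustrative guess, not TP-BERTa.
import numpy as np

def magnitude_tokens(name: str, values: np.ndarray, num_bins: int = 16) -> list[str]:
    # Rank-normalise to [0, 1) so binning is relative to the column's distribution.
    ranks = values.argsort().argsort() / len(values)
    bins = (ranks * num_bins).astype(int)
    return [f"[{name}|bin_{b}]" for b in bins]

ages = np.array([23.0, 35.0, 61.0, 44.0])
print(magnitude_tokens("age", ages))  # e.g. ['[age|bin_0]', '[age|bin_4]', ...]
```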
arXiv Detail & Related papers (2024-03-04T08:38:56Z)
- Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity [68.8204255655161]
We present the semantic notion of agentivity as a case study for probing syntax-semantics interactions.
This suggests that LMs may serve as more useful tools for linguistic annotation, theory testing, and discovery.
arXiv Detail & Related papers (2023-05-29T16:24:01Z)
- Visualizing the Obvious: A Concreteness-based Ensemble Model for Noun Property Prediction [34.37730333491428]
Properties of nouns are more challenging to extract than other types of knowledge because they are rarely explicitly stated in texts.
We propose to extract these properties from images and use them in an ensemble model, in order to complement the information that is extracted from language models.
Our results show that the proposed combination of text and images greatly improves noun property prediction compared to powerful text-based language models.
arXiv Detail & Related papers (2022-10-24T01:25:21Z)
- Transparency Helps Reveal When Language Models Learn Meaning [71.96920839263457]
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not represent natural language semantics well.
arXiv Detail & Related papers (2022-10-14T02:35:19Z)
- Representing Affect Information in Word Embeddings [5.378735006566249]
We investigated whether and how the affect meaning of a word is encoded in word embeddings pre-trained in large neural networks.
The embeddings varied in whether they were static or contextualized, and in how much affect-specific information was prioritized during the pre-training and fine-tuning phases.
arXiv Detail & Related papers (2022-09-21T18:16:33Z)
- Towards Explainability in NLP: Analyzing and Calculating Word Saliency through Word Properties [4.330880304715002]
We explore the relationships between word saliency and word properties.
We establish a mapping model, Seq2Saliency, from the words in a text sample and their properties to saliency values.
Experimental evaluations are conducted to analyze the saliency of words with different properties.
arXiv Detail & Related papers (2022-07-17T06:02:48Z)
- Knowledge Graph Fusion for Language Model Fine-tuning [0.0]
We investigate the benefits of knowledge incorporation into the fine-tuning stages of BERT.
An existing K-BERT model, which enriches sentences with triplets from a Knowledge Graph, is adapted for the English language.
Changes made to K-BERT to accommodate the English language also extend to other word-based languages.
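As a toy illustration of the general idea of enriching input text with Knowledge Graph triplets before encoding, the sketch below appends triples to matching entity mentions; the tiny triple store and the inline "(relation object)" format are invented here and do not reproduce K-BERT's actual injection and visibility-matrix mechanism.

```python
# Toy knowledge-enrichment sketch: append KG triples after entity mentions.
# The triple store and output format are illustrative, not K-BERT's method.
knowledge_graph = {
    "dolphin": [("is_a", "mammal"), ("lives_in", "ocean")],
}

def enrich(sentence: str) -> str:
    enriched = []
    for word in sentence.split():
        enriched.append(word)
        for relation, obj in knowledge_graph.get(word.lower().strip(".,"), []):
            enriched.append(f"({relation} {obj})")
    return " ".join(enriched)

print(enrich("The dolphin swims fast."))
# -> The dolphin (is_a mammal) (lives_in ocean) swims fast.
```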
arXiv Detail & Related papers (2022-06-21T08:06:22Z)
- More Than Words: Collocation Tokenization for Latent Dirichlet Allocation Models [71.42030830910227]
We propose a new metric for measuring the clustering quality in settings where the models differ.
We show that topics trained with merged tokens result in topic keys that are clearer, more coherent, and more effective at distinguishing topics than those of unmerged models.
arXiv Detail & Related papers (2021-08-24T14:08:19Z)
- BERT Knows Punta Cana is not just beautiful, it's gorgeous: Ranking Scalar Adjectives with Contextualised Representations [6.167728295758172]
We propose a novel BERT-based approach to intensity detection for scalar adjectives.
We model intensity by vectors directly derived from contextualised representations and show they can successfully rank scalar adjectives.
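The snippet above only hints at how such intensity vectors can be used; below is a rough sketch of one possible setup: embed each adjective in a fixed carrier sentence with BERT, define an intensity direction from a mild and an extreme anchor adjective, and rank adjectives by their projection onto it. The carrier sentence, anchor pair, and mean-pooling over sub-tokens are assumptions, not the cited paper's exact method.

```python
# Rough sketch: rank scalar adjectives by projecting contextualised BERT
# embeddings onto a mild-to-extreme intensity direction. Carrier sentence,
# anchors, and pooling are illustrative assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def adjective_vector(adjective: str) -> torch.Tensor:
    """Mean-pool the adjective's sub-token embeddings inside a carrier sentence."""
    sentence = f"The view was {adjective}."
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]
    adj_ids = tokenizer(adjective, add_special_tokens=False)["input_ids"]
    tokens = inputs["input_ids"][0].tolist()
    start = next(i for i in range(len(tokens)) if tokens[i:i + len(adj_ids)] == adj_ids)
    return hidden[start:start + len(adj_ids)].mean(dim=0)

# Hypothetical intensity direction: from a mild to an extreme anchor.
direction = adjective_vector("gorgeous") - adjective_vector("pretty")
scale = ["gorgeous", "pretty", "beautiful"]
ranked = sorted(scale, key=lambda a: torch.dot(adjective_vector(a), direction).item())
print(ranked)  # expected (though not guaranteed) mild-to-extreme ordering
```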
arXiv Detail & Related papers (2020-10-06T13:05:47Z)
- Syntactic Structure Distillation Pretraining For Bidirectional Encoders [49.483357228441434]
We introduce a knowledge distillation strategy for injecting syntactic biases into BERT pretraining.
We distill the approximate marginal distribution over words in context from the syntactic LM.
Our findings demonstrate the benefits of syntactic biases, even in representation learners that exploit large amounts of data.
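As a rough illustration of the shape of such a distillation objective (not the paper's implementation), the sketch below computes a KL term between a syntactic teacher's distribution over a masked word and a student's masked-LM prediction; random tensors stand in for real model outputs.

```python
# Shape of a word-level distillation loss: KL(teacher || student) over the
# vocabulary for one masked position. Random logits stand in for real models.
import torch
import torch.nn.functional as F

vocab_size = 30522  # bert-base-uncased vocabulary size
teacher_logits = torch.randn(vocab_size)  # syntactic LM's distribution for the slot
student_logits = torch.randn(vocab_size)  # BERT's masked-LM prediction

distill_loss = F.kl_div(
    F.log_softmax(student_logits, dim=-1),
    F.softmax(teacher_logits, dim=-1),
    reduction="sum",
)
print(distill_loss.item())
```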
arXiv Detail & Related papers (2020-05-27T16:44:01Z)
- Interpretability Analysis for Named Entity Recognition to Understand System Predictions and How They Can Improve [49.878051587667244]
We examine the performance of several variants of LSTM-CRF architectures for named entity recognition.
We find that context representations do contribute to system performance, but that the main factor driving high performance is learning the name tokens themselves.
We enlist human annotators to evaluate the feasibility of inferring entity types from context alone and find that, although people are also unable to infer the entity type for the majority of the errors made by the context-only system, there is some room for improvement.
arXiv Detail & Related papers (2020-04-09T14:37:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.