Analysing Lexical Semantic Change with Contextualised Word
Representations
- URL: http://arxiv.org/abs/2004.14118v1
- Date: Wed, 29 Apr 2020 12:18:14 GMT
- Title: Analysing Lexical Semantic Change with Contextualised Word
Representations
- Authors: Mario Giulianelli, Marco Del Tredici, Raquel Fern\'andez
- Abstract summary: We propose a novel method that exploits the BERT neural language model to obtain representations of word usages.
We create a new evaluation dataset and show that the model representations and the detected semantic shifts are positively correlated with human judgements.
- Score: 7.071298726856781
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents the first unsupervised approach to lexical semantic
change that makes use of contextualised word representations. We propose a
novel method that exploits the BERT neural language model to obtain
representations of word usages, clusters these representations into usage
types, and measures change along time with three proposed metrics. We create a
new evaluation dataset and show that the model representations and the detected
semantic shifts are positively correlated with human judgements. Our extensive
qualitative analysis demonstrates that our method captures a variety of
synchronic and diachronic linguistic phenomena. We expect our work to inspire
further research in this direction.
Related papers
- How well do distributed representations convey contextual lexical semantics: a Thesis Proposal [3.3585951129432323]
In this thesis, we examine the efficacy of distributed representations from modern neural networks in encoding lexical meaning.
We identify four sources of ambiguity based on the relatedness and similarity of meanings influenced by context.
We then aim to evaluate these sources by collecting or constructing multilingual datasets, leveraging various language models, and employing linguistic analysis tools.
arXiv Detail & Related papers (2024-06-02T14:08:51Z) - A Comprehensive Empirical Evaluation of Existing Word Embedding
Approaches [5.065947993017158]
We present the characteristics of existing word embedding approaches and analyze them with regard to many classification tasks.
Traditional approaches mostly use matrix factorization to produce word representations, and they are not able to capture the semantic and syntactic regularities of the language very well.
On the other hand, Neural-network-based approaches can capture sophisticated regularities of the language and preserve the word relationships in the generated word representations.
arXiv Detail & Related papers (2023-03-13T15:34:19Z) - Unsupervised Lexical Substitution with Decontextualised Embeddings [48.00929769805882]
We propose a new unsupervised method for lexical substitution using pre-trained language models.
Our method retrieves substitutes based on the similarity of contextualised and decontextualised word embeddings.
We conduct experiments in English and Italian, and show that our method substantially outperforms strong baselines.
arXiv Detail & Related papers (2022-09-17T03:51:47Z) - Contextualized language models for semantic change detection: lessons
learned [4.436724861363513]
We present a qualitative analysis of the outputs of contextualized embedding-based methods for detecting diachronic semantic change.
Our findings show that contextualized methods can often predict high change scores for words which are not undergoing any real diachronic semantic shift.
Our conclusion is that pre-trained contextualized language models are prone to confound changes in lexicographic senses and changes in contextual variance.
arXiv Detail & Related papers (2022-08-31T23:35:24Z) - A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z) - Discrete representations in neural models of spoken language [56.29049879393466]
We compare the merits of four commonly used metrics in the context of weakly supervised models of spoken language.
We find that the different evaluation metrics can give inconsistent results.
arXiv Detail & Related papers (2021-05-12T11:02:02Z) - Multi-sense embeddings through a word sense disambiguation process [2.2344764434954256]
Most Suitable Sense.
(MSSA) disambiguates and annotates each word by its specific sense, considering the semantic effects of its context.
We test our approach on six different benchmarks for the word similarity task, showing that our approach can produce state-of-the-art results.
arXiv Detail & Related papers (2021-01-21T16:22:34Z) - SChME at SemEval-2020 Task 1: A Model Ensemble for Detecting Lexical
Semantic Change [58.87961226278285]
This paper describes SChME, a method used in SemEval-2020 Task 1 on unsupervised detection of lexical semantic change.
SChME usesa model ensemble combining signals of distributional models (word embeddings) and wordfrequency models where each model casts a vote indicating the probability that a word sufferedsemantic change according to that feature.
arXiv Detail & Related papers (2020-12-02T23:56:34Z) - VCDM: Leveraging Variational Bi-encoding and Deep Contextualized Word
Representations for Improved Definition Modeling [24.775371434410328]
We tackle the task of definition modeling, where the goal is to learn to generate definitions of words and phrases.
Existing approaches for this task are discriminative, combining distributional and lexical semantics in an implicit rather than direct way.
We propose a generative model for the task, introducing a continuous latent variable to explicitly model the underlying relationship between a phrase used within a context and its definition.
arXiv Detail & Related papers (2020-10-07T02:48:44Z) - Temporal Embeddings and Transformer Models for Narrative Text
Understanding [72.88083067388155]
We present two approaches to narrative text understanding for character relationship modelling.
The temporal evolution of these relations is described by dynamic word embeddings, that are designed to learn semantic changes over time.
A supervised learning approach based on the state-of-the-art transformer model BERT is used instead to detect static relations between characters.
arXiv Detail & Related papers (2020-03-19T14:23:12Z) - How Far are We from Effective Context Modeling? An Exploratory Study on
Semantic Parsing in Context [59.13515950353125]
We present a grammar-based decoding semantic parsing and adapt typical context modeling methods on top of it.
We evaluate 13 context modeling methods on two large cross-domain datasets, and our best model achieves state-of-the-art performances.
arXiv Detail & Related papers (2020-02-03T11:28:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.