Capturing Evolution in Word Usage: Just Add More Clusters?
- URL: http://arxiv.org/abs/2001.06629v2
- Date: Fri, 24 Jan 2020 01:58:05 GMT
- Title: Capturing Evolution in Word Usage: Just Add More Clusters?
- Authors: Matej Martinc, Syrielle Montariol, Elaine Zosa and Lidia Pivovarova
- Abstract summary: We focus on a new set of methods relying on contextualised embeddings, a type of semantic modelling that has recently revolutionised the NLP field.
We leverage the ability of the transformer-based BERT model to generate contextualised embeddings capable of detecting semantic change of words across time.
- Score: 9.209873675320834
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The way words are used evolves through time, mirroring the cultural
or technological evolution of society. Semantic change detection is the task of
detecting and analysing word evolution in textual data, even over short periods
of time. In this paper we focus on a new set of methods relying on
contextualised embeddings, a type of semantic modelling that has recently
revolutionised the NLP field. We leverage the ability of the transformer-based BERT model
to generate contextualised embeddings capable of detecting semantic change of
words across time. Several approaches are compared in a common setting in order
to establish strengths and weaknesses for each of them. We also propose several
ideas for improvements, managing to drastically improve the performance of
existing approaches.
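As an illustration of the clustering-based approach the paper builds on, the sketch below pools BERT contextual embeddings of a target word from two time periods, clusters them, and compares the resulting cluster-usage distributions. It is a minimal sketch under stated assumptions, not the authors' exact pipeline: the model name (bert-base-uncased), the number of clusters, the whole-token matching of the target word, and the use of Jensen-Shannon divergence are illustrative choices.

```python
# Minimal sketch (assumptions noted above): cluster BERT contextual embeddings
# of a target word from two time periods and compare cluster-usage distributions.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.cluster import KMeans
from scipy.spatial.distance import jensenshannon

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed_target(sentences, target):
    """Return one contextual vector per occurrence of `target` in `sentences`."""
    vectors = []
    for sent in sentences:
        enc = tokenizer(sent, return_tensors="pt", truncation=True)
        with torch.no_grad():
            hidden = model(**enc).last_hidden_state[0]  # (seq_len, hidden_dim)
        tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
        for i, tok in enumerate(tokens):
            if tok == target:  # simplistic whole-token match; ignores word pieces
                vectors.append(hidden[i].numpy())
    return np.array(vectors)

def usage_shift(sentences_t1, sentences_t2, target, k=5):
    """Cluster pooled embeddings from both periods; return divergence between them."""
    emb1 = embed_target(sentences_t1, target)
    emb2 = embed_target(sentences_t2, target)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(
        np.vstack([emb1, emb2]))
    p = np.bincount(labels[:len(emb1)], minlength=k).astype(float)
    q = np.bincount(labels[len(emb1):], minlength=k).astype(float)
    return jensenshannon(p / p.sum(), q / q.sum())
```

A large divergence between the two cluster-usage distributions is then read as a signal that the word's dominant usages have shifted between the periods.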
Related papers
- Survey in Characterization of Semantic Change [0.1474723404975345]
Understanding the meaning of words is vital for interpreting texts from different cultures.
Semantic changes can potentially impact the quality of the outcomes of computational linguistics algorithms.
arXiv Detail & Related papers (2024-02-29T12:13:50Z)
- Graph-based Clustering for Detecting Semantic Change Across Time and Languages [10.058655884092094]
We propose a graph-based clustering approach to capture nuanced changes in both high- and low-frequency word senses across time and languages.
Our approach substantially surpasses previous approaches in the SemEval 2020 binary classification task across four languages.
arXiv Detail & Related papers (2024-02-01T21:27:19Z)
- Always Keep your Target in Mind: Studying Semantics and Improving Performance of Neural Lexical Substitution [124.99894592871385]
We present a large-scale comparative study of lexical substitution methods employing both older and the most recent language models.
We show that the already competitive results achieved by SOTA LMs/MLMs can be substantially improved further if information about the target word is injected properly.
arXiv Detail & Related papers (2022-06-07T16:16:19Z)
- $\textit{latent}$-GLAT: Glancing at Latent Variables for Parallel Text Generation [65.29170569821093]
Parallel text generation has received widespread attention due to its advantages in generation efficiency.
In this paper, we propose $\textit{latent}$-GLAT, which employs discrete latent variables to capture word categorical information.
Experiment results show that our method outperforms strong baselines without the help of an autoregressive model.
arXiv Detail & Related papers (2022-04-05T07:34:12Z)
- To Augment or Not to Augment? A Comparative Study on Text Augmentation Techniques for Low-Resource NLP [0.0]
We investigate three categories of text augmentation methodologies which perform changes at the syntactic level.
We compare them on part-of-speech tagging, dependency parsing and semantic role labeling for a diverse set of language families.
Our results suggest that the augmentation techniques can further improve over strong baselines based on mBERT.
arXiv Detail & Related papers (2021-11-18T10:52:48Z)
- Lexical Semantic Change Discovery [22.934650688233734]
We propose a shift from change detection to change discovery, i.e., discovering novel word senses over time from the full corpus vocabulary.
By heavily fine-tuning a type-based and a token-based approach on recently published German data, we demonstrate that both models can successfully be applied to discover new words undergoing meaning change.
arXiv Detail & Related papers (2021-06-06T13:02:38Z)
- Meta-Learning with Variational Semantic Memory for Word Sense Disambiguation [56.830395467247016]
We propose a model of semantic memory for WSD in a meta-learning setting.
Our model is based on hierarchical variational inference and incorporates an adaptive memory update rule via a hypernetwork.
We show that our model advances the state of the art in few-shot WSD and supports effective learning in extremely data-scarce scenarios.
arXiv Detail & Related papers (2021-06-05T20:40:01Z)
- Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks [58.87961226278285]
We propose a self-supervised approach to model lexical semantic change.
We show that our method can be used for the detection of semantic change with any alignment method.
We illustrate the utility of our techniques using experimental results on three different datasets.
arXiv Detail & Related papers (2021-01-30T18:59:43Z)
- Lexical semantic change for Ancient Greek and Latin [61.69697586178796]
Associating a word's correct meaning in its historical context is a central challenge in diachronic research.
We build on a recent computational approach to semantic change based on a dynamic Bayesian mixture model.
We provide a systematic comparison of dynamic Bayesian mixture models for semantic change with state-of-the-art embedding-based models.
arXiv Detail & Related papers (2021-01-22T12:04:08Z)
- Improving Adversarial Text Generation by Modeling the Distant Future [155.83051741029732]
We consider a text planning scheme and present a model-based imitation-learning approach to alleviate the aforementioned issues.
We propose a novel guider network to focus on the generative process over a longer horizon, which can assist next-word prediction and provide intermediate rewards for generator optimization.
arXiv Detail & Related papers (2020-05-04T05:45:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.