NLP-CIC @ DIACR-Ita: POS and Neighbor Based Distributional Models for
Lexical Semantic Change in Diachronic Italian Corpora
- URL: http://arxiv.org/abs/2011.03755v1
- Date: Sat, 7 Nov 2020 11:27:18 GMT
- Title: NLP-CIC @ DIACR-Ita: POS and Neighbor Based Distributional Models for
Lexical Semantic Change in Diachronic Italian Corpora
- Authors: Jason Angel, Carlos A. Rodriguez-Diaz, Alexander Gelbukh, Sergio
Jimenez
- Abstract summary: We present our systems and findings on unsupervised lexical semantic change for the Italian language.
The task is to determine whether a target word has changed its meaning over time, relying only on raw text from two time-specific corpora.
We propose two models that represent the target words across the two periods and predict the changed words using threshold and voting schemes.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present our systems and findings on unsupervised lexical semantic change
for the Italian language in the DIACR-Ita shared-task at EVALITA 2020. The task
is to determine whether a target word has changed its meaning over time, relying
only on raw text from two time-specific corpora. We propose two models that
represent the target words across the two periods and predict the changed words
using threshold and voting schemes. Our first model relies solely on
part-of-speech usage and an ensemble of distance measures. The second model
uses word embedding representations to extract the neighbors' relative distances
across spaces, and proposes "the average of absolute differences" to estimate
lexical semantic change. Our models achieved competitive results, ranking third
in the DIACR-Ita competition. Furthermore, we experiment with the k_neighbor
parameter of our second model to compare the impact of using "the average of
absolute differences" versus the cosine distance used in Hamilton et al.
(2016).
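The abstract does not spell out the exact procedure, but the neighbor-based measure can be sketched as follows: take the target word's k nearest neighbors in the first period's embedding space, measure the target's distance to each neighbor in both spaces, and average the absolute differences of those distances. A minimal sketch, assuming cosine distance for the neighbor ranking and toy dictionaries mapping words to vectors (the function name `avg_abs_diff_change` and the exact neighbor-selection step are illustrative assumptions, not the authors' code):

```python
import numpy as np

def cosine_distance(u, v):
    """Cosine distance between two vectors."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def avg_abs_diff_change(target, emb1, emb2, k=10):
    """Neighbor-based change score: the average of absolute differences
    between the target's distances to its k nearest neighbors, measured
    in each period's embedding space.

    emb1, emb2: dicts mapping word -> vector for periods 1 and 2.
    Only words present in both vocabularies are considered.
    """
    shared = [w for w in emb1 if w in emb2 and w != target]
    # Rank shared words by cosine distance to the target in period 1.
    d1 = {w: cosine_distance(emb1[target], emb1[w]) for w in shared}
    neighbors = sorted(shared, key=d1.get)[:k]
    # Compare the target's distance to each neighbor across the two spaces.
    diffs = [abs(d1[w] - cosine_distance(emb2[target], emb2[w]))
             for w in neighbors]
    return float(np.mean(diffs))
```

A useful property of this formulation is that it needs no explicit alignment between the two embedding spaces, since only within-space distances are compared; the cosine-distance baseline of Hamilton et al. (2016), by contrast, measures the distance between the target's aligned vectors across spaces.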
Related papers
- Semantic Change Detection for the Romanian Language
We analyze different strategies to create static and contextual word embedding models on real-world datasets.
We first evaluate both word embedding models on an English dataset (SEMEVAL-CCOHA) and then on a Romanian dataset.
The experimental results show that, depending on the corpus, the most important factors to consider are the choice of model and the distance measure used to calculate the change-detection score.
arXiv Detail & Related papers (2023-08-23T13:37:02Z)
- Unsupervised Semantic Variation Prediction using the Distribution of Sibling Embeddings
Detection of semantic variation of words is an important task for various NLP applications.
We argue that mean representations alone cannot accurately capture such semantic variations.
We propose a method that uses the entire cohort of the contextualised embeddings of the target word.
arXiv Detail & Related papers (2023-05-15T13:58:21Z)
- Ensemble Transfer Learning for Multilingual Coreference Resolution
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z)
- Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks
We propose a self-supervised approach to model lexical semantic change.
We show that our method can be used for the detection of semantic change with any alignment method.
We illustrate the utility of our techniques using experimental results on three different datasets.
arXiv Detail & Related papers (2021-01-30T18:59:43Z)
- SChME at SemEval-2020 Task 1: A Model Ensemble for Detecting Lexical Semantic Change
This paper describes SChME, a method used in SemEval-2020 Task 1 on unsupervised detection of lexical semantic change.
SChME uses a model ensemble combining signals from distributional models (word embeddings) and word-frequency models, where each model casts a vote indicating the probability that a word underwent semantic change according to that feature.
arXiv Detail & Related papers (2020-12-02T23:56:34Z)
- Cross-lingual Spoken Language Understanding with Regularized Representation Alignment
We propose a regularization approach to align word-level and sentence-level representations across languages without any external resource.
Experiments on the cross-lingual spoken language understanding task show that our model outperforms current state-of-the-art methods in both few-shot and zero-shot scenarios.
arXiv Detail & Related papers (2020-09-30T08:56:53Z)
- Grounded Compositional Outputs for Adaptive Language Modeling
A language model's vocabulary, typically selected before training and fixed permanently thereafter, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
- Binary and Multitask Classification Model for Dutch Anaphora Resolution: Die/Dat Prediction
The correct use of the Dutch pronouns 'die' and 'dat' is a stumbling block for both native and non-native speakers of Dutch.
This study constructs the first neural network model for Dutch demonstrative and relative pronoun resolution.
arXiv Detail & Related papers (2020-01-09T12:34:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.