UiO-UvA at SemEval-2020 Task 1: Contextualised Embeddings for Lexical
Semantic Change Detection
- URL: http://arxiv.org/abs/2005.00050v3
- Date: Sun, 19 Jul 2020 01:44:32 GMT
- Authors: Andrey Kutuzov and Mario Giulianelli
- Abstract summary: This paper focuses on Subtask 2, ranking words by the degree of their semantic drift over time.
We find that the most effective algorithms rely on the cosine similarity between averaged token embeddings and the pairwise distances between token embeddings.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We apply contextualised word embeddings to lexical semantic change detection
in the SemEval-2020 Shared Task 1. This paper focuses on Subtask 2, ranking
words by the degree of their semantic drift over time. We analyse the
performance of two contextualising architectures (BERT and ELMo) and three
change detection algorithms. We find that the most effective algorithms rely on
the cosine similarity between averaged token embeddings and the pairwise
distances between token embeddings. They outperform strong baselines by a large
margin (in the post-evaluation phase, we have the best Subtask 2 submission for
SemEval-2020 Task 1), but interestingly, the choice of a particular algorithm
depends on the distribution of gold scores in the test set.
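The two measures named in the abstract can be illustrated with a minimal sketch. It assumes each target word comes with a list of token embeddings per time period (one vector per occurrence) and uses only the Python standard library. The function names, the use of cosine distance for the pairwise measure, and the toy scores are assumptions made for illustration, not the authors' implementation.

```python
import math
from itertools import product
from statistics import fmean

Vector = list[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def mean_vector(vectors: list[Vector]) -> Vector:
    """Dimension-wise average of the token embeddings (a word 'prototype')."""
    return [fmean(dim) for dim in zip(*vectors)]


def prototype_distance(period1: list[Vector], period2: list[Vector]) -> float:
    """1 - cosine similarity between the averaged embeddings of each period."""
    return 1.0 - cosine_similarity(mean_vector(period1), mean_vector(period2))


def average_pairwise_distance(period1: list[Vector], period2: list[Vector]) -> float:
    """Mean cosine distance over all cross-period pairs of token embeddings.

    Assumption: cosine distance is used here; other distances are possible.
    """
    return fmean(1.0 - cosine_similarity(u, v) for u, v in product(period1, period2))


def rank_by_change(scores: dict[str, float]) -> list[str]:
    """Order words from most to least changed (the Subtask 2 output format)."""
    return sorted(scores, key=scores.get, reverse=True)


# Toy 2-dimensional "embeddings": the word is used in two senses,
# evenly split, in both periods.
two_senses_t1 = [[1.0, 0.0], [0.0, 1.0]]
two_senses_t2 = [[1.0, 0.0], [0.0, 1.0]]

print(prototype_distance(two_senses_t1, two_senses_t2))
print(average_pairwise_distance(two_senses_t1, two_senses_t2))

# Hypothetical change scores, used only to show the ranking step.
print(rank_by_change({"plane": 0.8, "tree": 0.1, "cell": 0.5}))
```

In the toy data the averaged embeddings of the two periods coincide, so the prototype distance is approximately 0, while the average pairwise distance is 0.5 because half of the cross-period pairs come from different senses. The two measures can therefore rank the same word differently.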
Related papers
- Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective [50.261681681643076]
We propose a novel metric called SemVarEffect and a benchmark named SemVarBench to evaluate the causality between semantic variations in inputs and outputs in text-to-image synthesis.
Our work establishes an effective evaluation framework that advances the T2I synthesis community's exploration of human instruction understanding.
arXiv Detail & Related papers (2024-10-14T08:45:35Z) - Lexically Grounded Subword Segmentation [0.0]
We present three innovations in tokenization and subword segmentation.
First, we propose to use unsupervised morphological analysis with Morfessor as pre-tokenization.
Second, we present a method for obtaining subword embeddings grounded in a word embedding space.
Third, we introduce an efficient segmentation algorithm based on a subword bigram model.
arXiv Detail & Related papers (2024-06-19T13:48:19Z) - Unify word-level and span-level tasks: NJUNLP's Participation for the
WMT2023 Quality Estimation Shared Task [59.46906545506715]
We introduce the NJUNLP team to the WMT 2023 Quality Estimation (QE) shared task.
Our team submitted predictions for the English-German language pair on both sub-tasks.
Our models achieved the best results in English-German for both word-level and fine-grained error span detection sub-tasks.
arXiv Detail & Related papers (2023-09-23T01:52:14Z) - Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on
Spoken Language Understanding [101.24748444126982]
Decomposable tasks are complex and comprise a hierarchy of sub-tasks.
Existing benchmarks, however, typically hold out examples for only the surface-level sub-task.
We propose a framework to construct robust test sets using coordinate ascent over sub-task specific utility functions.
arXiv Detail & Related papers (2021-06-29T02:53:59Z) - Cross-domain Speech Recognition with Unsupervised Character-level
Distribution Matching [60.8427677151492]
We propose CMatch, a Character-level distribution matching method to perform fine-grained adaptation between each character in two domains.
Experiments on the Libri-Adapt dataset show that our proposed approach achieves 14.39% and 16.50% relative Word Error Rate (WER) reduction on cross-device and cross-environment ASR, respectively.
arXiv Detail & Related papers (2021-04-15T14:36:54Z) - BRUMS at SemEval-2020 Task 3: Contextualised Embeddings for Predicting
the (Graded) Effect of Context in Word Similarity [9.710464466895521]
This paper presents the team BRUMS submission to SemEval-2020 Task 3: Graded Word Similarity in Context.
The system utilises state-of-the-art contextualised word embeddings with some task-specific adaptations, including stacked embeddings and average embeddings.
Following the final rankings, our approach is ranked within the top 5 solutions of each language while preserving the 1st position of Finnish subtask 2.
arXiv Detail & Related papers (2020-10-13T10:25:18Z) - SST-BERT at SemEval-2020 Task 1: Semantic Shift Tracing by Clustering in
BERT-based Embedding Spaces [63.17308641484404]
We propose to identify clusters among different occurrences of each target word, considering these as representatives of different word meanings.
Disagreements in the obtained clusters naturally allow us to quantify the level of semantic shift for each target word in four target languages.
Our approach performs well both measured separately (per language) and overall, where we surpass all provided SemEval baselines.
arXiv Detail & Related papers (2020-10-02T08:38:40Z) - SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection [10.606357227329822]
Evaluation is currently the most pressing problem in Lexical Semantic Change detection.
No gold standards are available to the community, which hinders progress.
We present the results of the first shared task that addresses this gap.
arXiv Detail & Related papers (2020-07-22T14:37:42Z) - GloVeInit at SemEval-2020 Task 1: Using GloVe Vector Initialization for
Unsupervised Lexical Semantic Change Detection [0.0]
This paper presents a Vector Initialization approach for the SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection.
The proposed approach is based on using the Vector Initialization method to align GloVe embeddings.
Our model ranks 13th and 10th among 33 teams in the two subtasks.
arXiv Detail & Related papers (2020-07-10T21:35:17Z) - Pseudo-Convolutional Policy Gradient for Sequence-to-Sequence
Lip-Reading [96.48553941812366]
Lip-reading aims to infer the speech content from the lip movement sequence.
The traditional learning process of seq2seq models suffers from two problems.
We propose a novel pseudo-convolutional policy gradient (PCPG) based method to address these two problems.
arXiv Detail & Related papers (2020-03-09T09:12:26Z)