MULTISEM at SemEval-2020 Task 3: Fine-tuning BERT for Lexical Meaning
- URL: http://arxiv.org/abs/2007.12432v1
- Date: Fri, 24 Jul 2020 09:50:26 GMT
- Title: MULTISEM at SemEval-2020 Task 3: Fine-tuning BERT for Lexical Meaning
- Authors: Aina Garí Soler, Marianna Apidianaki
- Abstract summary: We present the MULTISEM systems submitted to SemEval 2020 Task 3: Graded Word Similarity in Context (GWSC).
We experiment with injecting semantic knowledge into pre-trained BERT models through fine-tuning on lexical semantic tasks related to GWSC.
We use existing semantically annotated datasets and propose to approximate similarity through automatically generated lexical substitutes in context.
- Score: 6.167728295758172
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present the MULTISEM systems submitted to SemEval 2020 Task 3: Graded Word
Similarity in Context (GWSC). We experiment with injecting semantic knowledge
into pre-trained BERT models through fine-tuning on lexical semantic tasks
related to GWSC. We use existing semantically annotated datasets and propose to
approximate similarity through automatically generated lexical substitutes in
context. We participate in both GWSC subtasks and address two languages,
English and Finnish. Our best English models occupy the third and fourth
positions in the ranking for the two subtasks. Performance is lower for the
Finnish models which are mid-ranked in the respective subtasks, highlighting
the important role of data availability for fine-tuning.
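Concretely, GWSC asks how the similarity of a word pair changes from one context to another. As a point of reference, here is a minimal sketch of the standard contextualised-embedding comparison using an off-the-shelf BERT from the transformers library; the checkpoint, the subword mean-pooling, and the cosine scoring are illustrative assumptions, not the fine-tuned MULTISEM pipeline itself.

```python
# A minimal sketch of GWSC-style similarity: compare contextualised BERT
# embeddings of two target words inside a given context. Model name and
# pooling choice are illustrative, not the authors' fine-tuned setup.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def word_embedding(sentence: str, word: str) -> torch.Tensor:
    """Mean-pool the last-layer vectors of the subword tokens of `word`."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, hidden)
    word_ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    # Find the first occurrence of the word's subword span in the sentence.
    for i in range(len(ids) - len(word_ids) + 1):
        if ids[i : i + len(word_ids)] == word_ids:
            return hidden[i : i + len(word_ids)].mean(dim=0)
    raise ValueError(f"{word!r} not found in tokenised sentence")

def in_context_similarity(context: str, w1: str, w2: str) -> float:
    """Cosine similarity of two target words within the same context."""
    e1, e2 = word_embedding(context, w1), word_embedding(context, w2)
    return torch.cosine_similarity(e1, e2, dim=0).item()

# GWSC-style comparison: the same word pair, two contexts.
c1 = "She deposited the money in the bank and checked her account balance."
c2 = "He sat on the bank of the river and wrote an account of his trip."
print(in_context_similarity(c1, "bank", "account"))  # typically higher
print(in_context_similarity(c2, "bank", "account"))  # typically lower
```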
Related papers
- Temporal and Semantic Evaluation Metrics for Foundation Models in Post-Hoc Analysis of Robotic Sub-tasks [1.8124328823188356]
We present an automated framework to decompose trajectory data into temporally bounded and natural language-based descriptive sub-tasks.
Our framework provides both time-based and language-based descriptions for lower-level sub-tasks that comprise full trajectories.
The metrics measure the temporal alignment and semantic fidelity of language descriptions between two sub-task decompositions.
arXiv Detail & Related papers (2024-03-25T22:39:20Z)
- Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity [68.8204255655161]
We present the semantic notion of agentivity as a case study for probing such interactions.
This suggests that LMs may serve as useful tools for linguistic annotation, theory testing, and discovery.
arXiv Detail & Related papers (2023-05-29T16:24:01Z)
- IXA/Cogcomp at SemEval-2023 Task 2: Context-enriched Multilingual Named Entity Recognition using Knowledge Bases [53.054598423181844]
We present a novel NER cascade approach comprising three steps.
We empirically demonstrate the significance of external knowledge bases in accurately classifying fine-grained and emerging entities.
Our system exhibits robust performance in the MultiCoNER2 shared task, even in the low-resource language setting.
arXiv Detail & Related papers (2023-04-20T20:30:34Z)
- Seq2Seq-SC: End-to-End Semantic Communication Systems with Pre-trained Language Model [20.925910474226885]
We propose a realistic semantic network called seq2seq-SC, designed to be compatible with 5G NR.
We employ a performance metric called semantic similarity, measured by BLEU for lexical similarity and SBERT for semantic similarity; a sketch of this kind of scoring appears after the list below.
arXiv Detail & Related papers (2022-10-27T07:48:18Z)
- Alibaba-Translate China's Submission for WMT 2022 Quality Estimation Shared Task [80.22825549235556]
We present our submission, named UniTE, to the sentence-level MQM benchmark of the Quality Estimation Shared Task.
Specifically, our systems employ the framework of UniTE, which combines three types of input formats during training with a pre-trained language model.
Results show that our models reach 1st overall ranking in the Multilingual and English-Russian settings, and 2nd overall ranking in English-German and Chinese-English settings.
arXiv Detail & Related papers (2022-10-18T08:55:27Z)
- LRG at SemEval-2021 Task 4: Improving Reading Comprehension with Abstract Words using Augmentation, Linguistic Features and Voting [0.6850683267295249]
Given a fill-in-the-blank-type question, the task is to predict the most suitable word from a list of 5 options.
We use encoders of transformer-based models pre-trained on the masked language modelling (MLM) task to build our Fill-in-the-blank (FitB) models.
We propose variants, namely Chunk Voting and Max Context, to handle the input length restrictions of BERT and similar models; a sketch of the underlying FitB scoring also appears after the list.
arXiv Detail & Related papers (2021-02-24T12:33:12Z)
- BRUMS at SemEval-2020 Task 3: Contextualised Embeddings for Predicting the (Graded) Effect of Context in Word Similarity [9.710464466895521]
This paper presents the team BRUMS submission to SemEval-2020 Task 3: Graded Word Similarity in Context.
The system utilises state-of-the-art contextualised word embeddings with task-specific adaptations, including stacked embeddings and average embeddings.
In the final rankings, the approach places within the top 5 solutions for each language and holds 1st position on Finnish subtask 2.
arXiv Detail & Related papers (2020-10-13T10:25:18Z)
- Cross-lingual Spoken Language Understanding with Regularized Representation Alignment [71.53159402053392]
We propose a regularization approach to align word-level and sentence-level representations across languages without any external resource.
Experiments on the cross-lingual spoken language understanding task show that our model outperforms current state-of-the-art methods in both few-shot and zero-shot scenarios.
arXiv Detail & Related papers (2020-09-30T08:56:53Z)
- GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing [117.98107557103877]
We present GraPPa, an effective pre-training approach for table semantic parsing.
We construct synthetic question-SQL pairs over high-quality tables via a synchronous context-free grammar.
To maintain the model's ability to represent real-world data, we also include masked language modeling.
arXiv Detail & Related papers (2020-09-29T08:17:58Z)
- BUT-FIT at SemEval-2020 Task 4: Multilingual commonsense [1.433758865948252]
This paper describes the work of the BUT-FIT team at SemEval 2020 Task 4 - Commonsense Validation and Explanation.
In subtasks A and B, our submissions are based on pretrained language representation models (namely ALBERT) and data augmentation.
We experimented with solving the task for another language, Czech, by means of multilingual models and a machine-translated dataset.
We show that with a strong machine translation system, our system can be used in another language with a small accuracy loss.
arXiv Detail & Related papers (2020-08-17T12:45:39Z)
- BURT: BERT-inspired Universal Representation from Twin Structure [89.82415322763475]
BURT (BERT-inspired Universal Representation from Twin Structure) is capable of generating universal, fixed-size representations for input sequences of any granularity.
Our proposed BURT adopts a Siamese network, learning sentence-level representations from a natural language inference dataset and word/phrase-level representations from a paraphrasing dataset.
We evaluate BURT across different granularities of text similarity tasks, including STS tasks, SemEval2013 Task 5(a) and some commonly used word similarity tasks.
arXiv Detail & Related papers (2020-04-29T04:01:52Z)
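For the Seq2Seq-SC entry above, the following is a minimal sketch of the two-sided scoring it describes: sentence-level BLEU for lexical overlap and SBERT cosine similarity for meaning. The all-MiniLM-L6-v2 checkpoint, the smoothing choice, and the example strings are illustrative assumptions, not the paper's configuration.

```python
# A minimal sketch of BLEU (lexical) + SBERT (semantic) scoring for a
# decoded message against a reference, as in the Seq2Seq-SC entry above.
# Checkpoint and smoothing are illustrative assumptions.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from sentence_transformers import SentenceTransformer, util

sbert = SentenceTransformer("all-MiniLM-L6-v2")
smooth = SmoothingFunction().method1

def lexical_similarity(reference: str, hypothesis: str) -> float:
    """Sentence-level BLEU between a reference and a decoded message."""
    return sentence_bleu(
        [reference.split()], hypothesis.split(), smoothing_function=smooth
    )

def semantic_similarity(reference: str, hypothesis: str) -> float:
    """Cosine similarity between SBERT sentence embeddings."""
    emb = sbert.encode([reference, hypothesis], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()

ref = "the weather will be sunny tomorrow"
hyp = "tomorrow should bring clear and sunny skies"
print("BLEU :", lexical_similarity(ref, hyp))   # low: little word overlap
print("SBERT:", semantic_similarity(ref, hyp))  # high: meaning preserved
```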
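For the LRG entry, here is a minimal sketch of fill-in-the-blank option scoring with a masked language model: each of the five options is scored by the average log-probability of its subword tokens at the [MASK] positions. This single-pass scoring scheme and the example question are illustrative assumptions; it does not implement the Chunk Voting or Max Context variants the entry mentions.

```python
# A minimal sketch of masked-LM option scoring for a fill-in-the-blank
# question with 5 candidate words. The scoring scheme (average subword
# log-probability) is an illustrative assumption, not the team's method.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
mlm.eval()

def score_option(question: str, option: str) -> float:
    """Average log-probability of the option's subword tokens at the
    corresponding [MASK] positions (one mask per subword)."""
    sub_ids = tokenizer(option, add_special_tokens=False)["input_ids"]
    blank = " ".join([tokenizer.mask_token] * len(sub_ids))
    enc = tokenizer(question.replace("_____", blank), return_tensors="pt")
    with torch.no_grad():
        logits = mlm(**enc).logits[0]
    mask_pos = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    log_probs = torch.log_softmax(logits[mask_pos], dim=-1)
    return sum(log_probs[i, tok].item() for i, tok in enumerate(sub_ids)) / len(sub_ids)

question = "The scientist's findings were too _____ to draw firm conclusions."
options = ["abstract", "tentative", "concrete", "joyful", "rapid"]
print(max(options, key=lambda o: score_option(question, o)))
```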
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.