aschern at SemEval-2020 Task 11: It Takes Three to Tango: RoBERTa, CRF, and Transfer Learning
- URL: http://arxiv.org/abs/2008.02837v1
- Date: Thu, 6 Aug 2020 18:45:25 GMT
- Title: aschern at SemEval-2020 Task 11: It Takes Three to Tango: RoBERTa, CRF, and Transfer Learning
- Authors: Anton Chernyavskiy, Dmitry Ilvovsky, Preslav Nakov
- Abstract summary: We describe our system for SemEval-2020 Task 11 on Detection of Propaganda Techniques in News Articles.
We developed ensemble models using RoBERTa-based neural architectures, additional CRF layers, transfer learning between the two subtasks, and advanced post-processing to handle the multi-label nature of the task.
- Score: 22.90521056447551
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We describe our system for SemEval-2020 Task 11 on Detection of Propaganda
Techniques in News Articles. We developed ensemble models using RoBERTa-based
neural architectures, additional CRF layers, transfer learning between the two
subtasks, and advanced post-processing to handle the multi-label nature of the
task, the consistency between nested spans, repetitions, and labels from
similar spans in training. We achieved sizable improvements over baseline
fine-tuned RoBERTa models, and the official evaluation ranked our system 3rd
(almost tied with the 2nd) out of 36 teams on the span identification subtask
with an F1 score of 0.491, and 2nd (almost tied with the 1st) out of 31 teams
on the technique classification subtask with an F1 score of 0.62.
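As a rough illustration of the architecture family the abstract describes, the sketch below stacks a CRF layer on top of a RoBERTa token classifier for span tagging. The pytorch-crf package, the BIO-style tag scheme, and all hyperparameters are illustration choices, not details taken from the paper.

```python
# A minimal sketch, assuming the pytorch-crf package (torchcrf) and a BIO tag scheme.
import torch.nn as nn
from transformers import AutoModel
from torchcrf import CRF

class RobertaCrfTagger(nn.Module):
    """RoBERTa encoder with a linear emission layer and a CRF on top."""
    def __init__(self, num_tags: int, model_name: str = "roberta-base"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        self.emissions = nn.Linear(self.encoder.config.hidden_size, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        hidden = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        scores = self.emissions(hidden)
        mask = attention_mask.bool()
        if tags is not None:
            # training: negative log-likelihood of the gold tag sequence under the CRF
            return -self.crf(scores, tags, mask=mask, reduction="mean")
        # inference: Viterbi decoding of the best tag sequence per input
        return self.crf.decode(scores, mask=mask)
```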
Related papers
- ThangDLU at #SMM4H 2024: Encoder-decoder models for classifying text data on social disorders in children and adolescents [49.00494558898933]
This paper describes our participation in Task 3 and Task 5 of the #SMM4H (Social Media Mining for Health) 2024 Workshop.
Task 3 is a multi-class classification task centered on tweets discussing the impact of outdoor environments on symptoms of social anxiety.
Task 5 involves a binary classification task focusing on tweets reporting medical disorders in children.
We applied transfer learning from pre-trained encoder-decoder models such as BART-base and T5-small to identify the labels of a set of given tweets.
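Casting classification as label generation with an encoder-decoder model can be sketched in a few lines; the task prefix and example labels below are assumptions, and meaningful predictions would require fine-tuning on the labelled tweets first.

```python
# A hedged sketch of label generation with an encoder-decoder model (here t5-small).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

def classify_tweet(tweet: str) -> str:
    inputs = tokenizer("classify tweet: " + tweet, return_tensors="pt", truncation=True)
    # the model emits the label as text, e.g. "positive" / "negative" (illustrative labels)
    output_ids = model.generate(**inputs, max_new_tokens=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```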
arXiv Detail & Related papers (2024-04-30T17:06:20Z)
- GersteinLab at MEDIQA-Chat 2023: Clinical Note Summarization from Doctor-Patient Conversations through Fine-tuning and In-context Learning [4.2570830892708225]
This paper presents our contribution to the MEDIQA-2023 Dialogue2Note shared task, encompassing both subtask A and subtask B.
We approach the task as a dialogue summarization problem and implement two distinct pipelines: (a) fine-tuning a pre-trained dialogue summarization model and GPT-3, and (b) few-shot in-context learning (ICL) using a large language model, GPT-4.
Both methods achieve excellent results in terms of ROUGE-1 F1, BERTScore F1 (deberta-xlarge-mnli), and BLEURT.
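A few-shot ICL pipeline of this kind largely reduces to assembling demonstrations into a single prompt; the sketch below shows one plausible prompt builder, where the instruction wording and format are assumptions, not the authors' actual prompt.

```python
# A sketch of few-shot prompt assembly for dialogue-to-note in-context learning.
def build_icl_prompt(demos, dialogue):
    """demos: list of (dialogue, note) pairs shown to the model as demonstrations."""
    parts = ["Summarize each doctor-patient dialogue into a clinical note."]
    for demo_dialogue, demo_note in demos:
        parts.append(f"Dialogue:\n{demo_dialogue}\nNote:\n{demo_note}")
    parts.append(f"Dialogue:\n{dialogue}\nNote:")  # the model completes the final note
    return "\n\n".join(parts)
```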
arXiv Detail & Related papers (2023-05-08T19:16:26Z)
- Detecting Generated Scientific Papers using an Ensemble of Transformer Models [4.56877715768796]
The paper describes neural models developed for the DAGPap22 shared task hosted at the Third Workshop on Scholarly Document Processing.
Our work focuses on comparing different transformer-based models as well as using additional datasets and techniques to deal with imbalanced classes.
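One standard way to deal with imbalanced classes, of the sort this entry mentions, is inverse-frequency class weighting in the loss; the sketch below is a generic example with made-up counts, not necessarily the authors' exact technique.

```python
# A generic example of handling class imbalance with inverse-frequency loss weights.
import torch
import torch.nn as nn

counts = torch.tensor([9000.0, 1000.0])          # e.g. human-written vs. machine-generated
weights = counts.sum() / (len(counts) * counts)  # the rarer class gets the larger weight
loss_fn = nn.CrossEntropyLoss(weight=weights)
```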
arXiv Detail & Related papers (2022-09-17T08:43:25Z)
- Combining Modular Skills in Multitask Learning [149.8001096811708]
A modular design encourages neural models to disentangle and recombine different facets of knowledge to generalise more systematically to new tasks.
In this work, we assume each task is associated with a subset of latent discrete skills from a (potentially small) inventory.
We find that the modular design of a network significantly increases sample efficiency in reinforcement learning and few-shot generalisation in supervised learning.
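A minimal sketch of task-conditioned skill selection: a shared inventory of small modules plus a learnable task-to-skill allocation. The softmax relaxation and plain linear modules below are simplifications for illustration, not the paper's exact parameterization.

```python
# A simplified sketch of selecting a subset of shared skill modules per task.
import torch
import torch.nn as nn

class SkillMixture(nn.Module):
    def __init__(self, num_skills: int, num_tasks: int, dim: int):
        super().__init__()
        self.skills = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_skills))
        self.alloc = nn.Parameter(torch.zeros(num_tasks, num_skills))  # allocation logits

    def forward(self, x, task_id: int):
        probs = self.alloc[task_id].softmax(-1)                   # soft skill selection
        outs = torch.stack([skill(x) for skill in self.skills])   # (num_skills, batch, dim)
        return (probs.view(-1, 1, 1) * outs).sum(0)               # allocation-weighted mix
```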
arXiv Detail & Related papers (2022-02-28T16:07:19Z)
- DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing [117.41016786835452]
This paper presents a new pre-trained language model, DeBERTaV3, which improves the original DeBERTa model.
We find that vanilla embedding sharing in ELECTRA hurts training efficiency and model performance.
We propose a new gradient-disentangled embedding sharing method that avoids the tug-of-war dynamics.
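The core of gradient-disentangled embedding sharing can be sketched with a stop-gradient: the discriminator reuses the generator's embedding table but trains only a residual "delta" table of its own. This is a conceptual sketch, not the authors' implementation.

```python
# A conceptual sketch of gradient-disentangled embedding sharing (GDES).
import torch.nn as nn

class GDESEmbeddings(nn.Module):
    def __init__(self, vocab_size: int, dim: int):
        super().__init__()
        self.shared = nn.Embedding(vocab_size, dim)  # updated only by the generator (MLM) loss
        self.delta = nn.Embedding(vocab_size, dim)   # updated only by the discriminator loss
        nn.init.zeros_(self.delta.weight)

    def generator_embed(self, ids):
        return self.shared(ids)

    def discriminator_embed(self, ids):
        # detach() blocks discriminator gradients from reaching the shared table,
        # avoiding the tug-of-war between the two training objectives
        return self.shared(ids).detach() + self.delta(ids)
```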
arXiv Detail & Related papers (2021-11-18T06:48:00Z)
- The USYD-JD Speech Translation System for IWSLT 2021 [85.64797317290349]
This paper describes the University of Sydney & JD's joint submission to the IWSLT 2021 low-resource speech translation task.
We trained our models with the officially provided ASR and MT datasets.
To achieve better translation performance, we explored the most recent effective strategies, including back translation, knowledge distillation, multi-feature reranking and transductive finetuning.
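Of the listed strategies, knowledge distillation has a compact canonical form: match the student's temperature-softened distribution to the teacher's. The sketch below shows this standard loss, not the authors' exact recipe.

```python
# The standard knowledge-distillation loss (temperature-softened KL divergence).
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # scale by T^2 so gradient magnitudes stay comparable across temperatures
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
```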
arXiv Detail & Related papers (2021-07-24T09:53:34Z)
- Train your classifier first: Cascade Neural Networks Training from upper layers to lower layers [54.47911829539919]
We develop a novel top-down training method which can be viewed as an algorithm for searching for high-quality classifiers.
We tested this method on automatic speech recognition (ASR) tasks and language modelling tasks.
The proposed method consistently improves recurrent neural network ASR models on Wall Street Journal, self-attention ASR models on Switchboard, and AWD-LSTM language models on WikiText-2.
arXiv Detail & Related papers (2021-02-09T08:19:49Z)
- Solomon at SemEval-2020 Task 11: Ensemble Architecture for Fine-Tuned Propaganda Detection in News Articles [0.3232625980782302]
This paper describes the details and results of our system (Solomon), which participated in SemEval-2020 Task 11, "Detection of Propaganda Techniques in News Articles".
We used a RoBERTa-based transformer architecture for fine-tuning on the propaganda dataset.
Compared to the other participating systems, our submission is ranked 4th on the leaderboard.
arXiv Detail & Related papers (2020-09-16T05:00:40Z)
- syrapropa at SemEval-2020 Task 11: BERT-based Models Design For Propagandistic Technique and Span Detection [2.0051855303186046]
We first build the model for Span Identification (SI) based on SpanBERT, and facilitate the detection by a deeper model and a sentence-level representation.
We then develop a hybrid model for the Technique Classification (TC) subtask.
The hybrid model is composed of three submodels including two BERT models with different training methods, and a feature-based Logistic Regression model.
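Combining submodels of different kinds typically reduces to mixing their class-probability outputs; the weighting below is hypothetical, as the summary does not describe the actual combination scheme.

```python
# A hypothetical weighted average of class probabilities from three submodels.
import numpy as np

def hybrid_predict(bert_probs_a, bert_probs_b, lr_probs, weights=(0.4, 0.4, 0.2)):
    """Each argument is one submodel's class-probability vector for a single span."""
    stacked = np.stack([bert_probs_a, bert_probs_b, lr_probs])  # (3, num_classes)
    mixed = np.asarray(weights) @ stacked                       # weighted mixture
    return int(mixed.argmax())
```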
arXiv Detail & Related papers (2020-08-24T02:15:29Z)
- CyberWallE at SemEval-2020 Task 11: An Analysis of Feature Engineering for Ensemble Models for Propaganda Detection [0.0]
We use a bi-LSTM architecture in the Span Identification subtask and train a complex ensemble model for the Technique Classification subtask.
Our systems achieve a rank of 8 out of 35 teams in the SI subtask and 8 out of 31 teams in the TC subtask.
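A bi-LSTM span-identification model of the kind described can be sketched in a few lines; vocabulary handling, the feature engineering the paper analyses, and all hyperparameters are omitted or assumed.

```python
# A minimal bidirectional-LSTM token tagger for span identification.
import torch.nn as nn

class BiLstmTagger(nn.Module):
    def __init__(self, vocab_size: int, num_tags: int, emb_dim: int = 128, hidden: int = 256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, num_tags)

    def forward(self, token_ids):
        states, _ = self.lstm(self.emb(token_ids))
        return self.out(states)  # per-token tag logits
```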
arXiv Detail & Related papers (2020-08-22T15:51:16Z)
- Device-Robust Acoustic Scene Classification Based on Two-Stage Categorization and Data Augmentation [63.98724740606457]
We present a joint effort of four groups, namely GT, USTC, Tencent, and UKE, to tackle Task 1 - Acoustic Scene Classification (ASC) in the DCASE 2020 Challenge.
Task 1a focuses on ASC of audio signals recorded with multiple (real and simulated) devices into ten different fine-grained classes.
Task 1b concerns the classification of data into three higher-level classes using low-complexity solutions.
arXiv Detail & Related papers (2020-07-16T15:07:14Z)
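One plausible reading of two-stage categorization is coarse-then-fine routing: predict one of the higher-level classes first, then apply a specialist classifier for the fine-grained label. The grouping and routing below are assumptions for illustration.

```python
# A sketch of coarse-then-fine routing between a coarse model and per-group specialists.
def two_stage_predict(x, coarse_model, fine_models):
    """x: a single input batch of size 1; fine_models: one specialist per coarse group."""
    group = int(coarse_model(x).argmax(dim=-1))  # e.g. indoor / outdoor / transportation
    fine_logits = fine_models[group](x)          # specialist for the predicted group
    return group, int(fine_logits.argmax(dim=-1))
```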
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.