FHAC at GermEval 2021: Identifying German toxic, engaging, and
fact-claiming comments with ensemble learning
- URL: http://arxiv.org/abs/2109.03094v1
- Date: Tue, 7 Sep 2021 13:52:39 GMT
- Title: FHAC at GermEval 2021: Identifying German toxic, engaging, and
fact-claiming comments with ensemble learning
- Authors: Tobias Bornheim, Niklas Grieger, Stephan Bialonski
- Abstract summary: We fine-tuned German BERT and German ELECTRA models to identify toxic (subtask 1), engaging (subtask 2), and fact-claiming comments (subtask 3) in Facebook data provided by the GermEval 2021 competition.
Our best ensemble achieved a macro-F1 score of 0.73 (for all subtasks), and F1 scores of 0.72, 0.70, and 0.76 for subtasks 1, 2, and 3, respectively.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The availability of language representations learned by large pretrained
neural network models (such as BERT and ELECTRA) has led to improvements in
many downstream Natural Language Processing tasks in recent years. Pretrained
models usually differ in pretraining objectives, architectures, and datasets
they are trained on, which can affect downstream performance. In this
contribution, we fine-tuned German BERT and German ELECTRA models to identify
toxic (subtask 1), engaging (subtask 2), and fact-claiming comments (subtask 3)
in Facebook data provided by the GermEval 2021 competition. We created
ensembles of these models and investigated whether and how classification
performance depends on the number of ensemble members and their composition. On
out-of-sample data, our best ensemble achieved a macro-F1 score of 0.73 (for
all subtasks), and F1 scores of 0.72, 0.70, and 0.76 for subtasks 1, 2, and 3,
respectively.
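As a rough illustration of this kind of pipeline, the sketch below fine-tunes two publicly available German checkpoints with the Hugging Face transformers library and combines them by averaging predicted probabilities (soft voting). The model IDs, hyperparameters, and toy data are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch (not the authors' exact pipeline): fine-tune several German
# transformer checkpoints on a binary comment-classification subtask and
# ensemble them by averaging predicted probabilities (soft voting).
# Model IDs, hyperparameters, and the toy data are illustrative assumptions.
import numpy as np
import torch
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

train = Dataset.from_dict({"text": ["Beispielkommentar 1", "Beispielkommentar 2"],
                           "label": [0, 1]})
test = Dataset.from_dict({"text": ["Noch ein Kommentar"]})

def fine_tune(model_name: str, seed: int):
    """Fine-tune one ensemble member and return (tokenizer, model)."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
    encoded = train.map(lambda b: tok(b["text"], truncation=True,
                                      padding="max_length", max_length=128),
                        batched=True)
    args = TrainingArguments(output_dir=f"out-{seed}", num_train_epochs=3,
                             per_device_train_batch_size=8, seed=seed,
                             report_to="none")
    Trainer(model=model, args=args, train_dataset=encoded).train()
    return tok, model

def member_probs(tok, model, texts):
    """Softmax probabilities of one fine-tuned member on unseen comments."""
    model = model.cpu().eval()
    enc = tok(texts, truncation=True, padding=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    return torch.softmax(logits, dim=-1).numpy()

# Ensemble members: different checkpoints and/or different random seeds.
members = [("deepset/gbert-base", 1), ("deepset/gelectra-base", 2)]
probs = [member_probs(*fine_tune(name, seed), test["text"]) for name, seed in members]
ensemble_pred = np.mean(probs, axis=0).argmax(axis=-1)  # soft-voting decision
print(ensemble_pred)
```

Varying the list of members (checkpoints and seeds) is one simple way to study how performance depends on ensemble size and composition, which is the question the paper investigates.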
Related papers
- An Open Dataset and Model for Language Identification [84.15194457400253]
We present a LID model which achieves a macro-average F1 score of 0.93 and a false positive rate of 0.033 across 201 languages.
We make both the model and the dataset available to the research community.
arXiv Detail & Related papers (2023-05-23T08:43:42Z)
- Bag of Tricks for Effective Language Model Pretraining and Downstream Adaptation: A Case Study on GLUE [93.98660272309974]
This report briefly describes our submission Vega v1 on the General Language Understanding Evaluation leaderboard.
GLUE is a collection of nine natural language understanding tasks, including question answering, linguistic acceptability, sentiment analysis, text similarity, paraphrase detection, and natural language inference.
With our optimized pretraining and fine-tuning strategies, our 1.3-billion-parameter model sets a new state of the art on 4 of the 9 tasks, achieving the best average score of 91.3.
arXiv Detail & Related papers (2023-02-18T09:26:35Z)
- BJTU-WeChat's Systems for the WMT22 Chat Translation Task [66.81525961469494]
This paper introduces the joint submission of the Beijing Jiaotong University and WeChat AI to the WMT'22 chat translation task for English-German.
Building on the Transformer architecture, we apply several effective variants.
Our systems achieve COMET scores of 0.810 and 0.946.
arXiv Detail & Related papers (2022-11-28T02:35:04Z)
- Exploring the Value of Pre-trained Language Models for Clinical Named Entity Recognition [6.917786124918387]
We compare Transformer models that are trained from scratch to fine-tuned BERT-based LLMs.
We examine the impact of an additional CRF layer on such models to encourage contextual learning.
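A minimal sketch of the kind of architecture described here, a transformer token classifier with a CRF layer on top (using the pytorch-crf package), is given below; the checkpoint and tag count are placeholder assumptions rather than the cited paper's setup.

```python
# Hedged sketch: one way to put a CRF layer on top of a transformer token
# classifier (pip install pytorch-crf). The checkpoint, tag set, and example
# input are illustrative assumptions, not the cited paper's configuration.
import torch
import torch.nn as nn
from torchcrf import CRF
from transformers import AutoModel, AutoTokenizer

class TransformerCRFTagger(nn.Module):
    def __init__(self, model_name: str, num_tags: int):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        self.emissions = nn.Linear(self.encoder.config.hidden_size, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        emissions = self.emissions(hidden)
        mask = attention_mask.bool()
        if tags is not None:  # training: negative log-likelihood of gold tags
            return -self.crf(emissions, tags, mask=mask, reduction="mean")
        return self.crf.decode(emissions, mask=mask)  # inference: best tag path

tok = AutoTokenizer.from_pretrained("bert-base-cased")
model = TransformerCRFTagger("bert-base-cased", num_tags=5)
batch = tok(["Patient denies chest pain ."], return_tensors="pt")
print(model(batch["input_ids"], batch["attention_mask"]))
```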
arXiv Detail & Related papers (2022-10-23T16:27:31Z)
- UU-Tax at SemEval-2022 Task 3: Improving the generalizability of language models for taxonomy classification through data augmentation [0.0]
This paper addresses SemEval-2022 Task 3, PreTENS (Presupposed Taxonomies: Evaluating Neural Network Semantics).
The goal of the task is to identify if a sentence is deemed acceptable or not, depending on the taxonomic relationship that holds between a noun pair contained in the sentence.
We propose an effective way to enhance the robustness and the generalizability of language models for better classification.
arXiv Detail & Related papers (2022-10-07T07:41:28Z)
- ANNA: Enhanced Language Representation for Question Answering [5.713808202873983]
We show how these approaches affect performance both individually and when they are considered jointly in pre-training models.
We propose an extended pre-training task, and a new neighbor-aware mechanism that attends neighboring tokens more to capture the richness of context for pre-training language modeling.
Our best model achieves new state-of-the-art results of 95.7% F1 and 90.6% EM on SQuAD 1.1 and also outperforms existing pre-trained language models such as RoBERTa, ALBERT, ELECTRA, and XLNet.
arXiv Detail & Related papers (2022-03-28T05:26:52Z)
- DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing [117.41016786835452]
This paper presents a new pre-trained language model, DeBERTaV3, which improves the original DeBERTa model.
We find that vanilla embedding sharing in ELECTRA hurts training efficiency and model performance.
We propose a new gradient-disentangled embedding sharing method that avoids the tug-of-war dynamics.
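The sketch below illustrates the general idea in simplified form, assuming the standard stop-gradient formulation: the discriminator reuses the generator's embeddings through a detached copy plus its own residual table, so its gradients never reach the shared weights. It is an illustration of the mechanism, not the released DeBERTaV3 implementation.

```python
# Simplified sketch of gradient-disentangled embedding sharing: the
# discriminator reuses the generator's token embeddings via a stop-gradient,
# plus its own small residual ("delta") table, so discriminator updates
# cannot pull the shared embeddings away from the MLM objective.
# Shapes and names are illustrative, not DeBERTaV3's actual code.
import torch
import torch.nn as nn

class GDESEmbedding(nn.Module):
    def __init__(self, vocab_size: int, hidden_size: int):
        super().__init__()
        self.shared = nn.Embedding(vocab_size, hidden_size)  # trained by the generator
        self.delta = nn.Embedding(vocab_size, hidden_size)   # discriminator-only residual
        nn.init.zeros_(self.delta.weight)

    def generator_embed(self, token_ids):
        return self.shared(token_ids)  # gradients update the shared table

    def discriminator_embed(self, token_ids):
        # detach() blocks discriminator gradients from reaching the shared
        # table, avoiding the "tug-of-war" between the two objectives.
        return self.shared(token_ids).detach() + self.delta(token_ids)

emb = GDESEmbedding(vocab_size=100, hidden_size=16)
ids = torch.tensor([[1, 2, 3]])
print(emb.generator_embed(ids).shape, emb.discriminator_embed(ids).shape)
```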
arXiv Detail & Related papers (2021-11-18T06:48:00Z)
- FH-SWF SG at GermEval 2021: Using Transformer-Based Language Models to Identify Toxic, Engaging, & Fact-Claiming Comments [0.0]
We describe the methods we used for our submissions to the GermEval 2021 shared task on the identification of toxic, engaging, and fact-claiming comments.
For all three subtasks we fine-tuned freely available transformer-based models from the Huggingface model hub.
We evaluated the performance of various pre-trained models after fine-tuning on 80% of the training data and submitted the predictions of the two best-performing models.
arXiv Detail & Related papers (2021-09-07T09:46:27Z)
- Mixed-Lingual Pre-training for Cross-lingual Summarization [54.4823498438831]
Cross-lingual Summarization aims at producing a summary in the target language for an article in the source language.
We propose a solution based on mixed-lingual pre-training that leverages both cross-lingual tasks like translation and monolingual tasks like masked language models.
Our model achieves an improvement of 2.82 (English to Chinese) and 1.15 (Chinese to English) ROUGE-1 scores over state-of-the-art results.
arXiv Detail & Related papers (2020-10-18T00:21:53Z)
- SSMBA: Self-Supervised Manifold Based Data Augmentation for Improving Out-of-Domain Robustness [66.37077266814822]
In the natural language domain, it is difficult to generate new examples that stay on the underlying data manifold.
We introduce SSMBA, a data augmentation method for generating synthetic training examples.
In experiments on benchmarks across 3 tasks and 9 datasets, SSMBA consistently outperforms existing data augmentation methods.
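A rough, hedged approximation of such a corrupt-and-reconstruct augmentation is sketched below: mask a fraction of tokens and let a pretrained masked language model fill them back in, so that the synthetic example stays close to the data manifold. The masking rate, model choice, and greedy one-position-at-a-time reconstruction are simplifying assumptions, not the authors' released method.

```python
# Hedged approximation of a corrupt-and-reconstruct augmentation in the
# spirit of SSMBA: randomly mask a fraction of tokens and let a masked
# language model fill them back in. This is an illustration only, not the
# authors' released implementation.
import random
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
MASK = fill.tokenizer.mask_token  # "[MASK]" for BERT

def augment(sentence: str, mask_prob: float = 0.15, seed: int = 0) -> str:
    random.seed(seed)
    words = sentence.split()
    out = []
    for i, word in enumerate(words):
        if random.random() >= mask_prob:
            out.append(word)
            continue
        # Reconstruct one masked position at a time, conditioning on the
        # words kept so far and the untouched right context.
        masked = " ".join(out + [MASK] + words[i + 1:])
        out.append(fill(masked, top_k=1)[0]["token_str"].strip())
    return " ".join(out)

print(augment("this movie was surprisingly good and very well acted"))
```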
arXiv Detail & Related papers (2020-09-21T22:02:33Z)
- FiSSA at SemEval-2020 Task 9: Fine-tuned For Feelings [2.362412515574206]
In this paper, we present our approach for sentiment classification on Spanish-English code-mixed social media data.
We explore both monolingual and multilingual models with the standard fine-tuning method.
Although two-step fine-tuning improves sentiment classification performance over the base model, the large multilingual XLM-RoBERTa model achieves the best weighted F1 score.
arXiv Detail & Related papers (2020-07-24T14:48:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.