Data Science Kitchen at GermEval 2021: A Fine Selection of Hand-Picked
Features, Delivered Fresh from the Oven
- URL: http://arxiv.org/abs/2109.02383v1
- Date: Mon, 6 Sep 2021 12:00:29 GMT
- Authors: Niclas Hildebrandt and Benedikt Boenninghoff and Dennis Orth and
Christopher Schymura
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents the contribution of the Data Science Kitchen at GermEval
2021 shared task on the identification of toxic, engaging, and fact-claiming
comments. The task extends the identification of offensive language with
additional subtasks that identify comments which should be prioritized for
fact-checking by moderators and community managers. Our
contribution focuses on a feature-engineering approach with a conventional
classification backend. We combine semantic and writing style embeddings
derived from pre-trained deep neural networks with additional numerical
features, specifically designed for this task. Ensembles of Logistic Regression
classifiers and Support Vector Machines are used to derive predictions for each
subtask via a majority voting scheme. Our best submission achieved
macro-averaged F1-scores of 66.8%, 69.9% and 72.5% for the identification of
toxic, engaging, and fact-claiming comments.
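The classification backend described above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the semantic and writing-style embeddings and the task-specific numerical features are replaced by random placeholder vectors, and a single binary label (e.g. "toxic") stands in for the three subtasks.

```python
# Sketch of a majority-voting ensemble of Logistic Regression and SVM
# classifiers over pre-computed feature vectors, evaluated with macro-F1.
# The feature matrix X is a random placeholder for the concatenated
# embedding and numerical features used in the paper.
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 32))  # placeholder feature vectors
# Synthetic binary label loosely tied to the first feature dimension.
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)

ensemble = VotingClassifier(
    estimators=[
        ("logreg", make_pipeline(StandardScaler(),
                                 LogisticRegression(max_iter=1000))),
        ("svm", make_pipeline(StandardScaler(), SVC())),
    ],
    voting="hard",  # majority vote over the ensemble members
)
ensemble.fit(X[:150], y[:150])
preds = ensemble.predict(X[150:])

# Macro-averaged F1, the metric reported in the shared task.
macro_f1 = f1_score(y[150:], preds, average="macro")
print(f"macro-F1: {macro_f1:.3f}")
```

In the paper, one such ensemble is trained per subtask (toxic, engaging, fact-claiming); the sketch shows the voting and evaluation mechanics for a single binary decision.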
Related papers
- Grounded Keys-to-Text Generation: Towards Factual Open-Ended Generation
We propose a new grounded keys-to-text generation task.
The task is to generate a factual description about an entity given a set of guiding keys, and grounding passages.
Inspired by recent QA-based evaluation measures, we propose an automatic metric, MAFE, for factual correctness of generated descriptions.
arXiv Detail & Related papers (2022-12-04T23:59:41Z)
- Task-Specific Embeddings for Ante-Hoc Explainable Text Classification
We propose an alternative training objective in which we learn task-specific embeddings of text.
Our proposed objective learns embeddings such that all texts that share the same target class label should be close together.
We present extensive experiments which show that the benefits of ante-hoc explainability and incremental learning come at no cost in overall classification accuracy.
arXiv Detail & Related papers (2022-11-30T19:56:25Z)
- Association Graph Learning for Multi-Task Classification with Category Shifts
We focus on multi-task classification, where related classification tasks share the same label space and are learned simultaneously.
We learn an association graph to transfer knowledge among tasks for missing classes.
Our method consistently performs better than representative baselines.
arXiv Detail & Related papers (2022-10-10T12:37:41Z)
- BEIKE NLP at SemEval-2022 Task 4: Prompt-Based Paragraph Classification for Patronizing and Condescending Language Detection
The PCL detection task aims at identifying language that is patronizing or condescending towards vulnerable communities in the general media.
In this paper, we give an introduction to our solution, which exploits the power of prompt-based learning on paragraph classification.
arXiv Detail & Related papers (2022-08-02T08:38:47Z)
- DisCoDisCo at the DISRPT2021 Shared Task: A System for Discourse Segmentation, Classification, and Connective Detection
Our system, called DisCoDisCo, enhances contextualized word embeddings with hand-crafted features.
Results on relation classification suggest strong performance on the new 2021 benchmark.
A partial evaluation of multiple pre-trained Transformer-based language models indicates that models pre-trained on the Next Sentence Prediction task are optimal for relation classification.
arXiv Detail & Related papers (2021-09-20T18:11:05Z)
- Detecting Handwritten Mathematical Terms with Sensor Based Data
We propose a solution to the UbiComp 2021 Challenge by Stabilo, in which handwritten mathematical terms are to be classified automatically.
The input data set contains data of different writers, with label strings constructed from a total of 15 different possible characters.
arXiv Detail & Related papers (2021-09-12T19:33:34Z)
- CIM: Class-Irrelevant Mapping for Few-Shot Classification
Few-shot classification (FSC) has attracted considerable attention in recent years.
How to appraise the pre-trained FEM is a central question in the FSC community.
We propose a simple, flexible method dubbed Class-Irrelevant Mapping (CIM).
arXiv Detail & Related papers (2021-09-07T03:26:24Z)
- A survey of joint intent detection and slot-filling models in natural language understanding
This article is a compilation of past work in natural language understanding, especially joint intent classification and slot filling.
We describe trends, approaches, issues, data sets, and evaluation metrics in intent classification and slot filling.
arXiv Detail & Related papers (2021-01-20T12:15:11Z)
- Out-distribution aware Self-training in an Open World Setting
We leverage unlabeled data in an open world setting to further improve prediction performance.
We introduce out-distribution aware self-training, which includes a careful sample selection strategy.
Our classifiers are by design out-distribution aware and can thus distinguish task-related inputs from unrelated ones.
arXiv Detail & Related papers (2020-12-21T12:25:04Z)
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)