Related papers: Data Science Kitchen at GermEval 2021: A Fine Selection of Hand-Picked Features, Delivered Fresh from the Oven

URL: http://arxiv.org/abs/2109.02383v1
Date: Mon, 6 Sep 2021 12:00:29 GMT
Title: Data Science Kitchen at GermEval 2021: A Fine Selection of Hand-Picked Features, Delivered Fresh from the Oven
Authors: Niclas Hildebrandt and Benedikt Boenninghoff and Dennis Orth and Christopher Schymura
Abstract summary: This paper presents the contribution of the Data Science Kitchen at GermEval 2021 on the identification of toxic, engaging, and fact-claiming comments. We combine semantic and writing style embeddings derived from pre-trained deep neural networks with additional numerical features, specifically designed for this task. Our best submission achieved macro-averaged F1-scores of 66.8%, 69.9% and 72.5% for the identification of toxic, engaging, and fact-claiming comments, respectively.
Score: 4.435835732946953
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper presents the contribution of the Data Science Kitchen at the GermEval 2021 shared task on the identification of toxic, engaging, and fact-claiming comments. The task aims to extend the identification of offensive language by including additional subtasks that identify comments which should be prioritized for fact-checking by moderators and community managers. Our contribution focuses on a feature-engineering approach with a conventional classification backend. We combine semantic and writing style embeddings derived from pre-trained deep neural networks with additional numerical features, specifically designed for this task. Ensembles of Logistic Regression classifiers and Support Vector Machines are used to derive predictions for each subtask via a majority voting scheme. Our best submission achieved macro-averaged F1-scores of 66.8%, 69.9% and 72.5% for the identification of toxic, engaging, and fact-claiming comments, respectively.
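
Illustrative sketch (not from the paper): the abstract mentions two generic building blocks, majority voting over an ensemble and the macro-averaged F1-score. The pure-Python example below shows one way these could be computed for a single binary subtask. The function names, the tie-breaking rule, and the toy predictions are all hypothetical; the authors' actual features, classifiers, and voting details are not reproduced here.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most frequent label among the classifiers' votes.

    Assumption: ties are broken in favor of the smallest label.
    """
    counts = Counter(votes)
    top = max(counts.values())
    return min(label for label, count in counts.items() if count == top)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical binary predictions of three classifiers for five comments.
ensemble_predictions = [
    [1, 0, 1, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 0, 1],
]
y_true = [1, 0, 1, 1, 1]  # hypothetical gold labels

# Vote per comment: each column holds one comment's votes.
y_pred = [majority_vote(column) for column in zip(*ensemble_predictions)]
score = macro_f1(y_true, y_pred)
print(y_pred, round(score, 4))
```

Macro averaging weights each class equally regardless of how often it occurs, which is why it is a common choice for imbalanced comment-classification data.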

Related papers

Grounded Keys-to-Text Generation: Towards Factual Open-Ended Generation [92.1582872870226]
We propose a new grounded keys-to-text generation task. The task is to generate a factual description about an entity given a set of guiding keys, and grounding passages. Inspired by recent QA-based evaluation measures, we propose an automatic metric, MAFE, for factual correctness of generated descriptions.
arXiv Detail & Related papers (2022-12-04T23:59:41Z)
Task-Specific Embeddings for Ante-Hoc Explainable Text Classification [6.671252951387647]
We propose an alternative training objective in which we learn task-specific embeddings of text. Our proposed objective learns embeddings such that all texts that share the same target class label should be close together. We present extensive experiments which show that the benefits of ante-hoc explainability and incremental learning come at no cost in overall classification accuracy.
arXiv Detail & Related papers (2022-11-30T19:56:25Z)
Association Graph Learning for Multi-Task Classification with Category Shifts [68.58829338426712]
We focus on multi-task classification, where related classification tasks share the same label space and are learned simultaneously. We learn an association graph to transfer knowledge among tasks for missing classes. Our method consistently performs better than representative baselines.
arXiv Detail & Related papers (2022-10-10T12:37:41Z)
BEIKE NLP at SemEval-2022 Task 4: Prompt-Based Paragraph Classification for Patronizing and Condescending Language Detection [13.944149742291788]
The PCL detection task aims to identify language that is patronizing or condescending towards vulnerable communities in the general media. In this paper, we introduce our solution, which exploits the power of prompt-based learning for paragraph classification.
arXiv Detail & Related papers (2022-08-02T08:38:47Z)
DisCoDisCo at the DISRPT2021 Shared Task: A System for Discourse Segmentation, Classification, and Connective Detection [4.371388370559826]
Our system, called DisCoDisCo, enhances contextualized word embeddings with hand-crafted features. Results on relation classification suggest strong performance on the new 2021 benchmark. A partial evaluation of multiple pre-trained Transformer-based language models indicates that models pre-trained on the Next Sentence Prediction task are optimal for relation classification.
arXiv Detail & Related papers (2021-09-20T18:11:05Z)
Detecting Handwritten Mathematical Terms with Sensor Based Data [71.84852429039881]
We propose a solution to the UbiComp 2021 Challenge by Stabilo in which handwritten mathematical terms are supposed to be automatically classified. The input data set contains data of different writers, with label strings constructed from a total of 15 different possible characters.
arXiv Detail & Related papers (2021-09-12T19:33:34Z)
CIM: Class-Irrelevant Mapping for Few-Shot Classification [58.02773394658623]
Few-shot classification (FSC) has been one of the most actively studied problems in recent years. How to appraise the pre-trained FEM is the most crucial focus in the FSC community. We propose a simple, flexible method, dubbed Class-Irrelevant Mapping (CIM).
arXiv Detail & Related papers (2021-09-07T03:26:24Z)
A survey of joint intent detection and slot-filling models in natural language understanding [0.0]
This article is a compilation of past work in natural language understanding, especially joint intent classification and slot filling. In this article, we describe trends, approaches, issues, data sets, and evaluation metrics in intent classification and slot filling.
arXiv Detail & Related papers (2021-01-20T12:15:11Z)
Out-distribution aware Self-training in an Open World Setting [62.19882458285749]
We leverage unlabeled data in an open world setting to further improve prediction performance. We introduce out-distribution aware self-training, which includes a careful sample selection strategy. Our classifiers are by design out-distribution aware and can thus distinguish task-related inputs from unrelated ones.
arXiv Detail & Related papers (2020-12-21T12:25:04Z)
Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis. We learn <sentiment, aspect> joint topic embeddings in the word embedding space. We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)

This list is automatically generated from the titles and abstracts of the papers on this site.