IITD at the WANLP 2022 Shared Task: Multilingual Multi-Granularity
Network for Propaganda Detection
- URL: http://arxiv.org/abs/2210.17190v1
- Date: Mon, 31 Oct 2022 10:14:43 GMT
- Title: IITD at the WANLP 2022 Shared Task: Multilingual Multi-Granularity
Network for Propaganda Detection
- Authors: Shubham Mittal and Preslav Nakov
- Abstract summary: We present our system for two subtasks of the shared task on propaganda detection in Arabic.
Subtask 1 is a multi-label classification problem to find the propaganda techniques used in a given tweet.
Subtask 2 asks to identify the textual span for each instance of each technique that is present in the tweet.
- Score: 25.536546272915427
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present our system for the two subtasks of the shared task on propaganda
detection in Arabic, part of WANLP'2022. Subtask 1 is a multi-label
classification problem to find the propaganda techniques used in a given tweet.
Our system for this task uses XLM-R to predict probabilities for the target
tweet to use each of the techniques. In addition to finding the techniques,
Subtask 2 further asks to identify the textual span for each instance of each
technique that is present in the tweet; the task can be modeled as a sequence
tagging problem. We use a multi-granularity network with mBERT encoder for
Subtask 2. Overall, our system ranks second for both subtasks (out of 14 and 3
participants, respectively). Our empirical analysis shows that it does not help
to use a much larger English corpus annotated with propaganda techniques,
regardless of whether used in English or after translation to Arabic.
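The two output formats described in the abstract can be illustrated with a minimal sketch. It is not the authors' system: no XLM-R or mBERT model is loaded, the logits are made-up numbers, the three technique names are a hypothetical subset, the 0.5 threshold is an assumption, and the per-token label sets are just one possible way to encode spans for sequence tagging. The helper names (`predict_techniques`, `whitespace_tokens`, `spans_to_tags`) are introduced here for illustration only.

```python
import math
import re

# Hypothetical subset of technique labels, for illustration only.
TECHNIQUES = ["Loaded Language", "Name Calling/Labeling", "Smears"]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predict_techniques(logits, threshold=0.5):
    """Subtask 1 style decision: keep each technique independently
    when its sigmoid probability reaches the (assumed) threshold."""
    return [name for name, z in zip(TECHNIQUES, logits) if sigmoid(z) >= threshold]


def whitespace_tokens(text):
    """Split on whitespace and keep character offsets as (token, start, end)."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def spans_to_tags(tokens, spans):
    """Subtask 2 style encoding: map character spans (start, end, technique)
    to one label set per token. Sets allow overlapping spans."""
    tags = [set() for _ in tokens]
    for span_start, span_end, technique in spans:
        for i, (_, tok_start, tok_end) in enumerate(tokens):
            # A token belongs to a span when their character ranges overlap.
            if tok_start < span_end and tok_end > span_start:
                tags[i].add(technique)
    return tags


if __name__ == "__main__":
    # Made-up logits, one per technique in TECHNIQUES.
    print(predict_techniques([2.0, -1.0, 0.3]))

    tokens = whitespace_tokens("They are total frauds")
    print(spans_to_tags(tokens, [(9, 21, "Name Calling/Labeling")]))
```

Because each label is thresholded on its own, a tweet can receive zero, one, or several techniques, which is what distinguishes multi-label classification from ordinary multi-class prediction.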
Related papers
- SemEval-2024 Task 8: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection [68.858931667807]
Subtask A is a binary classification task determining whether a text is written by a human or generated by a machine.
Subtask B is to detect the exact source of a text, discerning whether it is written by a human or generated by a specific LLM.
Subtask C aims to identify the changing point within a text, at which the authorship transitions from human to machine.
arXiv Detail & Related papers (2024-04-22T13:56:07Z)
- CUNI Submission to MRL 2023 Shared Task on Multi-lingual Multi-task Information Retrieval [5.97515243922116]
We present the Charles University system for the MRL2023 Shared Task on Multi-lingual Multi-task Information Retrieval.
The goal of the shared task was to develop systems for named entity recognition and question answering in several under-represented languages.
Our solutions to both subtasks rely on the translate-test approach.
arXiv Detail & Related papers (2023-10-25T10:22:49Z)
- DAMO-NLP at SemEval-2023 Task 2: A Unified Retrieval-augmented System for Multilingual Named Entity Recognition [94.90258603217008]
The MultiCoNER II shared task aims to tackle multilingual named entity recognition (NER) in fine-grained and noisy scenarios.
Previous top systems in MultiCoNER I incorporate either knowledge bases or gazetteers.
We propose a unified retrieval-augmented system (U-RaNER) for fine-grained multilingual NER.
arXiv Detail & Related papers (2023-05-05T16:59:26Z)
- Findings of the WMT 2022 Shared Task on Translation Suggestion [63.457874930232926]
We report the result of the first edition of the WMT shared task on Translation Suggestion.
The task aims to provide alternatives for specific words or phrases given the entire documents generated by machine translation (MT).
It consists of two sub-tasks, namely, the naive translation suggestion and translation suggestion with hints.
arXiv Detail & Related papers (2022-11-30T03:48:36Z)
- Overview of the WANLP 2022 Shared Task on Propaganda Detection in Arabic [32.27059493109764]
We ran a task on detecting propaganda techniques in Arabic tweets as part of the WANLP 2022 workshop.
Subtask1 asks to identify the set of propaganda techniques used in a tweet, which is a multilabel classification problem.
Subtask2 asks to detect the propaganda techniques used in a tweet together with the exact span(s) of text in which each propaganda technique appears.
arXiv Detail & Related papers (2022-11-18T07:04:31Z)
- Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation [80.16548523140025]
We extend the vanilla pretrain-finetune pipeline with an extra code-switching restore task to bridge the gap between the pretrain and finetune stages.
Our approach could narrow the cross-lingual sentence representation distance and improve low-frequency word translation with trivial computational cost.
arXiv Detail & Related papers (2022-04-16T16:08:38Z)
- LTIatCMU at SemEval-2020 Task 11: Incorporating Multi-Level Features for Multi-Granular Propaganda Span Identification [70.1903083747775]
This paper describes our submission for the task of Propaganda Span Identification in news articles.
We introduce a BERT-BiLSTM based span-level propaganda classification model that identifies which token spans within the sentence are indicative of propaganda.
arXiv Detail & Related papers (2020-08-11T16:14:47Z)
- CoSDA-ML: Multi-Lingual Code-Switching Data Augmentation for Zero-Shot Cross-Lingual NLP [68.2650714613869]
We propose a data augmentation framework to generate multi-lingual code-switching data to fine-tune mBERT.
Compared with the existing work, our method does not rely on bilingual sentences for training, and requires only one training process for multiple target languages.
arXiv Detail & Related papers (2020-06-11T13:15:59Z)
- BPGC at SemEval-2020 Task 11: Propaganda Detection in News Articles with Multi-Granularity Knowledge Sharing and Linguistic Features based Ensemble Learning [2.8913142991383114]
SemEval 2020 Task-11 aims to design automated systems for news propaganda detection.
Task-11 consists of two sub-tasks, namely, Span Identification and Technique Classification.
arXiv Detail & Related papers (2020-05-31T19:35:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.