LTIatCMU at SemEval-2020 Task 11: Incorporating Multi-Level Features for
Multi-Granular Propaganda Span Identification
- URL: http://arxiv.org/abs/2008.04820v2
- Date: Thu, 20 Aug 2020 15:14:18 GMT
- Title: LTIatCMU at SemEval-2020 Task 11: Incorporating Multi-Level Features for
Multi-Granular Propaganda Span Identification
- Authors: Sopan Khosla, Rishabh Joshi, Ritam Dutt, Alan W Black, Yulia Tsvetkov
- Abstract summary: This paper describes our submission for the task of Propaganda Span Identification in news articles.
We introduce a BERT-BiLSTM based span-level propaganda classification model that identifies which token spans within the sentence are indicative of propaganda.
- Score: 70.1903083747775
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper we describe our submission for the task of Propaganda Span
Identification in news articles. We introduce a BERT-BiLSTM based span-level
propaganda classification model that identifies which token spans within the
sentence are indicative of propaganda. The "multi-granular" model incorporates
linguistic knowledge at various levels of text granularity, including word,
sentence and document level syntactic, semantic and pragmatic affect features,
which significantly improve model performance, compared to its
language-agnostic variant. To facilitate better representation learning, we
also collect a corpus of 10k news articles, and use it for fine-tuning the
model. The final model is a majority-voting ensemble which learns different
propaganda class boundaries by leveraging different subsets of incorporated
knowledge and attains $4^{th}$ position on the test leaderboard. Our final
model and code are released at https://github.com/sopu/PropagandaSemEval2020.
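As a rough illustration of the architecture the abstract describes, the sketch below wires a BERT encoder into a BiLSTM token tagger with optional multi-granular feature fusion. The hyperparameters, feature dimensions, and class names are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of a BERT-BiLSTM token-level propaganda tagger with
# optional multi-granular feature fusion. Dimensions and the fusion
# scheme are assumptions, not the authors' exact configuration.
import torch
import torch.nn as nn
from transformers import BertModel

class BertBiLstmTagger(nn.Module):
    def __init__(self, num_labels=2, affect_dim=0, hidden=256):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-cased")
        # Word/sentence/document-level affect features (if any) are
        # concatenated to each token's contextual embedding.
        lstm_in = self.bert.config.hidden_size + affect_dim
        self.bilstm = nn.LSTM(lstm_in, hidden,
                              batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, num_labels)

    def forward(self, input_ids, attention_mask, affect_feats=None):
        h = self.bert(input_ids=input_ids,
                      attention_mask=attention_mask).last_hidden_state
        if affect_feats is not None:       # shape (B, T, affect_dim)
            h = torch.cat([h, affect_feats], dim=-1)
        h, _ = self.bilstm(h)              # shape (B, T, 2 * hidden)
        return self.classifier(h)          # per-token propaganda logits
```

Per-token predictions from several such models, each trained with a different subset of the incorporated knowledge, can then be combined by majority vote, matching the ensembling strategy the abstract describes.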
Related papers
- MemeMind at ArAIEval Shared Task: Spotting Persuasive Spans in Arabic Text with Persuasion Techniques Identification [0.10120650818458249]
This paper focuses on detecting propagandistic spans and persuasion techniques in Arabic text from tweets and news paragraphs.
Our approach achieved an F1 score of 0.2774, securing 3rd position on the Task 1 leaderboard.
arXiv Detail & Related papers (2024-08-08T15:49:01Z)
- Exposing propaganda: an analysis of stylistic cues comparing human annotations and machine classification [0.7749297275724032]
This paper investigates the language of propaganda and its stylistic features.
It presents the PPN dataset, composed of news articles extracted from websites identified as propaganda sources.
We propose different NLP techniques to identify the cues used by the annotators, and to compare them with machine classification.
arXiv Detail & Related papers (2024-02-06T07:51:54Z)
- HuBERTopic: Enhancing Semantic Representation of HuBERT through Self-supervision Utilizing Topic Model [62.995175485416]
We propose a new approach to enrich the semantic representation of HuBERT.
An auxiliary topic classification task is added to HuBERT by using topic labels as teachers.
Experimental results demonstrate that our method achieves comparable or better performance than the baseline in most tasks.
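A hedged sketch of the auxiliary-task idea above: a small topic-classification head is trained jointly with the main objective, with topic labels acting as teachers. The pooling, head shape, and loss weight `alpha` are assumptions, not the paper's exact setup.

```python
import torch.nn as nn
import torch.nn.functional as F

class AuxTopicHead(nn.Module):
    """Illustrative auxiliary topic classifier over frame-level features."""
    def __init__(self, feat_dim, num_topics):
        super().__init__()
        self.proj = nn.Linear(feat_dim, num_topics)

    def forward(self, feats):                # (B, T, D) frame features
        return self.proj(feats.mean(dim=1))  # mean-pool, then classify

def joint_loss(main_loss, topic_logits, topic_labels, alpha=0.1):
    # The auxiliary cross-entropy is added to the main training loss;
    # the weight alpha is an assumed hyperparameter.
    return main_loss + alpha * F.cross_entropy(topic_logits, topic_labels)
```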
arXiv Detail & Related papers (2023-10-06T02:19:09Z)
- Hierarchical Multi-Instance Multi-Label Learning for Detecting Propaganda Techniques [12.483639681339767]
We propose a simple RoBERTa-based model for classifying all spans in an article simultaneously.
We incorporate hierarchical label dependencies by adding an auxiliary classifier for each node in the decision tree.
Our model leads to an absolute improvement of 2.47% micro-F1 over the model from the shared task winning team in a cross-validation setup.
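The per-node auxiliary classifiers could look like the sketch below: one head per node of the label hierarchy over a shared span representation, with the losses of all nodes on the gold path summed. The node names and dimensions are hypothetical.

```python
import torch.nn as nn

class HierarchicalHeads(nn.Module):
    """One auxiliary binary classifier per hierarchy node, applied to a
    shared span representation; node names are hypothetical."""
    def __init__(self, span_dim, node_names):
        super().__init__()
        self.heads = nn.ModuleDict(
            {name: nn.Linear(span_dim, 2) for name in node_names})

    def forward(self, span_repr):            # (num_spans, span_dim)
        return {name: head(span_repr) for name, head in self.heads.items()}

# Training would sum cross-entropy terms over every node on the path to
# the gold leaf label, so coarse decisions constrain fine-grained ones.
```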
arXiv Detail & Related papers (2023-05-30T21:23:19Z)
- Open-Vocabulary Multi-Label Classification via Multi-modal Knowledge Transfer [55.885555581039895]
Multi-label zero-shot learning (ML-ZSL) focuses on transferring knowledge via pre-trained textual label embeddings.
We propose a novel open-vocabulary framework, named multimodal knowledge transfer (MKT) for multi-label classification.
arXiv Detail & Related papers (2022-07-05T08:32:18Z)
- Lattice-BERT: Leveraging Multi-Granularity Representations in Chinese Pre-trained Language Models [62.41139712595334]
We propose a novel pre-training paradigm for Chinese -- Lattice-BERT.
We construct a lattice graph from the characters and words in a sentence and feed all these text units into transformers.
We show that our model can bring an average increase of 1.5% under the 12-layer setting.
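The lattice idea can be illustrated with a toy unit-inventory builder: every character plus every in-lexicon word becomes a unit keyed by its character span. The real Lattice-BERT additionally uses lattice position embeddings and lattice-aware pre-training objectives; this sketch and its `lexicon` argument are simplifications.

```python
def build_lattice_units(sentence, lexicon):
    # Every character is a unit...
    units = [(i, i + 1, ch) for i, ch in enumerate(sentence)]
    # ...and so is every multi-character substring found in the lexicon.
    for i in range(len(sentence)):
        for j in range(i + 2, len(sentence) + 1):
            if sentence[i:j] in lexicon:
                units.append((i, j, sentence[i:j]))
    # All units, character or word, are fed to the transformer together,
    # with positions derived from their (start, end) character spans.
    return units
```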
arXiv Detail & Related papers (2021-04-15T02:36:49Z)
- Explicit Alignment Objectives for Multilingual Bidirectional Encoders [111.65322283420805]
We present a new method for learning multilingual encoders, AMBER (Aligned Multilingual Bi-directional EncodeR).
AMBER is trained on additional parallel data using two explicit alignment objectives that align the multilingual representations at different granularities.
Experimental results show that AMBER obtains gains of up to 1.1 average F1 score on sequence tagging and up to 27.3 average accuracy on retrieval over the XLMR-large model.
arXiv Detail & Related papers (2020-10-15T18:34:13Z)
- InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training [135.12061144759517]
We present an information-theoretic framework that formulates cross-lingual language model pre-training.
We propose a new pre-training task based on contrastive learning.
By leveraging both monolingual and parallel corpora, we jointly train the pretext tasks to improve the cross-lingual transferability of pre-trained models.
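The contrastive pre-training task can be sketched as an InfoNCE objective over parallel sentence pairs: each source sentence must score its own translation above all other in-batch targets. The temperature and in-batch-negative scheme here are assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_xl_loss(src_repr, tgt_repr, temperature=0.1):
    """InfoNCE over (B, D) sentence representations of parallel pairs."""
    src = F.normalize(src_repr, dim=-1)
    tgt = F.normalize(tgt_repr, dim=-1)
    logits = src @ tgt.t() / temperature          # (B, B) similarities
    labels = torch.arange(src.size(0), device=src.device)
    return F.cross_entropy(logits, labels)        # diagonal = positives
```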
arXiv Detail & Related papers (2020-07-15T16:58:01Z)
- BPGC at SemEval-2020 Task 11: Propaganda Detection in News Articles with Multi-Granularity Knowledge Sharing and Linguistic Features based Ensemble Learning [2.8913142991383114]
SemEval 2020 Task-11 aims to design automated systems for news propaganda detection.
Task-11 consists of two sub-tasks, namely, Span Identification and Technique Classification.
arXiv Detail & Related papers (2020-05-31T19:35:53Z)