Political claim identification and categorization in a multilingual
setting: First experiments
- URL: http://arxiv.org/abs/2310.09256v1
- Date: Fri, 13 Oct 2023 17:13:00 GMT
- Title: Political claim identification and categorization in a multilingual
setting: First experiments
- Authors: Urs Zaberer and Sebastian Pad\'o and Gabriella Lapesa
- Abstract summary: This paper explores different strategies for the cross-lingual projection of political claims analysis.
We conduct experiments on a German dataset, DebateNet2.0, covering the policy debate sparked by the 2015 refugee crisis.
- Score: 11.124149798170908
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The identification and classification of political claims is an important
step in the analysis of political newspaper reports; however, resources for
this task are few and far between. This paper explores different strategies for
the cross-lingual projection of political claims analysis. We conduct
experiments on a German dataset, DebateNet2.0, covering the policy debate
sparked by the 2015 refugee crisis. Our evaluation involves two tasks (claim
identification and categorization), three languages (German, English, and
French) and two methods (machine translation -- the best method in our
experiments -- and multilingual embeddings).
Related papers
- Multi-EuP: The Multilingual European Parliament Dataset for Analysis of
Bias in Information Retrieval [62.82448161570428]
This dataset is designed to investigate fairness in a multilingual information retrieval context.
It boasts an authentic multilingual corpus, featuring topics translated into all 24 languages.
It offers rich demographic information associated with its documents, facilitating the study of demographic bias.
arXiv Detail & Related papers (2023-11-03T12:29:11Z) - The ParlaSent Multilingual Training Dataset for Sentiment Identification in Parliamentary Proceedings [0.0]
The paper presents a new training dataset of sentences in 7 languages, manually annotated for sentiment.
The paper additionally introduces the first domain-specific multilingual transformer language model for political science applications.
arXiv Detail & Related papers (2023-09-18T14:01:06Z) - Towards Best Practices for Training Multilingual Dense Retrieval Models [54.91016739123398]
We focus on the task of monolingual retrieval in a variety of typologically diverse languages using one such design.
Our study is organized as a "best practices" guide for training multilingual dense retrieval models.
arXiv Detail & Related papers (2022-04-05T17:12:53Z) - Models and Datasets for Cross-Lingual Summarisation [78.56238251185214]
We present a cross-lingual summarisation corpus with long documents in a source language associated with multi-sentence summaries in a target language.
The corpus covers twelve language pairs and directions for four European languages, namely Czech, English, French and German.
We derive cross-lingual document-summary instances from Wikipedia by combining lead paragraphs and articles' bodies from language aligned Wikipedia titles.
arXiv Detail & Related papers (2022-02-19T11:55:40Z) - Between welcome culture and border fence. A dataset on the European
refugee crisis in German newspaper reports [12.752057567551015]
This paper introduces DebateNet2.0, which traces the political discourse on the European refugee crisis in the German quality newspaper taz during the year 2015.
The core units of our annotation are political claims (requests for specific actions to be taken within the policy field) and the actors who make them (politicians, parties, etc.)
arXiv Detail & Related papers (2021-11-19T10:34:23Z) - Deriving Disinformation Insights from Geolocalized Twitter Callouts [7.951685935253415]
This paper demonstrates a two-stage method for deriving insights from social media data relating to disinformation by applying a combination of geospatial classification and embedding-based language modelling.
Twitter data is classified into European and non-European sets using BERT.
Word2vec is applied to the classified texts resulting in Eurocentric, non-Eurocentric and global representations of the data for the three target languages.
arXiv Detail & Related papers (2021-08-06T11:39:05Z) - AM2iCo: Evaluating Word Meaning in Context across Low-ResourceLanguages
with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z) - A novel approach to sentiment analysis in Persian using discourse and
external semantic information [0.0]
Many approaches have been proposed to extract the sentiment of individuals from documents written in natural languages.
The majority of these approaches have focused on English, while resource-lean languages such as Persian suffer from the lack of research work and language resources.
Due to this gap in Persian, the current work is accomplished to introduce new methods for sentiment analysis which have been applied on Persian.
arXiv Detail & Related papers (2020-07-18T18:40:40Z) - A Call for More Rigor in Unsupervised Cross-lingual Learning [76.6545568416577]
An existing rationale for such research is based on the lack of parallel data for many of the world's languages.
We argue that a scenario without any parallel data and abundant monolingual data is unrealistic in practice.
arXiv Detail & Related papers (2020-04-30T17:06:23Z) - On the Language Neutrality of Pre-trained Multilingual Representations [70.93503607755055]
We investigate the language-neutrality of multilingual contextual embeddings directly and with respect to lexical semantics.
Our results show that contextual embeddings are more language-neutral and, in general, more informative than aligned static word-type embeddings.
We show how to reach state-of-the-art accuracy on language identification and match the performance of statistical methods for word alignment of parallel sentences.
arXiv Detail & Related papers (2020-04-09T19:50:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.