Multilingual estimation of political-party positioning: From label
aggregation to long-input Transformers
- URL: http://arxiv.org/abs/2310.12575v1
- Date: Thu, 19 Oct 2023 08:34:48 GMT
- Title: Multilingual estimation of political-party positioning: From label
aggregation to long-input Transformers
- Authors: Dmitry Nikolaev, Tanise Ceron, and Sebastian Padó
- Abstract summary: We implement and compare two approaches to automatic scaling analysis of political-party manifestos.
We find that the task can be efficiently solved by state-of-the-art models, with label aggregation producing the best results.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Scaling analysis is a technique in computational political science that
assigns a political actor (e.g. politician or party) a score on a predefined
scale based on a (typically long) body of text (e.g. a parliamentary speech or
an election manifesto). For example, political scientists have often used the
left–right scale to systematically analyse political landscapes of different
countries. NLP methods for automatic scaling analysis can find broad
application provided they (i) are able to deal with long texts and (ii) work
robustly across domains and languages. In this work, we implement and compare
two approaches to automatic scaling analysis of political-party manifestos:
label aggregation, a pipeline strategy relying on annotations of individual
statements from the manifestos, and long-input-Transformer-based models, which
compute scaling values directly from raw text. We carry out the analysis of the
Comparative Manifestos Project dataset across 41 countries and 27 languages and
find that the task can be efficiently solved by state-of-the-art models, with
label aggregation producing the best results.
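To make the label-aggregation idea concrete, here is a minimal sketch, not the authors' implementation. It assumes each manifesto statement has already been given one of three hypothetical labels ("left", "right", "other") and that the party score is the difference between the right and left shares. The label names, the function name `aggregate_labels`, and the scoring formula are illustrative assumptions; the abstract does not specify them.

```python
from collections import Counter
from typing import Iterable


def aggregate_labels(labels: Iterable[str]) -> float:
    """Collapse per-statement labels into one left-right score in [-1, 1].

    Assumed scoring rule (hypothetical): (right - left) / total statements.
    Negative values lean left, positive values lean right.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no statements to aggregate")
    # Counter returns 0 for labels that never occur.
    return (counts["right"] - counts["left"]) / total


# Toy manifesto with five labelled statements (made-up data).
manifesto = ["left", "left", "right", "other", "left"]
score = aggregate_labels(manifesto)
print(f"left-right score: {score:+.2f}")
```

In a pipeline of this kind, the per-statement labels would come from a statement-level classifier, whereas a long-input model would map the raw manifesto text to a score directly.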
Related papers
- Paired Completion: Flexible Quantification of Issue-framing at Scale with LLMs [0.41436032949434404]
We develop and rigorously evaluate new detection methods for issue framing and narrative analysis within large text datasets.
We show that issue framing can be reliably and efficiently detected in large corpora with only a few examples of either perspective on a given issue.
arXiv Detail & Related papers (2024-08-19T07:14:15Z)
- Representation Bias in Political Sample Simulations with Large Language Models [54.48283690603358]
This study seeks to identify and quantify biases in simulating political samples with Large Language Models.
Using the GPT-3.5-Turbo model, we leverage data from the American National Election Studies, German Longitudinal Election Study, Zuobiao dataset, and China Family Panel Studies.
arXiv Detail & Related papers (2024-07-16T05:52:26Z)
- Understanding Cross-Lingual Alignment -- A Survey [52.572071017877704]
Cross-lingual alignment is the meaningful similarity of representations across languages in multilingual language models.
We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field.
arXiv Detail & Related papers (2024-04-09T11:39:53Z)
- Classifying multilingual party manifestos: Domain transfer across country, time, and genre [0.0]
We show the potential of domain transfer across geographical locations, languages, time, and genre in a large-scale database of political manifestos.
For switching genres, we use an external corpus of transcribed speeches from New Zealand politicians while for the other three dimensions, custom splits of the Manifesto database are used.
DistilBERT proves to be competitive at a lower computational expense and is thus used for further experiments across time and country.
arXiv Detail & Related papers (2023-07-31T09:16:13Z)
- Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases.
Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding.
This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z)
- A Greek Parliament Proceedings Dataset for Computational Linguistics and Political Analysis [4.396860522241306]
We introduce a curated dataset of the Greek Parliament Proceedings that extends chronologically from 1989 up to 2020.
It consists of more than 1 million speeches with extensive metadata, extracted from 5,355 parliamentary record files.
arXiv Detail & Related papers (2022-10-23T23:23:28Z)
- Optimizing text representations to capture (dis)similarity between political parties [1.2891210250935146]
We look at the problem of modeling pairwise similarities between political parties.
Our research question is what level of structural information is necessary to create robust text representation.
We evaluate our models on the manifestos of German parties for the 2021 federal election.
arXiv Detail & Related papers (2022-10-21T14:24:57Z)
- Electoral Programs of German Parties 2021: A Computational Analysis Of Their Comprehensibility and Likeability Based On SentiArt [0.0]
We analyze the electoral programs of six German parties issued before the parliamentary elections of 2021.
Using novel indices of the readability and emotion potential of texts computed via SentiArt, our data shed light on the similarities and differences of the programs.
They reveal that the programs of the SPD and CDU have the best chances to be comprehensible and likeable.
arXiv Detail & Related papers (2021-09-26T05:27:14Z)
- Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as the informal, and noisy linguistic style, remain challenging to many natural language processing (NLP) tasks.
This study fulfils an assessment of existing language models in distinguishing the sentiment expressed in tweets by using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
- Cross-lingual Spoken Language Understanding with Regularized Representation Alignment [71.53159402053392]
We propose a regularization approach to align word-level and sentence-level representations across languages without any external resource.
Experiments on the cross-lingual spoken language understanding task show that our model outperforms current state-of-the-art methods in both few-shot and zero-shot scenarios.
arXiv Detail & Related papers (2020-09-30T08:56:53Z)
- Learning Universal Representations from Word to Sentence [89.82415322763475]
This work introduces and explores the universal representation learning, i.e., embeddings of different levels of linguistic unit in a uniform vector space.
We present our approach of constructing analogy datasets in terms of words, phrases and sentences.
We empirically verify that well pre-trained Transformer models incorporated with appropriate training settings may effectively yield universal representation.
arXiv Detail & Related papers (2020-09-10T03:53:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.