Multi-aspect Multilingual and Cross-lingual Parliamentary Speech Analysis
- URL: http://arxiv.org/abs/2207.01054v2
- Date: Tue, 20 Jun 2023 13:32:02 GMT
- Title: Multi-aspect Multilingual and Cross-lingual Parliamentary Speech Analysis
- Authors: Kristian Miok, Encarnacion Hidalgo-Tenorio, Petya Osenova,
Miguel-Angel Benitez-Castro and Marko Robnik-Sikonja
- Abstract summary: We apply advanced NLP methods to a joint and comparative analysis of six national parliaments between 2017 and 2020.
We analyze emotions and sentiment in the transcripts from the ParlaMint dataset collection.
The results show some commonalities and many surprising differences among the analyzed countries.
- Score: 1.759288298635146
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Parliamentary and legislative debate transcripts provide informative insight
into elected politicians' opinions, positions, and policy preferences. They are
interesting for political and social sciences as well as linguistics and
natural language processing (NLP) research. While existing research studied
individual parliaments, we apply advanced NLP methods to a joint and
comparative analysis of six national parliaments (Bulgarian, Czech, French,
Slovene, Spanish, and United Kingdom) between 2017 and 2020. We analyze
emotions and sentiment in the transcripts from the ParlaMint dataset collection
and assess if the age, gender, and political orientation of speakers can be
detected from their speeches. The results show some commonalities and many
surprising differences among the analyzed countries.
Related papers
- SpeakGer: A meta-data enriched speech corpus of German state and federal parliaments [0.12277343096128711]
We provide the SpeakGer data set, consisting of German parliament debates from all 16 federal states of Germany as well as the German Bundestag from 1947-2023.
This data set includes rich metadata: audience reactions to each speech, as well as the speaker's party, age, constituency, and the party's political alignment.
arXiv Detail & Related papers (2024-10-23T14:00:48Z)
- Language Model Alignment in Multilingual Trolley Problems [138.5684081822807]
Building on the Moral Machine experiment, we develop a cross-lingual corpus of moral dilemma vignettes in over 100 languages called MultiTP.
Our analysis explores the alignment of 19 different LLMs with human judgments, capturing preferences across six moral dimensions.
We discover significant variance in alignment across languages, challenging the assumption of uniform moral reasoning in AI systems.
arXiv Detail & Related papers (2024-07-02T14:02:53Z)
- L(u)PIN: LLM-based Political Ideology Nowcasting [1.124958340749622]
We present a method to analyze ideological positions of individual parliamentary representatives by leveraging the latent knowledge of LLMs.
The method lets us place politicians on an axis of our choice, so their stance on any chosen topic or controversy can be measured flexibly.
arXiv Detail & Related papers (2024-05-12T16:14:07Z)
- Llama meets EU: Investigating the European Political Spectrum through the Lens of LLMs [18.836470390824633]
We audit Llama Chat in the context of EU politics to analyze the model's political knowledge and its ability to reason in context.
We adapt, i.e., further fine-tune, Llama Chat on speeches of individual euro-parties from debates in the European Parliament to reevaluate its political leaning.
arXiv Detail & Related papers (2024-03-20T13:42:57Z)
- What Do Dialect Speakers Want? A Survey of Attitudes Towards Language Technology for German Dialects [60.8361859783634]
We survey speakers of dialects and regional languages related to German.
We find that respondents are especially in favour of potential NLP tools that work with dialectal input.
arXiv Detail & Related papers (2024-02-19T09:15:28Z)
- Multi-EuP: The Multilingual European Parliament Dataset for Analysis of Bias in Information Retrieval [62.82448161570428]
This dataset is designed to investigate fairness in a multilingual information retrieval context.
It boasts an authentic multilingual corpus, featuring topics translated into all 24 languages.
It offers rich demographic information associated with its documents, facilitating the study of demographic bias.
arXiv Detail & Related papers (2023-11-03T12:29:11Z)
- Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work will lay the foundation for advancing dialectal NLP by setting out evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z)
- The ParlaSent Multilingual Training Dataset for Sentiment Identification in Parliamentary Proceedings [0.0]
The paper presents a new training dataset of sentences in 7 languages, manually annotated for sentiment.
The paper additionally introduces the first domain-specific multilingual transformer language model for political science applications.
arXiv Detail & Related papers (2023-09-18T14:01:06Z)
- BasqueParl: A Bilingual Corpus of Basque Parliamentary Transcriptions [3.4447242282168777]
We release the first version of a newly compiled corpus from Basque parliamentary transcripts.
The corpus is characterized by heavy Basque-Spanish code-switching, and represents an interesting resource to study political discourse in contrasting languages such as Basque and Spanish.
arXiv Detail & Related papers (2022-05-03T14:02:24Z)
- Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)