Related papers: A Greek Parliament Proceedings Dataset for Computational Linguistics and Political Analysis

URL: http://arxiv.org/abs/2210.12883v1
Date: Sun, 23 Oct 2022 23:23:28 GMT
Title: A Greek Parliament Proceedings Dataset for Computational Linguistics and Political Analysis
Authors: Konstantina Dritsa, Kaiti Thoma, John Pavlopoulos, Panos Louridas
Abstract summary: We introduce a curated dataset of the Greek Parliament Proceedings that extends chronologically from 1989 up to 2020. It consists of more than 1 million speeches with extensive metadata, extracted from 5,355 parliamentary record files.
Score: 4.396860522241306
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large, diachronic datasets of political discourse are hard to come by, especially for resource-lean languages such as Greek. In this paper, we introduce a curated dataset of the Greek Parliament Proceedings that extends chronologically from 1989 up to 2020. It consists of more than 1 million speeches with extensive metadata, extracted from 5,355 parliamentary record files. We explain how it was constructed and the challenges that we had to overcome. The dataset can be used for both computational linguistics and political analysis, ideally combining the two. We present such an application, showing how the dataset can be used to study the change of word usage (i) through time, (ii) between significant historical events and political parties, and (iii) by evaluating and employing algorithms for detecting semantic shifts.

Related papers

PSCon: Product Search Through Conversations [55.94925947614474]
Conversational Product Search (CPS) systems interact with users via natural language to offer personalized and context-aware product lists. Most existing research on CPS is limited to simulated conversations, due to the lack of a real CPS dataset driven by human-like language. In this paper, we propose a CPS data collection protocol and create a new CPS dataset, called PSCon, which assists product search through conversations with human-like language.
arXiv Detail & Related papers (2025-02-19T17:05:42Z)
AgoraSpeech: A multi-annotated comprehensive dataset of political discourse through the lens of humans and AI [1.3060410279656598]
AgoraSpeech is a meticulously curated, high-quality dataset of 171 political speeches from six parties during the Greek national elections in 2023. The dataset includes annotations (per paragraph) for six natural language processing (NLP) tasks: text classification, topic identification, sentiment analysis, named entity recognition, polarization and populism detection.
arXiv Detail & Related papers (2025-01-09T18:17:59Z)
Political-LLM: Large Language Models in Political Science [159.95299889946637]
Large language models (LLMs) have been widely adopted in political science tasks. Political-LLM aims to advance the comprehensive understanding of integrating LLMs into computational political science.
arXiv Detail & Related papers (2024-12-09T08:47:50Z)
SpeakGer: A meta-data enriched speech corpus of German state and federal parliaments [0.12277343096128711]
We provide the SpeakGer data set, consisting of German parliament debates from all 16 federal states of Germany as well as the German Bundestag from 1947-2023. This data set includes rich metadata in the form of information on audience reactions to the speech as well as information about the speaker's party, their age, their constituency and their party's political alignment.
arXiv Detail & Related papers (2024-10-23T14:00:48Z)
Learning Phonotactics from Linguistic Informants [54.086544221761486]
Our model iteratively selects or synthesizes a data-point according to one of a range of information-theoretic policies. We find that the information-theoretic policies that our model uses to select items to query the informant achieve sample efficiency comparable to, or greater than, fully supervised approaches.
arXiv Detail & Related papers (2024-05-08T00:18:56Z)
PILA: A Historical-Linguistic Dataset of Proto-Italic and Latin [11.820097994590672]
We introduce the Proto-Italic to Latin dataset, which consists of roughly 3,000 pairs of forms from Proto-Italic and Latin. We present baseline results for PILA on a pair of traditional computational historical linguistics tasks. We demonstrate PILA's capability for enhancing other historical-linguistic datasets.
arXiv Detail & Related papers (2024-04-25T05:33:47Z)
Syntactic Language Change in English and German: Metrics, Parsers, and Convergences [56.47832275431858]
The current paper looks at diachronic trends in syntactic language change in both English and German, using corpora of parliamentary debates from the last c. 160 years. We base our observations on five dependency parsers, including the widely used Stanford CoreNLP as well as 4 newer alternatives. We show that changes in syntactic measures seem to be more frequent at the tails of sentence length distributions.
arXiv Detail & Related papers (2024-02-18T11:46:16Z)
Multilingual estimation of political-party positioning: From label aggregation to long-input Transformers [3.651047982634467]
We implement and compare two approaches to automatic scaling analysis of political-party manifestos. We find that the task can be efficiently solved by state-of-the-art models, with label aggregation producing the best results.
arXiv Detail & Related papers (2023-10-19T08:34:48Z)
The ParlaSent Multilingual Training Dataset for Sentiment Identification in Parliamentary Proceedings [0.0]
The paper presents a new training dataset of sentences in 7 languages, manually annotated for sentiment. The paper additionally introduces the first domain-specific multilingual transformer language model for political science applications.
arXiv Detail & Related papers (2023-09-18T14:01:06Z)
Panning for gold: Lessons learned from the platform-agnostic automated detection of political content in textual data [48.7576911714538]
We discuss how these techniques can be used to detect political content across different platforms. We compare the performance of three groups of detection techniques relying on dictionaries, supervised machine learning, or neural networks. Our results show the limited impact of preprocessing on model performance, with the best results for less noisy data being achieved by neural network- and machine-learning-based models.
arXiv Detail & Related papers (2022-07-01T15:23:23Z)
BasqueParl: A Bilingual Corpus of Basque Parliamentary Transcriptions [3.4447242282168777]
We release the first version of a newly compiled corpus from Basque parliamentary transcripts. The corpus is characterized by heavy Basque-Spanish code-switching, and represents an interesting resource to study political discourse in contrasting languages such as Basque and Spanish.
arXiv Detail & Related papers (2022-05-03T14:02:24Z)
Models and Datasets for Cross-Lingual Summarisation [78.56238251185214]
We present a cross-lingual summarisation corpus with long documents in a source language associated with multi-sentence summaries in a target language. The corpus covers twelve language pairs and directions for four European languages, namely Czech, English, French and German. We derive cross-lingual document-summary instances from Wikipedia by combining lead paragraphs and articles' bodies from language aligned Wikipedia titles.
arXiv Detail & Related papers (2022-02-19T11:55:40Z)
A Massively Multilingual Analysis of Cross-linguality in Shared Embedding Space [61.18554842370824]
In cross-lingual language models, representations for many different languages live in the same space. We compute a task-based measure of cross-lingual alignment in the form of bitext retrieval performance. We examine a range of linguistic, quasi-linguistic, and training-related features as potential predictors of these alignment metrics.
arXiv Detail & Related papers (2021-09-13T21:05:37Z)
A Corpus for Large-Scale Phonetic Typology [112.19288631037055]
We present VoxClamantis v1.0, the first large-scale corpus for phonetic typology, with aligned segments and estimated phoneme-level labels in 690 readings spanning 635 languages, along with acoustic-phonetic measures of vowels and sibilants.
arXiv Detail & Related papers (2020-05-28T13:03:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.