IsraParlTweet: The Israeli Parliamentary and Twitter Resource
- URL: http://arxiv.org/abs/2405.20269v1
- Date: Thu, 30 May 2024 17:21:15 GMT
- Title: IsraParlTweet: The Israeli Parliamentary and Twitter Resource
- Authors: Guy Mor-Lan, Effi Levi, Tamir Sheafer, Shaul R. Shenhav
- Abstract summary: We introduce IsraParlTweet, a new linked corpus of Hebrew-language parliamentary discussions from the Knesset (Israeli Parliament) between the years 1992-2023 and Twitter posts made by Members of the Knesset between the years 2008-2023.
IsraParlTweet can be used to conduct a wide variety of quantitative and qualitative analyses and provide valuable insights into political discourse in Israel.
- Score: 2.249916681499244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce IsraParlTweet, a new linked corpus of Hebrew-language parliamentary discussions from the Knesset (Israeli Parliament) between the years 1992-2023 and Twitter posts made by Members of the Knesset between the years 2008-2023, containing a total of 294.5 million Hebrew tokens. In addition to raw text, the corpus contains comprehensive metadata on speakers and Knesset sessions as well as several linguistic annotations. As a result, IsraParlTweet can be used to conduct a wide variety of quantitative and qualitative analyses and provide valuable insights into political discourse in Israel.
Related papers
- SpeakGer: A meta-data enriched speech corpus of German state and federal parliaments [0.12277343096128711]
We provide the SpeakGer data set, consisting of German parliament debates from all 16 federal states of Germany as well as the German Bundestag from 1947-2023.
This data set includes rich metadata: audience reactions to each speech, as well as the speaker's party, age, constituency, and their party's political alignment.
(2024-10-23T14:00:48Z)
- The Knesset Corpus: An Annotated Corpus of Hebrew Parliamentary Proceedings [3.2405928866433067]
We present the Knesset Corpus, a corpus of Hebrew parliamentary proceedings from 1998 to 2022.
We show that the corpus can be used to examine historical developments in the style of political discussions.
We also investigate some differences between the styles of men and women speakers.
(2024-05-28T12:23:39Z)
- MaiBaam: A Multi-Dialectal Bavarian Universal Dependency Treebank [56.810282574817414]
We present the first multi-dialect Bavarian treebank (MaiBaam) manually annotated with part-of-speech and syntactic dependency information in Universal Dependencies (UD).
We highlight the morphosyntactic differences between the closely-related Bavarian and German and showcase the rich variability of speakers' orthographies.
Our corpus includes 15k tokens, covering dialects from all Bavarian-speaking areas spanning three countries.
(2024-03-15T13:33:10Z)
- Multi-EuP: The Multilingual European Parliament Dataset for Analysis of Bias in Information Retrieval [62.82448161570428]
This dataset is designed to investigate fairness in a multilingual information retrieval context.
It boasts an authentic multilingual corpus, featuring topics translated into all 24 languages.
It offers rich demographic information associated with its documents, facilitating the study of demographic bias.
(2023-11-03T12:29:11Z)
- Towards a Deep Understanding of Multilingual End-to-End Speech Translation [52.26739715012842]
We analyze representations learnt in a multilingual end-to-end speech translation model trained over 22 languages.
We derive three major findings from our analysis.
(2023-10-31T13:50:55Z)
- Speaker attribution in German parliamentary debates with QLoRA-adapted large language models [0.0]
We study the potential of the large language model family Llama 2 to automate speaker attribution in German parliamentary debates from 2017-2021.
Our results shed light on the capabilities of large language models in automating speaker attribution, revealing a promising avenue for computational analysis of political discourse and the development of semantic role labeling systems.
(2023-09-18T16:06:16Z)
- A Second Wave of UD Hebrew Treebanking and Cross-Domain Parsing [8.373151777137792]
This paper presents a new, freely available UD treebank of Hebrew from a range of topics selected from Hebrew Wikipedia.
In addition to introducing the corpus and evaluating the quality of its annotations, we deploy automatic validation tools based on grew.
We obtain new state-of-the-art (SOTA) results on UD NLP tasks, using a combination of the latest language modelling and some incremental improvements to existing transformer-based approaches.
(2022-10-14T14:52:07Z)
- BasqueParl: A Bilingual Corpus of Basque Parliamentary Transcriptions [3.4447242282168777]
We release the first version of a newly compiled corpus from Basque parliamentary transcripts.
The corpus is characterized by heavy Basque-Spanish code-switching, and represents an interesting resource to study political discourse in contrasting languages such as Basque and Spanish.
(2022-05-03T14:02:24Z)
- German Parliamentary Corpus (GerParCor) [63.17616047204443]
We introduce the German Parliament Corpus (GerParCor).
GerParCor is a genre-specific corpus of German-language parliamentary protocols from three centuries and four countries.
All protocols were preprocessed by means of the NLP pipeline of spaCy3 and automatically annotated with metadata regarding their session date.
(2022-04-21T22:06:55Z)
- A Novel Corpus of Discourse Structure in Humans and Computers [55.74664144248097]
We present a novel corpus of 445 human- and computer-generated documents, comprising about 27,000 clauses.
The corpus covers both formal and informal discourse, and contains documents generated using fine-tuned GPT-2.
(2021-11-10T20:56:08Z)
- ESPnet-ST IWSLT 2021 Offline Speech Translation System [56.83606198051871]
This paper describes the ESPnet-ST group's IWSLT 2021 submission in the offline speech translation track.
This year we made various efforts on training data, architecture, and audio segmentation.
Our best E2E system combined all the techniques with model ensembling and achieved 31.4 BLEU.
(2021-07-01T17:49:43Z)