RooseBERT: A New Deal For Political Language Modelling
- URL: http://arxiv.org/abs/2508.03250v1
- Date: Tue, 05 Aug 2025 09:28:20 GMT
- Title: RooseBERT: A New Deal For Political Language Modelling
- Authors: Deborah Dore, Elena Cabrio, Serena Villata
- Abstract summary: RooseBERT is a pre-trained Language Model for political discourse language. It has been trained on large political debate and speech corpora.
- Score: 18.442235469997232
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The increasing amount of political debates and politics-related discussions calls for the definition of novel computational methods to automatically analyse such content, with the final goal of shedding light on political deliberation for citizens. However, the specificity of political language and the argumentative form of these debates (employing hidden communication strategies and leveraging implicit arguments) make this task very challenging, even for current general-purpose pre-trained Language Models. To address this issue, we introduce a novel pre-trained Language Model for political discourse language called RooseBERT. Pre-training a language model on a specialised domain presents distinct technical and linguistic challenges, requiring extensive computational resources and large-scale data. RooseBERT has been trained on large political debate and speech corpora (8K debates, each composed of several sub-debates on different topics) in English. To evaluate its performance, we fine-tuned it on four downstream tasks related to political debate analysis, i.e., named entity recognition, sentiment analysis, argument component detection and classification, and argument relation prediction and classification. Our results demonstrate significant improvements over general-purpose Language Models on these four tasks, highlighting how domain-specific pre-training enhances performance in political debate analysis. We release the RooseBERT language model for the research community.
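The abstract describes the familiar recipe of domain-specific pre-training followed by fine-tuning on downstream tasks. As a rough illustration of what fine-tuning such a checkpoint on one of the four tasks might look like (argument component detection framed as BIO token classification), below is a minimal sketch using the Hugging Face transformers API. The checkpoint id "path/to/roosebert", the label set, and the toy corpus are placeholders and assumptions, not identifiers or data taken from the paper.

```python
# Minimal sketch: fine-tuning a RooseBERT-style checkpoint for argument
# component detection as token classification (BIO tagging).
# NOTE: the checkpoint path, label scheme, and example data are hypothetical.
from datasets import Dataset
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

labels = ["O", "B-Claim", "I-Claim", "B-Premise", "I-Premise"]  # assumed BIO scheme
checkpoint = "path/to/roosebert"  # placeholder, not the authors' released id

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForTokenClassification.from_pretrained(
    checkpoint, num_labels=len(labels))

def encode(batch):
    # Tokenise pre-split words and align word-level tags to sub-word tokens,
    # masking special tokens and continuation pieces with -100.
    enc = tokenizer(batch["tokens"], is_split_into_words=True,
                    truncation=True, padding="max_length", max_length=128)
    aligned = []
    for i, tags in enumerate(batch["tags"]):
        word_ids = enc.word_ids(batch_index=i)
        prev, ids = None, []
        for w in word_ids:
            ids.append(-100 if w is None or w == prev else tags[w])
            prev = w
        aligned.append(ids)
    enc["labels"] = aligned
    return enc

# Toy example standing in for a real annotated debate corpus.
train = Dataset.from_dict({
    "tokens": [["Taxes", "must", "be", "lowered", "now"]],
    "tags": [[1, 2, 2, 2, 0]],
}).map(encode, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="roosebert-argmining",
                           per_device_train_batch_size=8,
                           num_train_epochs=3),
    train_dataset=train,
)
trainer.train()
```

The same pattern would carry over to the other evaluation tasks: sentence-level tasks such as sentiment analysis or argument relation classification would swap in AutoModelForSequenceClassification and sequence-level labels, while named entity recognition keeps the token-classification head with an entity label set.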
Related papers
- Language-Dependent Political Bias in AI: A Study of ChatGPT and Gemini [0.0]
This study investigates the political tendencies of large language models and whether they differ according to the query language. ChatGPT and Gemini were subjected to a political axis test in 14 different languages. A comparative analysis revealed that Gemini exhibited a more pronounced liberal and left-wing tendency compared to ChatGPT.
arXiv Detail & Related papers (2025-04-08T21:13:01Z) - AgoraSpeech: A multi-annotated comprehensive dataset of political discourse through the lens of humans and AI [1.3060410279656598]
AgoraSpeech is a meticulously curated, high-quality dataset of 171 political speeches from six parties during the Greek national elections in 2023. The dataset includes annotations (per paragraph) for six natural language processing (NLP) tasks: text classification, topic identification, sentiment analysis, named entity recognition, polarization and populism detection.
arXiv Detail & Related papers (2025-01-09T18:17:59Z) - "I Never Said That": A dataset, taxonomy and baselines on response clarity classification [4.16330182801919]
We introduce a novel taxonomy that frames the task of detecting and classifying response clarity.
Our proposed two-level taxonomy addresses the clarity of a response in terms of the information provided for a given question.
We combine ChatGPT and human annotators to collect, validate and annotate discrete QA pairs from political interviews.
arXiv Detail & Related papers (2024-09-20T20:15:06Z) - Learning Phonotactics from Linguistic Informants [54.086544221761486]
Our model iteratively selects or synthesizes a data-point according to one of a range of information-theoretic policies.
We find that the information-theoretic policies that our model uses to select items to query the informant achieve sample efficiency comparable to, or greater than, fully supervised approaches.
arXiv Detail & Related papers (2024-05-08T00:18:56Z) - Modelling Political Coalition Negotiations Using LLM-based Agents [53.934372246390495]
We introduce coalition negotiation as a novel NLP task and model it as a negotiation between large language model-based agents.
We introduce a multilingual dataset, POLCA, comprising manifestos of European political parties and coalition agreements over a number of elections in these countries.
We propose a hierarchical Markov decision process designed to simulate the process of coalition negotiation between political parties and predict the outcomes.
arXiv Detail & Related papers (2024-02-18T21:28:06Z) - "We Demand Justice!": Towards Social Context Grounding of Political Texts [19.58924256275583]
Social media discourse frequently consists of seemingly similar language used by opposing sides of the political spectrum.
This paper defines the context required to fully understand such ambiguous statements in a computational setting.
We propose two challenging datasets that require an understanding of the real-world context of the text.
arXiv Detail & Related papers (2023-11-15T16:53:35Z) - Neural Conversation Models and How to Rein Them in: A Survey of Failures and Fixes [17.489075240435348]
Recent conditional language models are able to continue any kind of text source in an often seemingly fluent way.
From a linguistic perspective, however, the complexity of contributing to a conversation is high.
Recent approaches try to tame the underlying language models at various intervention points.
arXiv Detail & Related papers (2023-08-11T12:07:45Z) - Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z) - SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented Dialogue Agents [70.08842857515141]
SpokenWOZ is a large-scale speech-text dataset for spoken TOD. Cross-turn slot and reasoning slot detection are new challenges for SpokenWOZ.
arXiv Detail & Related papers (2023-05-22T13:47:51Z) - DiaASQ: A Benchmark of Conversational Aspect-based Sentiment Quadruple Analysis [84.80347062834517]
We introduce DiaASQ, aiming to detect the quadruple of target-aspect-opinion-sentiment in a dialogue.
We manually construct a large-scale high-quality DiaASQ dataset in both Chinese and English languages.
We develop a neural model to benchmark the task, which effectively performs end-to-end quadruple prediction.
arXiv Detail & Related papers (2022-11-10T17:18:20Z) - An Inclusive Notion of Text [69.36678873492373]
We argue that clarity on the notion of text is crucial for reproducible and generalizable NLP.
We introduce a two-tier taxonomy of linguistic and non-linguistic elements that are available in textual sources and can be used in NLP modeling.
arXiv Detail & Related papers (2022-11-10T14:26:43Z) - Analyzing the Limits of Self-Supervision in Handling Bias in Language [52.26068057260399]
We evaluate how well language models capture the semantics of four tasks for bias: diagnosis, identification, extraction and rephrasing.
Our analyses indicate that language models are capable of performing these tasks to widely varying degrees across different bias dimensions, such as gender and political affiliation.
arXiv Detail & Related papers (2021-12-16T05:36:08Z)