Classifying multilingual party manifestos: Domain transfer across
country, time, and genre
- URL: http://arxiv.org/abs/2307.16511v1
- Date: Mon, 31 Jul 2023 09:16:13 GMT
- Title: Classifying multilingual party manifestos: Domain transfer across
country, time, and genre
- Authors: Matthias Aßenmacher, Nadja Sauter, and Christian Heumann
- Abstract summary: We show the potential of domain transfer across geographical locations, languages, time, and genre in a large-scale database of political manifestos.
For switching genres, we use an external corpus of transcribed speeches from New Zealand politicians, while for the other three dimensions, custom splits of the Manifesto database are used.
DistilBERT proves to be competitive at a lower computational expense and is thus used for further experiments across time and country.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Annotation costs for large corpora are still one of the main bottlenecks in
empirical social science research. On the one hand, making use of the
capabilities of domain transfer allows re-using annotated data sets and trained
models. On the other hand, it is not clear how well domain transfer works and
how reliable the results are for transfer across different dimensions. We
explore the potential of domain transfer across geographical locations,
languages, time, and genre in a large-scale database of political manifestos.
First, we show the strong within-domain classification performance of
fine-tuned transformer models. Second, we vary the test set across the
aforementioned dimensions to assess the fine-tuned models' robustness and
transferability. For switching genres, we use an external corpus of transcribed
speeches from New Zealand politicians, while for the other three dimensions,
custom splits of the Manifesto database are used. While BERT achieves the best
scores in the initial experiments across all dimensions, DistilBERT proves to be
competitive at a lower computational expense and is thus used for further
experiments across time and country. The results of the additional analysis
show that (Distil)BERT can be applied to future data with similar performance.
Moreover, we observe, at times, notable differences between the political
manifestos of different countries of origin, even when these countries share a
language or a cultural background.
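
To make the setup concrete, the sketch below shows how such a fine-tuning and cross-domain evaluation pipeline could look with the Hugging Face transformers library. This is not the authors' released code: the checkpoint name is a real multilingual DistilBERT model, but the CSV file names, column names, and the number of label classes are illustrative assumptions.

```python
# Minimal sketch (assumed setup, not the authors' code) of fine-tuning
# DistilBERT on labelled manifesto quasi-sentences, then evaluating the same
# model on an out-of-domain split (another country, a later election, or
# transcribed speeches).
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

MODEL = "distilbert-base-multilingual-cased"  # lighter than bert-base-multilingual-cased

# Hypothetical CSV files with a "text" column (a manifesto quasi-sentence) and
# an integer "label" column (the coding category); file names are placeholders.
data = load_dataset(
    "csv",
    data_files={
        "train": "manifestos_in_domain.csv",
        "test": "manifestos_other_domain.csv",
    },
)

tokenizer = AutoTokenizer.from_pretrained(MODEL)
data = data.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
)

# num_labels=8 is an assumption standing in for the actual coding scheme.
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=8)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="manifesto-distilbert",
        num_train_epochs=3,
        per_device_train_batch_size=16,
    ),
    train_dataset=data["train"],
    tokenizer=tokenizer,  # enables dynamic padding via DataCollatorWithPadding
)
trainer.train()

# Domain-transfer check: evaluate the fine-tuned model on the held-out domain.
print(trainer.evaluate(eval_dataset=data["test"]))
```

Swapping MODEL for bert-base-multilingual-cased would reproduce the heavier baseline the abstract compares against; the within-domain versus cross-domain contrast comes entirely from the choice of the test file.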
Related papers
- StylePrompter: Enhancing Domain Generalization with Test-Time Style Priors
In real-world applications, the sample distribution at the inference stage often differs from the one at the training stage.
This paper introduces the style prompt in the language modality to adapt the trained model dynamically.
In particular, we train a style prompter to extract style information of the current image into an embedding in the token embedding space.
Our open space partition of the style token embedding space and the hand-crafted style regularization enable the trained style prompter to handle data from unknown domains effectively.
arXiv Detail & Related papers (2024-08-17T08:35:43Z)
- Seed-Guided Fine-Grained Entity Typing in Science and Engineering Domains
We study the task of seed-guided fine-grained entity typing in science and engineering domains.
We propose SEType which first enriches the weak supervision by finding more entities for each seen type from an unlabeled corpus.
It then matches the enriched entities to unlabeled text to get pseudo-labeled samples and trains a textual entailment model that can make inferences for both seen and unseen types.
arXiv Detail & Related papers (2024-01-23T22:36:03Z)
- Multilingual estimation of political-party positioning: From label aggregation to long-input Transformers
We implement and compare two approaches to automatic scaling analysis of political-party manifestos.
We find that the task can be efficiently solved by state-of-the-art models, with label aggregation producing the best results.
arXiv Detail & Related papers (2023-10-19T08:34:48Z)
- Analyzing the Generalizability of Deep Contextualized Language Representations For Text Classification
This study evaluates the robustness of two state-of-the-art deep contextual language representations, ELMo and DistilBERT.
In the news classification task, the models are developed on local news from India and tested on local news from China.
In the sentiment analysis task, the models are trained on movie reviews and tested on customer reviews.
arXiv Detail & Related papers (2023-03-22T22:31:09Z)
- Cross-domain Sentiment Classification in Spanish
We study the ability of a classification system trained with a large database of product reviews to generalize to different Spanish domains.
Results suggest that generalization across domains is feasible though very challenging when trained with these product reviews.
arXiv Detail & Related papers (2023-03-15T23:11:30Z)
- Using Language to Extend to Unseen Domains
It is expensive to collect training data for every possible domain that a vision model may encounter when deployed.
We consider how simply verbalizing the training domain as well as domains we want to extend to but do not have data for can improve robustness.
Using a multimodal model with a joint image and language embedding space, our method LADS learns a transformation of the image embeddings from the training domain to each unseen test domain.
arXiv Detail & Related papers (2022-10-18T01:14:02Z)
- Studying the role of named entities for content preservation in text style transfer
We focus on the role of named entities in content preservation for formality text style transfer.
We collect a new dataset for the evaluation of content similarity measures in text style transfer.
We perform an error analysis of a pre-trained formality transfer model and introduce a simple technique to use information about named entities to enhance the performance of baseline content similarity measures used in text style transfer.
arXiv Detail & Related papers (2022-06-20T09:31:47Z)
- VisDA-2021 Competition: Universal Domain Adaptation to Improve Performance on Out-of-Distribution Data
The Visual Domain Adaptation (VisDA) 2021 competition tests models' ability to adapt to novel test distributions.
We will evaluate adaptation to novel viewpoints, backgrounds, modalities and degradation in quality.
Performance will be measured using a rigorous protocol, comparing to state-of-the-art domain adaptation methods.
arXiv Detail & Related papers (2021-07-23T03:21:51Z)
- XL-WiC: A Multilingual Benchmark for Evaluating Semantic Contextualization
We present the Word-in-Context dataset (WiC) for assessing the ability to correctly model distinct meanings of a word.
We put forward a large multilingual benchmark, XL-WiC, featuring gold standards in 12 new languages.
Experimental results show that even when no tagged instances are available for a target language, models trained solely on the English data can attain competitive performance.
arXiv Detail & Related papers (2020-10-13T15:32:00Z)
- A Cross-Genre Ensemble Approach to Robust Reddit Part of Speech Tagging
We study how a state-of-the-art tagging model trained on different genres performs on Web content from unfiltered Reddit forum discussions.
Our results show that even small amounts of in-domain data can outperform the contribution of data from other Web domains.
arXiv Detail & Related papers (2020-04-29T16:36:38Z)
- Unsupervised Domain Clusters in Pretrained Language Models
We show that massive pre-trained language models implicitly learn sentence representations that cluster by domains without supervision.
We propose domain data selection methods based on such models.
We evaluate our data selection methods for neural machine translation across five diverse domains.
arXiv Detail & Related papers (2020-04-05T06:22:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.