Automated stance detection in complex topics and small languages: the
challenging case of immigration in polarizing news media
- URL: http://arxiv.org/abs/2305.13047v1
- Date: Mon, 22 May 2023 13:56:35 GMT
- Title: Automated stance detection in complex topics and small languages: the
challenging case of immigration in polarizing news media
- Authors: Mark Mets, Andres Karjus, Indrek Ibrus, Maximilian Schich
- Abstract summary: This paper explores the applicability of large language models for automated stance detection in a challenging scenario.
It involves a morphologically complex, lower-resource language, and a socio-culturally complex topic, immigration.
If the approach works in this case, it can be expected to perform as well or better in less demanding scenarios.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automated stance detection and related machine learning methods can provide
useful insights for media monitoring and academic research. Many of these
approaches require annotated training datasets, which limits their
applicability for languages where these may not be readily available. This
paper explores the applicability of large language models for automated stance
detection in a challenging scenario, involving a morphologically complex,
lower-resource language, and a socio-culturally complex topic, immigration. If
the approach works in this case, it can be expected to perform as well or
better in less demanding scenarios. We annotate a large set of pro and
anti-immigration examples, and compare the performance of multiple language
models as supervised learners. We also probe the usability of ChatGPT as an
instructable zero-shot classifier for the same task. Supervised learning achieves
acceptable performance, and ChatGPT yields similar accuracy. This is promising
as a potentially simpler and cheaper alternative for text classification tasks,
including in lower-resource languages. We further use the best-performing model
to investigate diachronic trends over seven years in two corpora of Estonian
mainstream and right-wing populist news sources, demonstrating the
applicability of the approach for news analytics and media monitoring settings,
and discuss correspondences between stance changes and real-world events.
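The evaluation and trend analysis described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the label names (`pro`, `anti`, `neutral`), the source names (`mainstream`, `populist`), the function names, and all toy records are assumptions made up for illustration. It only shows how predicted stance labels might be scored against annotations and then aggregated into per-year stance shares for each news source.

```python
from collections import Counter, defaultdict


def accuracy(gold, predicted):
    """Fraction of items where the predicted label equals the annotated label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty list")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def stance_shares_by_year(records):
    """Aggregate (year, source, label) records into per-source, per-year label shares.

    Returns a dict mapping (source, year) to {label: share of that label}.
    """
    counts = defaultdict(Counter)
    for year, source, label in records:
        counts[(source, year)][label] += 1

    shares = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        shares[key] = {label: n / total for label, n in counter.items()}
    return shares


# Toy data, invented for illustration only (not from the paper).
gold = ["anti", "pro", "neutral", "anti"]
predicted = ["anti", "pro", "anti", "anti"]

records = [
    (2019, "mainstream", "pro"),
    (2019, "mainstream", "anti"),
    (2019, "populist", "anti"),
    (2020, "populist", "anti"),
]

if __name__ == "__main__":
    print(accuracy(gold, predicted))  # 0.75
    for key, share in sorted(stance_shares_by_year(records).items()):
        print(key, share)
```

In a real setting the `predicted` labels would come from a fine-tuned model or an instructed zero-shot classifier, and the shares would be plotted over time and compared against known events.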
Related papers
- MENTOR: Multilingual tExt detectioN TOward leaRning by analogy [59.37382045577384]
We propose a framework to detect and identify both seen and unseen language regions inside scene images.
"MENTOR" is the first work to realize a learning strategy between zero-shot learning and few-shot learning for multilingual scene text detection.
arXiv Detail & Related papers (2024-03-12T03:35:17Z)
- Beyond Contrastive Learning: A Variational Generative Model for
Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z)
- Controlling Extra-Textual Attributes about Dialogue Participants: A Case
Study of English-to-Polish Neural Machine Translation [4.348327991071386]
Machine translation models need to opt for a certain interpretation of textual context when translating from English to Polish.
We propose a case study where a wide range of approaches for controlling attributes in translation is employed.
The best model achieves an improvement of +5.81 chrF++/+6.03 BLEU, with other models achieving competitive performance.
arXiv Detail & Related papers (2022-05-10T08:45:39Z)
- Towards Best Practices for Training Multilingual Dense Retrieval Models [54.91016739123398]
We focus on the task of monolingual retrieval in a variety of typologically diverse languages using one such design.
Our study is organized as a "best practices" guide for training multilingual dense retrieval models.
arXiv Detail & Related papers (2022-04-05T17:12:53Z)
- IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and
Languages [87.5457337866383]
We introduce the Image-Grounded Language Understanding Evaluation benchmark.
IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages.
We find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks.
arXiv Detail & Related papers (2022-01-27T18:53:22Z)
- When Does Translation Require Context? A Data-driven, Multilingual
Exploration [71.43817945875433]
Proper handling of discourse significantly contributes to the quality of machine translation (MT).
Recent works in context-aware MT attempt to target a small set of discourse phenomena during evaluation.
We develop the Multilingual Discourse-Aware benchmark, a series of taggers that identify and evaluate model performance on discourse phenomena.
arXiv Detail & Related papers (2021-09-15T17:29:30Z)
- Few-Shot Cross-Lingual Stance Detection with Sentiment-Based
Pre-Training [32.800766653254634]
We present the most comprehensive study of cross-lingual stance detection to date.
We use 15 diverse datasets in 12 languages from 6 language families.
For our experiments, we build on pattern-exploiting training, proposing the addition of a novel label encoder.
arXiv Detail & Related papers (2021-09-13T15:20:06Z)
- Semi-automatic Generation of Multilingual Datasets for Stance Detection
in Twitter [9.359018642178917]
This paper presents a method to obtain multilingual datasets for stance detection in Twitter.
We leverage user-based information to semi-automatically label large amounts of tweets.
arXiv Detail & Related papers (2021-01-28T13:05:09Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with
Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)