Translating the Grievance Dictionary: a psychometric evaluation of Dutch, German, and Italian versions
- URL: http://arxiv.org/abs/2505.07495v1
- Date: Mon, 12 May 2025 12:27:38 GMT
- Title: Translating the Grievance Dictionary: a psychometric evaluation of Dutch, German, and Italian versions
- Authors: Isabelle van der Vegt, Bennett Kleinberg, Marilu Miotto, Jonas Festor,
- Abstract summary: The Grievance Dictionary is a psycholinguistic dictionary for the analysis of violent, threatening or grievance-fuelled texts. Considering the relevance of these themes in languages beyond English, we translated the Grievance Dictionary to Dutch, German, and Italian. The Dutch and German translations perform similarly to the original English version, whereas the Italian dictionary shows low reliability for some categories.
- Score: 0.3399874096487746
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces and evaluates three translations of the Grievance Dictionary, a psycholinguistic dictionary for the analysis of violent, threatening or grievance-fuelled texts. Considering the relevance of these themes in languages beyond English, we translated the Grievance Dictionary to Dutch, German, and Italian. We describe the process of automated translation supplemented by human annotation. Psychometric analyses are performed, including internal reliability of dictionary categories and correlations with the LIWC dictionary. The Dutch and German translations perform similarly to the original English version, whereas the Italian dictionary shows low reliability for some categories. Finally, we make suggestions for further validation and application of the dictionary, as well as for future dictionary translations following a similar approach.
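The abstract mentions two psychometric checks: internal reliability of dictionary categories and correlations with LIWC. The sketch below illustrates one common way such checks are computed, using Cronbach's alpha for reliability and Pearson's r for the correlation. The abstract does not state which statistics the authors used, so both choices are assumptions, and the function names and all numbers are hypothetical illustration data, not results from the paper.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for one dictionary category.

    item_scores: list of rows, one per document; each row holds one score
    per item (here, a hypothetical per-word frequency in that document).
    """
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    columns = list(zip(*item_scores))
    sum_item_vars = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum_item_vars / total_var)


def pearson_r(xs, ys):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical data: 5 documents x 3 words from one category.
word_scores = [
    [0.0, 1.0, 0.0],
    [1.0, 2.0, 1.0],
    [2.0, 2.0, 3.0],
    [3.0, 4.0, 3.0],
    [4.0, 5.0, 4.0],
]
# Hypothetical per-document scores for that category and a LIWC category.
category_scores = [sum(row) for row in word_scores]
liwc_scores = [0.5, 1.0, 2.5, 3.0, 4.5]

print("alpha:", round(cronbach_alpha(word_scores), 3))
print("r with LIWC:", round(pearson_r(category_scores, liwc_scores), 3))
```

A low alpha for a category, as the abstract reports for parts of the Italian version, would suggest that the words in that category do not vary together across documents.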
Related papers
- Inaccuracy of an E-Dictionary and Its Influence on Chinese Language Users [4.061449824145836]
The accuracy of major E-dictionaries is seldom scrutinized, and little attention has been paid to how their corpora are constructed. This study adopts a combined method of experimentation, user survey, and dictionary critique to examine Youdao, one of the most widely used E-dictionaries in China. Results show that incomplete or misleading definitions can cause serious misunderstandings. The study further explores how such flawed definitions originate, highlighting issues in data processing and the integration of AI and machine learning technologies in dictionary construction.
arXiv Detail & Related papers (2025-04-01T13:54:33Z)
- (Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts [56.7988577327046]
We introduce TransAgents, a novel multi-agent framework that simulates the roles and collaborative practices of a human translation company. Our findings highlight the potential of multi-agent collaboration in enhancing translation quality, particularly for longer texts.
arXiv Detail & Related papers (2024-05-20T05:55:08Z)
- Sõnajaht: Definition Embeddings and Semantic Search for Reverse Dictionary Creation [0.21485350418225246]
We present an information retrieval based reverse dictionary system using modern pre-trained language models and approximate nearest neighbors search algorithms.
The proposed approach is applied to an existing Estonian language lexicon resource, Sõnaveeb (word web), with the purpose of enhancing and enriching it by introducing cross-lingual reverse dictionary functionality powered by semantic search.
arXiv Detail & Related papers (2024-04-30T10:21:14Z)
- Crossing the Threshold: Idiomatic Machine Translation through Retrieval Augmentation and Loss Weighting [66.02718577386426]
We provide a simple characterization of idiomatic translation and related issues.
We conduct a synthetic experiment revealing a tipping point at which transformer-based machine translation models correctly default to idiomatic translations.
To improve translation of natural idioms, we introduce two straightforward yet effective techniques.
arXiv Detail & Related papers (2023-10-10T23:47:25Z)
- Decomposed Prompting for Machine Translation Between Related Languages using Large Language Models [55.35106713257871]
We introduce DecoMT, a novel approach of few-shot prompting that decomposes the translation process into a sequence of word chunk translations.
We show that DecoMT outperforms the strong few-shot prompting BLOOM model with an average improvement of 8 chrF++ scores across the examined languages.
arXiv Detail & Related papers (2023-05-22T14:52:47Z)
- Rethink about the Word-level Quality Estimation for Machine Translation from Human Judgement [57.72846454929923]
We create a benchmark dataset, HJQE, where the expert translators directly annotate poorly translated words.
We propose two tag correcting strategies, namely tag refinement strategy and tree-based annotation strategy, to make the TER-based artificial QE corpus closer to HJQE.
The results show our proposed dataset is more consistent with human judgement and also confirm the effectiveness of the proposed tag correcting strategies.
arXiv Detail & Related papers (2022-09-13T02:37:12Z)
- Quantifying Synthesis and Fusion and their Impact on Machine Translation [79.61874492642691]
Work in Natural Language Processing (NLP) typically labels a whole language with a strict type of morphology, e.g. fusional or agglutinative.
In this work, we propose to reduce the rigidity of such claims by quantifying morphological typology at the word and segment level.
For computing synthesis, we test unsupervised and supervised morphological segmentation methods for English, German and Turkish, whereas for fusion, we propose a semi-automatic method using Spanish as a case study.
Then, we analyse the relationship between machine translation quality and the degree of synthesis and fusion at word (nouns and verbs for English-Turkish,
arXiv Detail & Related papers (2022-05-06T17:04:58Z)
- Quantitative Evaluation of Alternative Translations in a Corpus of Highly Dissimilar Finnish Paraphrases [1.8748036062767652]
We present a quantitative evaluation of differences between alternative translations in a large recently released Finnish paraphrase corpus.
We combine a series of automatic steps detecting systematic variation with manual analysis to reveal regularities and identify categories of translation differences.
arXiv Detail & Related papers (2021-05-06T07:22:16Z)
- Facilitating Terminology Translation with Target Lemma Annotations [4.492630871726495]
We train machine translation systems using a source-side data augmentation method that annotates randomly selected source language words with their target language lemmas.
Experiments on terminology translation into the morphologically complex Baltic and Uralic languages show an improvement of up to 7 BLEU points over baseline systems.
Results of the human evaluation indicate a 47.7% absolute improvement over the previous work in term translation accuracy when translating into Latvian.
arXiv Detail & Related papers (2021-01-25T12:07:20Z)
- The Grievance Dictionary: Understanding Threatening Language Use [0.8373151777137792]
The Grievance Dictionary can be used to automatically understand language use in the context of grievance-fuelled violence threat assessment.
The dictionary was validated by applying it to texts written by violent and non-violent individuals.
arXiv Detail & Related papers (2020-09-10T12:06:48Z)
- Multilingual Alignment of Contextual Word Representations [49.42244463346612]
After the proposed alignment procedure, BERT exhibits significantly improved zero-shot performance on XNLI compared to the base model.
We introduce a contextual version of word retrieval and show that it correlates well with downstream zero-shot transfer.
These results support contextual alignment as a useful concept for understanding large multilingual pre-trained models.
arXiv Detail & Related papers (2020-02-10T03:27:21Z)