The Grievance Dictionary: Understanding Threatening Language Use
- URL: http://arxiv.org/abs/2009.04798v1
- Date: Thu, 10 Sep 2020 12:06:48 GMT
- Title: The Grievance Dictionary: Understanding Threatening Language Use
- Authors: Isabelle van der Vegt, Maximilian Mozes, Bennett Kleinberg, Paul Gill
- Abstract summary: The Grievance Dictionary can be used to automatically understand language use in the context of grievance-fuelled violence threat assessment.
The dictionary was validated by applying it to texts written by violent and non-violent individuals.
- Score: 0.8373151777137792
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces the Grievance Dictionary, a psycholinguistic dictionary
which can be used to automatically understand language use in the context of
grievance-fuelled violence threat assessment. We describe the development of the
dictionary, which was informed by suggestions from experienced threat
assessment practitioners. These suggestions and subsequent human and
computational word list generation resulted in a dictionary of 20,502 words
annotated by 2,318 participants. The dictionary was validated by applying it to
texts written by violent and non-violent individuals, showing strong evidence
for a difference between populations in several dictionary categories. Further
classification tasks showed promising performance, but future improvements are
still needed. Finally, we provide instructions and suggestions for the use of
the Grievance Dictionary by security professionals and (violence) researchers.
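As a rough illustration of how a word-list dictionary of this kind is typically applied to a text, the sketch below counts the share of tokens that fall into each category. The category names, the word lists, and the function name `category_proportions` are hypothetical placeholders, not entries or code from the Grievance Dictionary itself, and the abstract does not say whether the published tool weights words differently.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary: category names and word lists are illustrative
# placeholders, NOT entries from the actual Grievance Dictionary.
TOY_DICTIONARY = {
    "weaponry": {"gun", "knife", "rifle"},
    "threat": {"warn", "regret", "pay"},
}


def category_proportions(text, dictionary):
    """Return, per category, the share of tokens in `text` found in that category's word list."""
    # Lowercase the text and keep runs of letters/apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {category: 0.0 for category in dictionary}
    counts = Counter()
    for token in tokens:
        for category, words in dictionary.items():
            if token in words:
                counts[category] += 1
    # Normalise by text length so long and short texts are comparable.
    return {category: counts[category] / len(tokens) for category in dictionary}


scores = category_proportions("You will regret this. I have a knife.", TOY_DICTIONARY)
print(scores)
```

Proportions like these could then be compared between groups of texts or passed to a classifier as features, which is the general shape of the validation the abstract describes.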
Related papers
- Translating the Grievance Dictionary: a psychometric evaluation of Dutch, German, and Italian versions [0.3399874096487746]
Grievance Dictionary is a psycholinguistic dictionary for the analysis of violent, threatening or grievance-fuelled texts.
Considering the relevance of these themes in languages beyond English, we translated the Grievance Dictionary to Dutch, German, and Italian.
The Dutch and German translations perform similarly to the original English version, whereas the Italian dictionary shows low reliability for some categories.
arXiv Detail & Related papers (2025-05-12T12:27:38Z)
- Inaccuracy of an E-Dictionary and Its Influence on Chinese Language Users [4.061449824145836]
The accuracy of major E-dictionaries is seldom scrutinized, and little attention has been paid to how their corpora are constructed.
This study adopts a combined method of experimentation, user survey, and dictionary critique to examine Youdao, one of the most widely used E-dictionaries in China.
Results show that incomplete or misleading definitions can cause serious misunderstandings.
The study further explores how such flawed definitions originate, highlighting issues in data processing and the integration of AI and machine learning technologies in dictionary construction.
arXiv Detail & Related papers (2025-04-01T13:54:33Z)
- Bridging Dictionary: AI-Generated Dictionary of Partisan Language Use [21.15400893251543]
Bridging Dictionary is an interactive tool designed to illuminate how words are perceived by people with different political views.
The Bridging Dictionary includes a static, printable document featuring 796 terms with summaries generated by a large language model.
Users can explore selected words, visualizing their frequency, sentiment, summaries, and examples across political divides.
arXiv Detail & Related papers (2024-07-12T19:44:40Z)
- Towards Unsupervised Speech Recognition Without Pronunciation Models [57.222729245842054]
In this article, we tackle the challenge of developing ASR systems without paired speech and text corpora.
We experimentally demonstrate that an unsupervised speech recognizer can emerge from joint speech-to-speech and text-to-text masked token-infilling.
This innovative model surpasses the performance of previous unsupervised ASR models under the lexicon-free setting.
arXiv Detail & Related papers (2024-06-12T16:30:58Z)
- Refinement of an Epilepsy Dictionary through Human Annotation of Health-related posts on Instagram [5.410785987233275]
We used a dictionary built from biomedical terminology to tag more than 8 million Instagram posts by users who have mentioned an epilepsy-relevant drug at least once.
A random sample of 1,771 posts with 2,947 term matches was evaluated by human annotators to identify false-positives.
OpenAI's GPT series models were compared against human annotation.
arXiv Detail & Related papers (2024-05-14T17:27:59Z)
- Biomedical Named Entity Recognition via Dictionary-based Synonym Generalization [51.89486520806639]
We propose a novel Synonym Generalization (SynGen) framework that recognizes the biomedical concepts contained in the input text using span-based predictions.
We extensively evaluate our approach on a wide range of benchmarks and the results verify that SynGen outperforms previous dictionary-based models by notable margins.
arXiv Detail & Related papers (2023-05-22T14:36:32Z)
- A Study of Slang Representation Methods [3.511369967593153]
We study different combinations of representation learning models and knowledge resources for a variety of downstream tasks that rely on slang understanding.
Our error analysis identifies core challenges for slang representation learning, including out-of-vocabulary words, polysemy, variance, and annotation disagreements.
arXiv Detail & Related papers (2022-12-11T21:56:44Z)
- Short-Term Word-Learning in a Dynamically Changing Environment [63.025297637716534]
We show how to supplement an end-to-end ASR system with a word/phrase memory and a mechanism to access this memory to recognize the words and phrases correctly.
We demonstrate significant improvements in the detection rate of new words with only a minor increase in false alarms.
arXiv Detail & Related papers (2022-03-29T10:05:39Z)
- Dynamically Refined Regularization for Improving Cross-corpora Hate Speech Detection [30.462596705180534]
Hate speech classifiers exhibit substantial performance degradation when evaluated on datasets different from the source.
Previous work has attempted to mitigate this problem by regularizing specific terms from pre-defined static dictionaries.
We propose to automatically identify and reduce spurious correlations using attribution methods with dynamic refinement of the list of terms.
arXiv Detail & Related papers (2022-03-23T16:58:10Z)
- Semantic-Preserving Adversarial Text Attacks [85.32186121859321]
We propose a Bigram and Unigram based adaptive Semantic Preservation Optimization (BU-SPO) method to examine the vulnerability of deep models.
Our method achieves the highest attack success rates and semantics rates by changing the smallest number of words compared with existing methods.
arXiv Detail & Related papers (2021-08-23T09:05:18Z)
- Self-Supervised Euphemism Detection and Identification for Content Moderation [16.322965299627974]
One common use of euphemisms is to evade content moderation policies enforced by social media platforms.
It is usually apparent to a human moderator that a word is being used euphemistically, but they may not know what the secret meaning is.
This paper will demonstrate unsupervised algorithms that can both detect words being used euphemistically, and identify the secret meaning of each word.
arXiv Detail & Related papers (2021-03-31T04:52:38Z)
- Speakers Fill Lexical Semantic Gaps with Context [65.08205006886591]
We operationalise the lexical ambiguity of a word as the entropy of meanings it can take.
We find significant correlations between our estimate of ambiguity and the number of synonyms a word has in WordNet.
This suggests that, in the presence of ambiguity, speakers compensate by making contexts more informative.
arXiv Detail & Related papers (2020-10-05T17:19:10Z)
- RUSSE'2020: Findings of the First Taxonomy Enrichment Task for the Russian language [70.27072729280528]
This paper describes the results of the first shared task on taxonomy enrichment for the Russian language.
16 teams participated in the task demonstrating high results with more than half of them outperforming the provided baseline.
arXiv Detail & Related papers (2020-05-22T13:30:37Z)
- Word Sense Disambiguation for 158 Languages using Word Embeddings Only [80.79437083582643]
Disambiguation of word senses in context is easy for humans, but a major challenge for automatic approaches.
We present a method that takes as input a standard pre-trained word embedding model and induces a fully-fledged word sense inventory.
We use this method to induce a collection of sense inventories for 158 languages on the basis of the original pre-trained fastText word embeddings.
arXiv Detail & Related papers (2020-03-14T14:50:04Z)