CATs are Fuzzy PETs: A Corpus and Analysis of Potentially Euphemistic
Terms
- URL: http://arxiv.org/abs/2205.02728v1
- Date: Thu, 5 May 2022 16:01:39 GMT
- Title: CATs are Fuzzy PETs: A Corpus and Analysis of Potentially Euphemistic
Terms
- Authors: Martha Gavidia, Patrick Lee, Anna Feldman, Jing Peng
- Abstract summary: We present a corpus of potentially euphemistic terms (PETs) along with example texts from the GloWbE corpus.
We find that sentiment analysis on the euphemistic texts supports that PETs generally decrease negative and offensive sentiment.
We observe cases of disagreement in an annotation task, where humans are asked to label PETs as euphemistic or not.
- Score: 2.666791490663749
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Euphemisms have not received much attention in natural language processing,
despite being an important element of polite and figurative language.
Euphemisms prove to be a difficult topic, not only because they are subject to
language change, but also because humans may not agree on what is a euphemism
and what is not. Nevertheless, the first step to tackling the issue is to
collect and analyze examples of euphemisms. We present a corpus of potentially
euphemistic terms (PETs) along with example texts from the GloWbE corpus.
Additionally, we present a subcorpus of texts where these PETs are not being
used euphemistically, which may be useful for future applications. We also
discuss the results of multiple analyses run on the corpus. Firstly, we find
that sentiment analysis on the euphemistic texts supports that PETs generally
decrease negative and offensive sentiment. Secondly, we observe cases of
disagreement in an annotation task, where humans are asked to label PETs as
euphemistic or not in a subset of our corpus text examples. We attribute the
disagreement to a variety of potential reasons, including if the PET was a
commonly accepted term (CAT).
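As an illustration of the kind of sentiment comparison the abstract describes, the sketch below scores a sentence containing a PET against a literal paraphrase with an off-the-shelf classifier; the model and example pairs are illustrative assumptions, not the authors' exact tool or corpus data.

```python
# Illustrative comparison of sentiment for a PET sentence vs. a literal
# paraphrase, in the spirit of the paper's sentiment analysis. The model and
# sentence pairs are assumptions, not the authors' exact setup or data.
from transformers import pipeline

sentiment = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

pairs = [
    ("He passed away last year.", "He died last year."),
    ("She is between jobs at the moment.", "She is unemployed at the moment."),
]

for euphemistic, literal in pairs:
    e = sentiment(euphemistic)[0]
    l = sentiment(literal)[0]
    print(f"PET:     {euphemistic!r} -> {e['label']} ({e['score']:.2f})")
    print(f"Literal: {literal!r} -> {l['label']} ({l['score']:.2f})")
```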
Related papers
- That was the last straw, we need more: Are Translation Systems Sensitive to Disambiguating Context? [64.38544995251642]
We study semantic ambiguities that exist in the source (English in this work) itself.
We focus on idioms that are open to both literal and figurative interpretations.
We find that current MT models consistently translate English idioms literally, even when the context suggests a figurative interpretation.
arXiv Detail & Related papers (2023-10-23T06:38:49Z)
- FEED PETs: Further Experimentation and Expansion on the Disambiguation of Potentially Euphemistic Terms [3.1648534725322666]
We present novel euphemism corpora in three different languages: Yoruba, Spanish, and Mandarin Chinese.
We perform euphemism disambiguation experiments in each language using the multilingual transformer models mBERT and XLM-RoBERTa.
We find that transformers are generally better at classifying vague PETs.
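A minimal sketch of one such disambiguation setup, treating euphemism disambiguation as binary sequence classification with XLM-RoBERTa; the label scheme and example sentence are hypothetical, and the classification head would need fine-tuning on the PET corpora before its predictions are meaningful.

```python
# Hypothetical setup: euphemism disambiguation as binary sequence
# classification with XLM-RoBERTa. The label scheme and example are made up;
# the classification head must be fine-tuned on the PET corpora to be useful.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=2  # 0 = literal use, 1 = euphemistic use
)

text = "El abuelo nos dejó el año pasado."  # hypothetical Spanish example
inputs = tok(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print("euphemistic" if logits.argmax(dim=-1).item() == 1 else "literal")
```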
arXiv Detail & Related papers (2023-05-31T22:23:20Z)
- A Report on the Euphemisms Detection Shared Task [2.9972063833424216]
This paper presents the Shared Task on Euphemism Detection for the Third Workshop on Figurative Language Processing (Fig 2022), held in conjunction with EMNLP 2022.
Participants were invited to investigate the euphemism detection task: given input text, identify whether it contains a euphemism.
The input data is a corpus of sentences containing potentially euphemistic terms (PETs) collected from the GloWbE corpus.
arXiv Detail & Related papers (2022-11-23T22:06:35Z)
- Are Representations Built from the Ground Up? An Empirical Examination of Local Composition in Language Models [91.3755431537592]
Representing compositional and non-compositional phrases is critical for language understanding.
We first formulate a problem of predicting the LM-internal representations of longer phrases given those of their constituents.
While we would expect the predictive accuracy to correlate with human judgments of semantic compositionality, we find this is largely not the case.
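A rough sketch of this probing idea under stated assumptions: embed constituents and the full phrase with a generic encoder, then fit a simple affine probe from constituent vectors to phrase vectors. The model, mean pooling, and ridge probe are illustrative choices, not the paper's protocol.

```python
# Illustrative probe: predict a phrase's LM representation from its
# constituents' representations with a simple affine map. Model choice,
# mean pooling, and the ridge probe are assumptions, not the paper's setup.
import torch
from sklearn.linear_model import Ridge
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
lm = AutoModel.from_pretrained("bert-base-uncased").eval()

def embed(text: str) -> torch.Tensor:
    """Mean-pooled last-layer representation of a span."""
    with torch.no_grad():
        hidden = lm(**tok(text, return_tensors="pt")).last_hidden_state
    return hidden.mean(dim=1).squeeze(0)

bigrams = [("red", "car"), ("hot", "dog"), ("old", "friend"), ("couch", "potato")]
X = torch.stack([torch.cat([embed(a), embed(b)]) for a, b in bigrams]).numpy()
Y = torch.stack([embed(f"{a} {b}") for a, b in bigrams]).numpy()

probe = Ridge(alpha=1.0).fit(X, Y)
# A high fit here would not by itself imply human-like compositionality,
# which is the mismatch the paper reports.
print("train R^2:", probe.score(X, Y))
```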
arXiv Detail & Related papers (2022-10-07T14:21:30Z)
- Subject Verb Agreement Error Patterns in Meaningless Sentences: Humans vs. BERT [64.40111510974957]
We test whether meaning interferes with subject-verb number agreement in English.
We generate semantically well-formed and nonsensical items.
We find that BERT and humans are both sensitive to our semantic manipulation.
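One way to run such a probe, sketched below with a fill-mask pipeline: compare the probability BERT assigns to singular vs. plural verb forms in a well-formed item and a nonsensical counterpart. The sentences are invented illustrations, not the paper's stimuli.

```python
# Illustrative agreement probe with a fill-mask pipeline: compare the
# probability of singular vs. plural verb forms in a well-formed item and a
# nonsensical counterpart. The sentences are made up, not the paper's stimuli.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

def number_scores(sentence: str) -> dict:
    """Probabilities BERT assigns to 'is' vs. 'are' at the masked position."""
    return {r["token_str"]: round(r["score"], 3)
            for r in fill(sentence, targets=["is", "are"])}

print(number_scores("The keys to the cabinet [MASK] on the table."))
print(number_scores("The ideas to the cabbage [MASK] on the cloud."))
```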
arXiv Detail & Related papers (2022-09-21T17:57:23Z)
- Searching for PETs: Using Distributional and Sentiment-Based Methods to Find Potentially Euphemistic Terms [2.666791490663749]
This paper presents a linguistically driven proof of concept for finding potentially euphemistic terms, or PETs.
Acknowledging that PETs tend to be commonly used expressions for a certain range of sensitive topics, we make use of distributional similarities to select and filter phrase candidates from a sentence.
We present the results of our approach tested on a corpus of sentences containing euphemisms, demonstrating its efficacy for detecting single and multi-word PETs.
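The filtering idea can be sketched as follows, assuming a generic sentence encoder and hand-picked sensitive-topic seed terms; both are assumptions for illustration, and the authors' own distributional representations and thresholds may differ.

```python
# Illustrative distributional filter: keep candidate phrases whose embedding
# is close to seed terms for sensitive topics. Seeds, encoder, and threshold
# are assumptions for the sketch, not the authors' configuration.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

seeds = ["death", "unemployment", "illness"]          # sensitive-topic seeds
candidates = ["passed away", "between jobs", "blue car", "under the weather"]

seed_vecs = encoder.encode(seeds, convert_to_tensor=True)
cand_vecs = encoder.encode(candidates, convert_to_tensor=True)

# Best similarity of each candidate to any seed term.
best_sim = util.cos_sim(cand_vecs, seed_vecs).max(dim=1).values
for phrase, sim in zip(candidates, best_sim):
    verdict = "PET candidate" if sim > 0.35 else "filtered out"
    print(f"{phrase:18} sim={sim:.2f} -> {verdict}")
```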
arXiv Detail & Related papers (2022-05-20T22:21:21Z)
- Beyond Plain Toxic: Detection of Inappropriate Statements on Flammable Topics for the Russian Language [76.58220021791955]
We present two text collections labelled according to a binary notion of inappropriateness and a multinomial notion of sensitive topics.
To objectivise the notion of inappropriateness, we define it in a data-driven way through crowdsourcing.
arXiv Detail & Related papers (2022-03-04T15:59:06Z)
- Do Context-Aware Translation Models Pay the Right Attention? [61.25804242929533]
Context-aware machine translation models are designed to leverage contextual information, but often fail to do so.
In this paper, we ask several questions, including: what contexts do human translators use to resolve ambiguous words?
We introduce SCAT (Supporting Context for Ambiguous Translations), a new English-French dataset comprising supporting context words for 14K translations.
Using SCAT, we perform an in-depth analysis of the context used to disambiguate, examining positional and lexical characteristics of the supporting words.
arXiv Detail & Related papers (2021-05-14T17:32:24Z)
- Speakers Fill Lexical Semantic Gaps with Context [65.08205006886591]
We operationalise the lexical ambiguity of a word as the entropy of meanings it can take.
We find significant correlations between our estimate of ambiguity and the number of synonyms a word has in WordNet.
This suggests that, in the presence of ambiguity, speakers compensate by making contexts more informative.
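A worked sketch of "ambiguity as entropy over meanings", using WordNet/SemCor sense counts as a stand-in sense distribution; this estimator is illustrative and not the paper's exact operationalisation.

```python
# Worked sketch: ambiguity of a word as entropy over a sense distribution,
# estimated here from WordNet/SemCor sense counts. This estimator is an
# illustration, not the paper's exact operationalisation of ambiguity.
import math
from nltk.corpus import wordnet as wn  # requires nltk.download("wordnet")

def sense_entropy(word: str) -> float:
    """H = -sum_s p(s|word) * log2 p(s|word), with add-one smoothed counts."""
    counts = [lemma.count() + 1
              for synset in wn.synsets(word)
              for lemma in synset.lemmas()
              if lemma.name().lower() == word.lower()]
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts)

for word in ["bank", "run", "photosynthesis"]:
    print(word, round(sense_entropy(word), 2))
```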
arXiv Detail & Related papers (2020-10-05T17:19:10Z)
- Humpty Dumpty: Controlling Word Meanings via Corpus Poisoning [29.181547214915238]
We show that an attacker can control the "meaning" of new and existing words by changing their locations in the embedding space.
An attack on the embedding can affect diverse downstream tasks, demonstrating for the first time the power of data poisoning in transfer learning scenarios.
arXiv Detail & Related papers (2020-01-14T17:48:52Z)