FEED PETs: Further Experimentation and Expansion on the Disambiguation
of Potentially Euphemistic Terms
- URL: http://arxiv.org/abs/2306.00217v2
- Date: Tue, 6 Jun 2023 19:17:14 GMT
- Title: FEED PETs: Further Experimentation and Expansion on the Disambiguation
of Potentially Euphemistic Terms
- Authors: Patrick Lee, Iyanuoluwa Shode, Alain Chirino Trujillo, Yuan Zhao,
Olumide Ebenezer Ojo, Diana Cuevas Plancarte, Anna Feldman, Jing Peng
- Abstract summary: We annotate PETs for vagueness and find that transformers are generally better at classifying vague PETs.
We present novel euphemism corpora in three languages: Yoruba, Spanish, and Mandarin Chinese.
We perform euphemism disambiguation experiments in each language using the multilingual transformer models mBERT and XLM-RoBERTa.
- Score: 3.1648534725322666
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformers have been shown to work well for the task of English euphemism
disambiguation, in which a potentially euphemistic term (PET) is classified as
euphemistic or non-euphemistic in a particular context. In this study, we
expand on the task in two ways. First, we annotate PETs for vagueness, a
linguistic property associated with euphemisms, and find that transformers are
generally better at classifying vague PETs, suggesting linguistic differences
in the data that impact performance. Second, we present novel euphemism corpora
in three different languages: Yoruba, Spanish, and Mandarin Chinese. We perform
euphemism disambiguation experiments in each language using multilingual
transformer models mBERT and XLM-RoBERTa, establishing preliminary results from
which to launch future work.
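The disambiguation task is a binary classification of a PET in context. Below is a minimal sketch of such an experiment with the Hugging Face transformers library; the file names, column names, and hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch: binary euphemism disambiguation with XLM-RoBERTa.
# Assumes a CSV with a "text" column (sentence containing the PET) and a
# "label" column (1 = euphemistic, 0 = non-euphemistic) -- illustrative only.
from transformers import (AutoModelForSequenceClassification,
                          AutoTokenizer, Trainer, TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=2)

dataset = load_dataset("csv", data_files={"train": "pets_train.csv",
                                          "test": "pets_test.csv"})

def tokenize(batch):
    # Truncate long sentences; the PET remains in its surrounding context.
    return tokenizer(batch["text"], truncation=True, max_length=128)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(output_dir="pet-xlmr",
                         per_device_train_batch_size=16,
                         num_train_epochs=3)
trainer = Trainer(model=model, args=args,
                  train_dataset=dataset["train"],
                  eval_dataset=dataset["test"],
                  tokenizer=tokenizer)
trainer.train()
print(trainer.evaluate())
```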
Related papers
- Turkish Delights: a Dataset on Turkish Euphemisms [1.7614751781649955]
This research extends the current computational work on potentially euphemistic terms (PETs) to Turkish.
We introduce the Turkish PET dataset, the first of its kind in the field.
We provide both euphemistic and non-euphemistic examples of PETs in Turkish.
arXiv Detail & Related papers (2024-07-17T22:13:42Z)
- MEDs for PETs: Multilingual Euphemism Disambiguation for Potentially Euphemistic Terms [10.154915854525928]
We train a multilingual transformer model (XLM-RoBERTa) to disambiguate potentially euphemistic terms (PETs) in multilingual and cross-lingual settings.
We show that multilingual models outperform monolingual models on the task by a statistically significant margin.
In a follow-up analysis, we focus on universal euphemistic "categories" such as death and bodily functions.
arXiv Detail & Related papers (2024-01-25T21:38:30Z)
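The multilingual and cross-lingual settings contrasted above amount to different train/test splits over a pooled corpus. A hypothetical sketch of a leave-one-language-out (zero-shot cross-lingual) split; the records and the "lang" field are invented:

```python
# Sketch: zero-shot cross-lingual split -- train on all languages
# except one, evaluate on the held-out language.
def cross_lingual_split(examples, held_out):
    train = [ex for ex in examples if ex["lang"] != held_out]
    test = [ex for ex in examples if ex["lang"] == held_out]
    return train, test

corpus = [
    {"text": "pasar a mejor vida", "lang": "es", "label": 1},
    {"text": "kicked the bucket", "lang": "en", "label": 1},
    {"text": "kicked the ball", "lang": "en", "label": 0},
]
for lang in {ex["lang"] for ex in corpus}:
    train, test = cross_lingual_split(corpus, lang)
    print(lang, len(train), len(test))
```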
- Towards Effective Disambiguation for Machine Translation with Large Language Models [65.80775710657672]
We study the capabilities of large language models to translate "ambiguous sentences".
Experiments show that our methods can match or outperform state-of-the-art systems such as DeepL and NLLB in four out of five language directions.
arXiv Detail & Related papers (2023-09-20T22:22:52Z)
- A Report on the Euphemisms Detection Shared Task [2.9972063833424216]
This paper presents the Shared Task on Euphemism Detection for the Third Workshop on Figurative Language Processing (FigLang 2022), held in conjunction with EMNLP 2022.
Participants were invited to investigate the euphemism detection task: given input text, identify whether it contains a euphemism.
The input data is a corpus of sentences containing potentially euphemistic terms (PETs) collected from the GloWbE corpus.
arXiv Detail & Related papers (2022-11-23T22:06:35Z)
- Shapley Head Pruning: Identifying and Removing Interference in Multilingual Transformers [54.4919139401528]
We show that it is possible to reduce interference by identifying and pruning language-specific parameters.
We show that removing identified attention heads from a fixed model improves performance for a target language on both sentence classification and structural prediction.
arXiv Detail & Related papers (2022-10-11T18:11:37Z)
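As a sketch of the pruning operation above, the transformers library can physically remove attention heads from a fixed model; the heads chosen below are arbitrary placeholders, whereas the paper selects them via Shapley-value attribution.

```python
# Sketch: remove attention heads from a fixed multilingual encoder.
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-multilingual-cased")
print(model.config.num_attention_heads)  # 12 heads per layer

# {layer_index: [head indices to prune]} -- placeholder choices.
model.prune_heads({0: [2, 5], 3: [0]})

# The pruned layers now have fewer heads; fine-tuning or evaluation
# proceeds with the smaller model.
print(model.encoder.layer[0].attention.self.num_attention_heads)  # 10
```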
- Searching for PETs: Using Distributional and Sentiment-Based Methods to Find Potentially Euphemistic Terms [2.666791490663749]
This paper presents a linguistically driven proof of concept for finding potentially euphemistic terms, or PETs.
Acknowledging that PETs tend to be commonly used expressions for a certain range of sensitive topics, we make use of distributional similarities to select and filter phrase candidates from a sentence.
We present the results of our approach tested on a corpus of sentences containing euphemisms, demonstrating its efficacy for detecting single and multi-word PETs.
arXiv Detail & Related papers (2022-05-20T22:21:21Z)
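One way to realize the distributional filtering above is to keep candidate phrases whose embeddings lie close to seed terms for sensitive topics. A sketch with the sentence-transformers library; the seeds, candidates, and cutoff are invented for illustration, not the paper's settings.

```python
# Sketch: keep phrase candidates whose embeddings are close to seed
# terms for sensitive topics (death, firing, etc.).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
seeds = ["death", "die", "fired from a job", "pregnancy"]
candidates = ["passed away", "let go", "went to the store"]

seed_emb = model.encode(seeds, convert_to_tensor=True)
cand_emb = model.encode(candidates, convert_to_tensor=True)

# Cosine similarity of each candidate to its nearest seed term.
sims = util.cos_sim(cand_emb, seed_emb).max(dim=1).values
for phrase, sim in zip(candidates, sims):
    if sim > 0.4:  # arbitrary cutoff
        print(f"potential PET: {phrase!r} (sim={sim:.2f})")
```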
- CATs are Fuzzy PETs: A Corpus and Analysis of Potentially Euphemistic Terms [2.666791490663749]
We present a corpus of potentially euphemistic terms (PETs) along with example texts from the GloWbE corpus.
We find that sentiment analysis of the euphemistic texts supports the claim that PETs generally decrease negative and offensive sentiment.
We observe cases of disagreement in an annotation task, where humans are asked to label PETs as euphemistic or not.
arXiv Detail & Related papers (2022-05-05T16:01:39Z)
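The sentiment finding above can be probed with an off-the-shelf classifier by contrasting a PET with a literal paraphrase in the same frame. A sketch using the transformers sentiment pipeline; the sentence pair is an invented example, not taken from the corpus.

```python
# Sketch: compare the sentiment of a euphemistic phrasing against a
# literal paraphrase of the same statement.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")

pairs = [
    ("My grandfather passed away last night.",   # euphemistic
     "My grandfather died last night."),         # literal
]
for euphemistic, literal in pairs:
    for text in (euphemistic, literal):
        result = sentiment(text)[0]
        print(f"{result['label']:>8} {result['score']:.2f}  {text}")
```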
- Verb Knowledge Injection for Multilingual Event Processing [50.27826310460763]
We investigate whether injecting explicit information on verbs' semantic-syntactic behaviour improves the performance of LM-pretrained Transformers.
We first demonstrate that injecting verb knowledge leads to performance gains in English event extraction.
We then explore the utility of verb adapters for event extraction in other languages.
arXiv Detail & Related papers (2020-12-31T03:24:34Z)
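Verb adapters of the kind mentioned above are typically small bottleneck layers inserted into a frozen transformer. A generic PyTorch sketch of such a module; the dimensions are illustrative, and the paper's exact adapter configuration may differ.

```python
# Sketch: a generic bottleneck adapter for knowledge injection --
# down-project, nonlinearity, up-project, residual add.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, hidden_size: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # The residual connection preserves the frozen model's representation.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

adapter = Adapter()
x = torch.randn(2, 16, 768)  # (batch, seq_len, hidden)
print(adapter(x).shape)      # torch.Size([2, 16, 768])
```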
- VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
This helps the model avoid degenerating into predicting masked words conditioned only on context from the same language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
arXiv Detail & Related papers (2020-10-30T03:41:38Z)
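The cross-attention module above lets tokens in one language attend to a parallel sequence in another. A minimal PyTorch sketch of that building block, not the paper's actual implementation:

```python
# Sketch: cross-attention between a source-language sequence (queries)
# and a parallel target-language sequence (keys/values).
import torch
import torch.nn as nn

hidden, heads = 768, 12
cross_attn = nn.MultiheadAttention(hidden, heads, batch_first=True)

src = torch.randn(2, 10, hidden)  # e.g. English token states
tgt = torch.randn(2, 12, hidden)  # e.g. parallel Spanish token states

# Masked-word prediction can now condition on the other language.
out, weights = cross_attn(query=src, key=tgt, value=tgt)
print(out.shape)  # torch.Size([2, 10, 768])
```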
- Improving Sentiment Analysis over non-English Tweets using Multilingual Transformers and Automatic Translation for Data-Augmentation [77.69102711230248]
We propose the use of a multilingual transformer model that we pre-train on English tweets, applying data augmentation via automatic translation to adapt the model to non-English languages.
Our experiments in French, Spanish, German, and Italian suggest that the proposed technique is an effective way to improve transformer performance on small corpora of tweets in a non-English language.
arXiv Detail & Related papers (2020-10-07T15:44:55Z)
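The augmentation step above can be approximated with any public machine-translation model. A sketch using a MarianMT checkpoint via the transformers pipeline; the checkpoint choice is an assumption, not necessarily the authors' system.

```python
# Sketch: augment a small non-English corpus by machine-translating
# labeled English tweets, keeping the original labels.
from transformers import pipeline

translate = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")

english_tweets = [("This update is terrible", "negative"),
                  ("Loving the new release!", "positive")]

augmented_fr = [(translate(text)[0]["translation_text"], label)
                for text, label in english_tweets]
print(augmented_fr)  # French tweets paired with the English labels
```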
- Is Supervised Syntactic Parsing Beneficial for Language Understanding? An Empirical Investigation [71.70562795158625]
Traditional NLP has long held (supervised) syntactic parsing to be necessary for successful higher-level semantic language understanding (LU).
The recent advent of end-to-end neural models, self-supervised via language modeling (LM), and their success on a wide range of LU tasks call this belief into question.
We empirically investigate the usefulness of supervised parsing for semantic LU in the context of LM-pretrained transformer networks.
arXiv Detail & Related papers (2020-08-15T21:03:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.