Searching for PETs: Using Distributional and Sentiment-Based Methods to
Find Potentially Euphemistic Terms
- URL: http://arxiv.org/abs/2205.10451v1
- Date: Fri, 20 May 2022 22:21:21 GMT
- Title: Searching for PETs: Using Distributional and Sentiment-Based Methods to
Find Potentially Euphemistic Terms
- Authors: Patrick Lee and Martha Gavidia and Anna Feldman and Jing Peng
- Abstract summary: This paper presents a linguistically driven proof of concept for finding potentially euphemistic terms, or PETs.
Acknowledging that PETs tend to be commonly used expressions for a certain range of sensitive topics, we make use of distributional similarities to select and filter phrase candidates from a sentence.
We present the results of our approach tested on a corpus of sentences containing euphemisms, demonstrating its efficacy for detecting single and multi-word PETs.
- Score: 2.666791490663749
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a linguistically driven proof of concept for finding
potentially euphemistic terms, or PETs. Acknowledging that PETs tend to be
commonly used expressions for a certain range of sensitive topics, we make use
of distributional similarities to select and filter phrase candidates from a
sentence and rank them using a set of simple sentiment-based metrics. We
present the results of our approach tested on a corpus of sentences containing
euphemisms, demonstrating its efficacy for detecting single and multi-word PETs
from a broad range of topics. We also discuss future potential for
sentiment-based methods on this task.
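The pipeline the abstract describes (extract phrase candidates from a sentence, then rank them by sentiment) can be illustrated with a minimal sketch. The toy sentiment lexicon and the averaging heuristic below are illustrative stand-ins, not the authors' actual distributional filtering or sentiment metrics.

```python
# Minimal sketch of sentiment-based ranking of phrase candidates.
# TOY_SENTIMENT is a hypothetical word-level lexicon (scores in [-1, 1]);
# the paper's actual metrics are more elaborate.

TOY_SENTIMENT = {
    "died": -0.8, "passed": 0.1, "away": 0.0,
    "fired": -0.7, "let": 0.0, "go": 0.0,
}

def phrase_sentiment(phrase):
    """Mean lexicon score of a phrase's tokens (0.0 for unknown words)."""
    tokens = phrase.lower().split()
    return sum(TOY_SENTIMENT.get(t, 0.0) for t in tokens) / len(tokens)

def rank_candidates(candidates):
    """Rank candidates so the most neutral-sounding phrases come first:
    the intuition is that euphemisms soften negative sentiment."""
    return sorted(candidates, key=phrase_sentiment, reverse=True)

candidates = ["passed away", "died", "let go", "fired"]
print(rank_candidates(candidates))
# -> ['passed away', 'let go', 'fired', 'died']
```

Under this heuristic, softened expressions like "passed away" and "let go" score closer to neutral than their direct counterparts, so they surface at the top of the ranking as potentially euphemistic terms.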
Related papers
- Dense X Retrieval: What Retrieval Granularity Should We Use? [59.359325855708974]
An often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g., document, passage, or sentence.
We introduce a novel retrieval unit, proposition, for dense retrieval. Propositions are defined as atomic expressions within text, each encapsulating a distinct factoid.
Our results reveal that proposition-based retrieval significantly outperforms traditional passage or sentence-based methods in dense retrieval.
arXiv Detail & Related papers (2023-12-11T18:57:35Z) - FEED PETs: Further Experimentation and Expansion on the Disambiguation
of Potentially Euphemistic Terms [3.1648534725322666]
We present novel euphemism corpora in three different languages: Yoruba, Spanish, and Mandarin Chinese.
We perform euphemism disambiguation experiments in each language using the multilingual transformer models mBERT and XLM-RoBERTa.
We find that transformers are generally better at classifying vague PETs.
arXiv Detail & Related papers (2023-05-31T22:23:20Z) - Word Sense Induction with Knowledge Distillation from BERT [6.88247391730482]
This paper proposes a method to distill multiple word senses from a pre-trained language model (BERT) by using attention over the senses of a word in a context.
Experiments on the contextual word similarity and sense induction tasks show that this method is superior to or competitive with state-of-the-art multi-sense embeddings.
arXiv Detail & Related papers (2023-04-20T21:05:35Z) - Relational Sentence Embedding for Flexible Semantic Matching [86.21393054423355]
We present Relational Sentence Embedding (RSE), a new paradigm for further exploring the potential of sentence embeddings.
RSE is effective and flexible in modeling sentence relations and outperforms a series of state-of-the-art embedding methods.
arXiv Detail & Related papers (2022-12-17T05:25:17Z) - A Report on the Euphemisms Detection Shared Task [2.9972063833424216]
This paper presents the Shared Task on Euphemism Detection for the Third Workshop on Figurative Language Processing (FigLang 2022), held in conjunction with EMNLP 2022.
Participants were invited to investigate the euphemism detection task: given input text, identify whether it contains a euphemism.
The input data is a corpus of sentences containing potentially euphemistic terms (PETs) collected from the GloWbE corpus.
arXiv Detail & Related papers (2022-11-23T22:06:35Z) - Sentiment-Aware Word and Sentence Level Pre-training for Sentiment
Analysis [64.70116276295609]
SentiWSP is a Sentiment-aware pre-trained language model with combined Word-level and Sentence-level Pre-training tasks.
SentiWSP achieves new state-of-the-art performance on various sentence-level and aspect-level sentiment classification benchmarks.
arXiv Detail & Related papers (2022-10-18T12:25:29Z) - Entity Disambiguation with Entity Definitions [50.01142092276296]
Local models have recently attained astounding performance in Entity Disambiguation (ED).
Previous works limited their studies to using only each candidate's Wikipedia title as its textual representation.
In this paper, we address this limitation and investigate to what extent more expressive textual representations can mitigate it.
We report a new state of the art on 2 out of 6 benchmarks we consider and strongly improve the generalization capability over unseen patterns.
arXiv Detail & Related papers (2022-10-11T17:46:28Z) - Always Keep your Target in Mind: Studying Semantics and Improving
Performance of Neural Lexical Substitution [124.99894592871385]
We present a large-scale comparative study of lexical substitution methods employing both older and the most recent language models.
We show that already competitive results achieved by SOTA LMs/MLMs can be further substantially improved if information about the target word is injected properly.
arXiv Detail & Related papers (2022-06-07T16:16:19Z) - CATs are Fuzzy PETs: A Corpus and Analysis of Potentially Euphemistic
Terms [2.666791490663749]
We present a corpus of potentially euphemistic terms (PETs) along with example texts from the GloWbE corpus.
We find that sentiment analysis on the euphemistic texts supports that PETs generally decrease negative and offensive sentiment.
We observe cases of disagreement in an annotation task, where humans are asked to label PETs as euphemistic or not.
arXiv Detail & Related papers (2022-05-05T16:01:39Z) - POSSCORE: A Simple Yet Effective Evaluation of Conversational Search
with Part of Speech Labelling [25.477834359694473]
Conversational search systems, such as Google Assistant and Microsoft Cortana, provide a new search paradigm in which users communicate with search systems via natural language dialogues.
We propose POSSCORE, a simple yet effective automatic evaluation method for conversational search.
We show that our metrics can correlate with human preference, achieving significant improvements over state-of-the-art baseline metrics.
arXiv Detail & Related papers (2021-09-07T12:31:29Z) - More Than Words: Collocation Tokenization for Latent Dirichlet
Allocation Models [71.42030830910227]
We propose a new metric for measuring the clustering quality in settings where the models differ.
We show that topics trained with merged tokens result in topic keys that are clearer, more coherent, and more effective at distinguishing topics than those unmerged models.
arXiv Detail & Related papers (2021-08-24T14:08:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.