Searching for PETs: Using Distributional and Sentiment-Based Methods to
Find Potentially Euphemistic Terms
- URL: http://arxiv.org/abs/2205.10451v1
- Date: Fri, 20 May 2022 22:21:21 GMT
- Title: Searching for PETs: Using Distributional and Sentiment-Based Methods to
Find Potentially Euphemistic Terms
- Authors: Patrick Lee and Martha Gavidia and Anna Feldman and Jing Peng
- Abstract summary: This paper presents a linguistically driven proof of concept for finding potentially euphemistic terms, or PETs.
Acknowledging that PETs tend to be commonly used expressions for a certain range of sensitive topics, we make use of distributional similarities to select and filter phrase candidates from a sentence.
We present the results of our approach tested on a corpus of sentences containing euphemisms, demonstrating its efficacy for detecting single and multi-word PETs.
- Score: 2.666791490663749
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a linguistically driven proof of concept for finding
potentially euphemistic terms, or PETs. Acknowledging that PETs tend to be
commonly used expressions for a certain range of sensitive topics, we make use
of distributional similarities to select and filter phrase candidates from a
sentence and rank them using a set of simple sentiment-based metrics. We
present the results of our approach tested on a corpus of sentences containing
euphemisms, demonstrating its efficacy for detecting single and multi-word PETs
from a broad range of topics. We also discuss future potential for
sentiment-based methods on this task.
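The pipeline the abstract describes (extract phrase candidates from a sentence, then rank them by sentiment) can be illustrated with a minimal sketch. The toy sentiment lexicon and the averaging heuristic below are illustrative stand-ins, not the authors' actual distributional filtering or sentiment metrics.

```python
# Minimal sketch of sentiment-based ranking of phrase candidates.
# TOY_SENTIMENT is a hypothetical word-level lexicon (scores in [-1, 1]);
# the paper's actual metrics are more elaborate.

TOY_SENTIMENT = {
    "died": -0.8, "passed": 0.1, "away": 0.0,
    "fired": -0.7, "let": 0.0, "go": 0.0,
}

def phrase_sentiment(phrase):
    """Mean lexicon score of a phrase's tokens (0.0 for unknown words)."""
    tokens = phrase.lower().split()
    return sum(TOY_SENTIMENT.get(t, 0.0) for t in tokens) / len(tokens)

def rank_candidates(candidates):
    """Rank candidates so the most neutral-sounding phrases come first:
    the intuition is that euphemisms soften negative sentiment."""
    return sorted(candidates, key=phrase_sentiment, reverse=True)

candidates = ["passed away", "died", "let go", "fired"]
print(rank_candidates(candidates))
# -> ['passed away', 'let go', 'fired', 'died']
```

Under this heuristic, softened expressions like "passed away" and "let go" score closer to neutral than their direct counterparts, so they surface at the top of the ranking as potentially euphemistic terms.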
Related papers
- Dense X Retrieval: What Retrieval Granularity Should We Use? [59.359325855708974]
An often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g., document, passage, or sentence.
We introduce a novel retrieval unit, proposition, for dense retrieval. Propositions are defined as atomic expressions within text, each encapsulating a distinct factoid.
Our results reveal that proposition-based retrieval significantly outperforms traditional passage or sentence-based methods in dense retrieval.
arXiv Detail & Related papers (2023-12-11T18:57:35Z) - FEED PETs: Further Experimentation and Expansion on the Disambiguation
of Potentially Euphemistic Terms [3.1648534725322666]
We present novel euphemism corpora in three different languages: Yoruba, Spanish, and Mandarin Chinese.
We perform euphemism disambiguation experiments in each language using the multilingual transformer models mBERT and XLM-RoBERTa.
We find that transformers are generally better at classifying vague PETs.
arXiv Detail & Related papers (2023-05-31T22:23:20Z) - Word Sense Induction with Knowledge Distillation from BERT [6.88247391730482]
This paper proposes a method to distill multiple word senses from a pre-trained language model (BERT) by using attention over the senses of a word in a context.
Experiments on the contextual word similarity and sense induction tasks show that this method is superior to or competitive with state-of-the-art multi-sense embeddings.
arXiv Detail & Related papers (2023-04-20T21:05:35Z) - Relational Sentence Embedding for Flexible Semantic Matching [86.21393054423355]
We present Relational Sentence Embedding (RSE), a new paradigm for further exploring the potential of sentence embeddings.
RSE is effective and flexible in modeling sentence relations and outperforms a series of state-of-the-art embedding methods.
arXiv Detail & Related papers (2022-12-17T05:25:17Z) - A Report on the Euphemisms Detection Shared Task [2.9972063833424216]
This paper presents the Shared Task on Euphemism Detection for the Third Workshop on Figurative Language Processing (FigLang 2022), held in conjunction with EMNLP 2022.
Participants were invited to investigate the euphemism detection task: given input text, identify whether it contains a euphemism.
The input data is a corpus of sentences containing potentially euphemistic terms (PETs) collected from the GloWbE corpus.
arXiv Detail & Related papers (2022-11-23T22:06:35Z) - Sentiment-Aware Word and Sentence Level Pre-training for Sentiment
Analysis [64.70116276295609]
SentiWSP is a Sentiment-aware pre-trained language model with combined Word-level and Sentence-level Pre-training tasks.
SentiWSP achieves new state-of-the-art performance on various sentence-level and aspect-level sentiment classification benchmarks.
arXiv Detail & Related papers (2022-10-18T12:25:29Z) - Entity Disambiguation with Entity Definitions [50.01142092276296]
Local models have recently attained astounding performance in Entity Disambiguation (ED).
Previous works limited their studies to using only each candidate's Wikipedia title as its textual representation.
In this paper, we address this limitation and investigate to what extent more expressive textual representations can mitigate it.
We report a new state of the art on 2 out of 6 benchmarks we consider and strongly improve the generalization capability over unseen patterns.
arXiv Detail & Related papers (2022-10-11T17:46:28Z) - Always Keep your Target in Mind: Studying Semantics and Improving
Performance of Neural Lexical Substitution [124.99894592871385]
We present a large-scale comparative study of lexical substitution methods employing both older and the most recent language models.
We show that already competitive results achieved by SOTA LMs/MLMs can be further substantially improved if information about the target word is injected properly.
arXiv Detail & Related papers (2022-06-07T16:16:19Z) - CATs are Fuzzy PETs: A Corpus and Analysis of Potentially Euphemistic
Terms [2.666791490663749]
We present a corpus of potentially euphemistic terms (PETs) along with example texts from the GloWbE corpus.
We find that sentiment analysis on the euphemistic texts supports that PETs generally decrease negative and offensive sentiment.
We observe cases of disagreement in an annotation task, where humans are asked to label PETs as euphemistic or not.
arXiv Detail & Related papers (2022-05-05T16:01:39Z) - POSSCORE: A Simple Yet Effective Evaluation of Conversational Search
with Part of Speech Labelling [25.477834359694473]
Conversational search systems, such as Google Assistant and Microsoft Cortana, provide a new search paradigm in which users communicate with search systems via natural language dialogues.
We propose POSSCORE, a simple yet effective automatic evaluation method for conversational search.
We show that our metrics can correlate with human preference, achieving significant improvements over state-of-the-art baseline metrics.
arXiv Detail & Related papers (2021-09-07T12:31:29Z) - More Than Words: Collocation Tokenization for Latent Dirichlet
Allocation Models [71.42030830910227]
We propose a new metric for measuring the clustering quality in settings where the models differ.
We show that topics trained with merged tokens result in topic keys that are clearer, more coherent, and more effective at distinguishing topics than those unmerged models.
arXiv Detail & Related papers (2021-08-24T14:08:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.