A Study of Slang Representation Methods
- URL: http://arxiv.org/abs/2212.05613v1
- Date: Sun, 11 Dec 2022 21:56:44 GMT
- Title: A Study of Slang Representation Methods
- Authors: Aravinda Kolla, Filip Ilievski, Hông-Ân Sandlin and Alain Mermoud
- Abstract summary: We study different combinations of representation learning models and knowledge resources for a variety of downstream tasks that rely on slang understanding.
Our error analysis identifies core challenges for slang representation learning, including out-of-vocabulary words, polysemy, variance, and annotation disagreements.
- Score: 3.511369967593153
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Warning: this paper contains content that may be offensive or upsetting.
Considering the large amount of content created online by the minute,
slang-aware automatic tools are critically needed to promote social good, and
assist policymakers and moderators in restricting the spread of offensive
language, abuse, and hate speech. Despite the success of large language models
and the spontaneous emergence of slang dictionaries, it is unclear how far
their combination goes in terms of slang understanding for downstream social
good tasks. In this paper, we provide a framework to study different
combinations of representation learning models and knowledge resources for a
variety of downstream tasks that rely on slang understanding. Our experiments
show the superiority of models that have been pre-trained on social media data,
while the impact of dictionaries is positive only for static word embeddings.
Our error analysis identifies core challenges for slang representation
learning, including out-of-vocabulary words, polysemy, variance, and annotation
disagreements, which can be traced to characteristics of slang as a quickly
evolving and highly subjective language.
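To make the dictionary finding concrete, here is a minimal, self-contained sketch (not the authors' code) of one way a slang dictionary can help a static embedding space: an out-of-vocabulary slang term is assigned the mean vector of the in-vocabulary words in its dictionary gloss. The embedding table, the term "bussin", and its gloss are toy placeholders, not data from the paper.

```python
# Toy sketch: back off from an OOV slang term to the mean of its gloss vectors.
import numpy as np

# Toy static embeddings (in practice: GloVe/word2vec vectors).
emb = {
    "very":  np.array([0.9, 0.1, 0.0]),
    "good":  np.array([0.8, 0.6, 0.1]),
    "bad":   np.array([-0.7, 0.5, 0.2]),
    "music": np.array([0.1, 0.2, 0.9]),
}

# Hypothetical slang-dictionary gloss for an OOV slang term.
slang_gloss = {"bussin": ["very", "good"]}

def vector_for(word):
    """Return the word's vector; back off to the mean of its gloss vectors."""
    if word in emb:
        return emb[word]
    if word in slang_gloss:
        gloss_vecs = [emb[w] for w in slang_gloss[word] if w in emb]
        if gloss_vecs:
            return np.mean(gloss_vecs, axis=0)
    return np.zeros(3)  # unknown word with no gloss available

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

v = vector_for("bussin")
print("bussin ~ good:", round(cosine(v, emb["good"]), 3))
print("bussin ~ bad: ", round(cosine(v, emb["bad"]), 3))
```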
Related papers
- Pixel Sentence Representation Learning [67.4775296225521]
In this work, we conceptualize the learning of sentence-level textual semantics as a visual representation learning process.
We employ visually-grounded text perturbation methods like typos and word order shuffling, resonating with human cognitive patterns, and enabling perturbation to be perceived as continuous.
Our approach is further bolstered by large-scale unsupervised topical alignment training and natural language inference supervision.
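As a toy illustration of the two perturbation types named above (typos and word-order shuffling); the paper itself operates on rendered pixels, so this sketch shows only the text-side perturbations, not the visual pipeline.

```python
# Illustrative perturbations only: adjacent-character swaps and word shuffling.
import random

def add_typo(word, rng):
    """Swap two adjacent characters, a simple typo-style perturbation."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def perturb(sentence, typo_prob=0.3, shuffle=True, seed=0):
    rng = random.Random(seed)
    words = [add_typo(w, rng) if rng.random() < typo_prob else w
             for w in sentence.split()]
    if shuffle:
        rng.shuffle(words)
    return " ".join(words)

print(perturb("slang evolves quickly on social media"))
```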
arXiv Detail & Related papers (2024-02-13T02:46:45Z) - Hate Speech and Offensive Language Detection using an Emotion-aware Shared Encoder [1.8734449181723825]
Existing works on hate speech and offensive language detection produce promising results based on pre-trained transformer models.
This paper proposes a multi-task joint learning approach that combines external emotional features extracted from other corpora.
Our findings demonstrate that emotional knowledge helps to more reliably identify hate speech and offensive language across datasets.
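A hedged architectural sketch of the general multi-task idea, not this paper's model: one shared text encoder feeding a hate/offense head and an auxiliary emotion head. The bag-of-embeddings encoder and all dimensions below are toy assumptions.

```python
# Toy multi-task model: shared encoder, two task-specific heads.
import torch
import torch.nn as nn

class SharedEncoderMTL(nn.Module):
    def __init__(self, vocab_size=1000, dim=64, n_hate=3, n_emotion=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())  # shared
        self.hate_head = nn.Linear(dim, n_hate)        # hate / offensive / neither
        self.emotion_head = nn.Linear(dim, n_emotion)  # auxiliary emotion task

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids).mean(dim=1))  # mean-pool tokens
        return self.hate_head(h), self.emotion_head(h)

model = SharedEncoderMTL()
hate_logits, emo_logits = model(torch.randint(0, 1000, (2, 12)))
print(hate_logits.shape, emo_logits.shape)  # torch.Size([2, 3]) torch.Size([2, 6])
```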
arXiv Detail & Related papers (2023-02-17T09:31:06Z) - Countering Malicious Content Moderation Evasion in Online Social Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems.
This article contributes significantly to countering malicious information by developing multilingual tools to simulate and detect new methods of content moderation evasion.
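The following is a toy illustration of the general idea, not the article's multilingual tools: simulate simple leetspeak-style word camouflage, then normalize look-alike characters so a keyword blocklist can still match the camouflaged form.

```python
# Toy camouflage simulation (evasion side) and normalization (detection side).
LEET = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "$"}
UNLEET = {v: k for k, v in LEET.items()}

def camouflage(word):
    """Simulate evasion: replace letters with look-alike symbols."""
    return "".join(LEET.get(c, c) for c in word.lower())

def normalize(text):
    """Detection side: map look-alike symbols back to letters."""
    return "".join(UNLEET.get(c, c) for c in text.lower())

blocklist = {"scam"}
post = f"totally not a {camouflage('scam')}"
print(post)                                          # "totally not a $c4m"
print(any(w in normalize(post) for w in blocklist))  # True
```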
arXiv Detail & Related papers (2022-12-27T16:08:49Z) - Testing the Ability of Language Models to Interpret Figurative Language [69.59943454934799]
Figurative and metaphorical language are commonplace in discourse.
It remains an open question to what extent modern language models can interpret nonliteral phrases.
We introduce Fig-QA, a Winograd-style nonliteral language understanding task.
arXiv Detail & Related papers (2022-04-26T23:42:22Z) - Analyzing the Limits of Self-Supervision in Handling Bias in Language [52.26068057260399]
We evaluate how well language models capture the semantics of four tasks for bias: diagnosis, identification, extraction and rephrasing.
Our analyses indicate that language models are capable of performing these tasks to widely varying degrees across different bias dimensions, such as gender and political affiliation.
arXiv Detail & Related papers (2021-12-16T05:36:08Z) - Augmenting semantic lexicons using word embeddings and transfer learning [1.101002667958165]
We propose two models for predicting sentiment scores to augment semantic lexicons at a relatively low cost using word embeddings and transfer learning.
Our evaluation shows both models are able to score new words with a similar accuracy to reviewers from Amazon Mechanical Turk, but at a fraction of the cost.
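A minimal sketch of the general recipe, not the authors' models: fit a regressor from pre-trained word vectors to a seed lexicon's sentiment scores, then score words missing from the lexicon. The vectors and scores below are made-up toy data.

```python
# Toy lexicon augmentation: regress sentiment scores from word vectors.
import numpy as np
from sklearn.linear_model import Ridge

# Toy "pre-trained" 3-d word vectors and seed lexicon scores in [-1, 1].
vectors = {
    "great":    np.array([0.9, 0.8, 0.1]),  "awful":    np.array([-0.8, 0.7, 0.2]),
    "pleasant": np.array([0.7, 0.6, 0.0]),  "terrible": np.array([-0.9, 0.8, 0.1]),
    "fine":     np.array([0.4, 0.3, 0.1]),
}
seed_scores = {"great": 0.9, "awful": -0.8, "pleasant": 0.7, "terrible": -0.9}

X = np.stack([vectors[w] for w in seed_scores])
y = np.array(list(seed_scores.values()))
reg = Ridge(alpha=1.0).fit(X, y)

# Score a word that is missing from the seed lexicon.
print("fine ->", round(float(reg.predict(vectors["fine"].reshape(1, -1))[0]), 2))
```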
arXiv Detail & Related papers (2021-09-18T20:59:52Z) - Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as an informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks.
This study assesses existing language models in distinguishing the sentiment expressed in tweets, using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z) - A Computational Framework for Slang Generation [2.1813490315521773]
We take an initial step toward machine generation of slang by developing a framework that models the speaker's word choice in slang context.
Our framework encodes novel slang meaning by relating the conventional and slang senses of a word.
We perform rigorous evaluations on three slang dictionaries and show that our approach outperforms state-of-the-art language models.
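One loose way to picture "relating conventional and slang senses", not the paper's actual model: rank candidate words for an intended slang meaning by similarity to each word's conventional-sense vector. All vectors below are invented toy data.

```python
# Toy word-choice sketch: pick the word whose conventional sense is closest
# to the intended (novel) slang meaning.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

intended_meaning = np.array([0.9, 0.7, 0.1])   # e.g. "excellent" (toy vector)
conventional_sense = {
    "fire":  np.array([0.8, 0.6, 0.2]),
    "table": np.array([0.0, 0.1, 0.9]),
    "sick":  np.array([0.7, 0.5, 0.3]),
}

scores = {w: cosine(intended_meaning, v) for w, v in conventional_sense.items()}
print(max(scores, key=scores.get), scores)
```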
arXiv Detail & Related papers (2021-02-03T01:19:07Z) - Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed afterwards, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
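A rough sketch of what a vocabulary-size-independent output layer can look like, not the paper's architecture: each output word's embedding is composed from character embeddings, so only character-level parameters are learned and unseen words can still be scored.

```python
# Toy compositional output layer: word output vectors built from characters.
import torch
import torch.nn as nn

class CompositionalOutput(nn.Module):
    def __init__(self, n_chars=128, dim=64):
        super().__init__()
        self.char_embed = nn.Embedding(n_chars, dim)  # only char-level parameters

    def word_embedding(self, word):
        ids = torch.tensor([ord(c) % 128 for c in word])
        return self.char_embed(ids).mean(dim=0)  # compose word vector from chars

    def forward(self, hidden, candidate_words):
        # Logits = dot product of the LM hidden state with composed word vectors.
        W = torch.stack([self.word_embedding(w) for w in candidate_words])
        return hidden @ W.T

out = CompositionalOutput()
hidden = torch.randn(2, 64)                      # batch of LM hidden states
logits = out(hidden, ["slang", "lit", "yeet"])   # works for unseen words too
print(logits.shape)                              # torch.Size([2, 3])
```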
arXiv Detail & Related papers (2020-09-24T07:21:14Z)