How do Language Models Generate Slang: A Systematic Comparison between Human and Machine-Generated Slang Usages
- URL: http://arxiv.org/abs/2509.15518v1
- Date: Fri, 19 Sep 2025 01:49:27 GMT
- Title: How do Language Models Generate Slang: A Systematic Comparison between Human and Machine-Generated Slang Usages
- Authors: Siyang Wu, Zhewei Sun
- Abstract summary: Slang is a commonly used type of informal language that poses a daunting challenge to NLP systems. Recent advances in large language models (LLMs) have made the problem more approachable. We compare human-attested slang usages from the Online Slang Dictionary (OSD) and slang generated by GPT-4o and Llama-3.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Slang is a commonly used type of informal language that poses a daunting challenge to NLP systems. Recent advances in large language models (LLMs), however, have made the problem more approachable. While LLM agents are increasingly applied to intermediary tasks such as slang detection and slang interpretation, their generalizability and reliability depend heavily on whether these models have captured structural knowledge about slang that aligns well with human-attested slang usage. To answer this question, we contribute a systematic comparison between human- and machine-generated slang usages. Our evaluative framework focuses on three core aspects: 1) characteristics of the usages that reflect systematic biases in how machines perceive slang, 2) creativity reflected in both the lexical coinages and the word reuses employed by the slang usages, and 3) informativeness of the slang usages when used as gold-standard examples for model distillation. By comparing human-attested slang usages from the Online Slang Dictionary (OSD) with slang generated by GPT-4o and Llama-3, we find significant biases in how LLMs perceive slang. Our results suggest that while LLMs have captured substantial knowledge about the creative aspects of slang, this knowledge does not align with human usage closely enough to support extrapolative tasks such as linguistic analysis.
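The creativity comparison described in the abstract distinguishes lexical coinages (newly invented words) from word reuses (existing words given slang senses). As a toy illustration of that distinction, the following sketch (all names and data hypothetical; this is not the authors' code or metric) computes the rate of coinages in a set of slang terms against a conventional vocabulary:

```python
def coinage_rate(slang_terms, conventional_vocab):
    """Fraction of slang terms that are lexical coinages, i.e. words
    absent from the conventional vocabulary (word reuse otherwise)."""
    if not slang_terms:
        return 0.0
    novel = sum(1 for t in slang_terms if t.lower() not in conventional_vocab)
    return novel / len(slang_terms)

# Hypothetical toy data: a tiny conventional-English vocabulary and two
# small slang samples, one human-attested and one LLM-generated.
vocab = {"cap", "fire", "ghost", "salty", "tea"}
human_slang = ["rizz", "cap", "ghost", "yeet", "salty"]  # mixes coinages and reuse
llm_slang = ["fire", "tea", "salty", "cap", "ghost"]     # only reuses existing words

print(coinage_rate(human_slang, vocab))  # 0.4
print(coinage_rate(llm_slang, vocab))    # 0.0
```

A gap between the two rates would hint at the kind of systematic bias the paper measures with far more careful methodology.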
Related papers
- SLAyiNG: Towards Queer Language Processing [44.4984082814346]
SLAyiNG is the first dataset containing annotated queer slang derived from subtitles, social media posts, and podcasts. We describe our data curation process, including the collection of slang terms and definitions and the scraping of sources for examples that reflect usage of these terms. As preliminary results, we report inter-annotator agreement between human annotators and OpenAI's o3-mini model.
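Inter-annotator agreement of the kind reported above is commonly measured with Cohen's kappa, which corrects raw agreement for chance. A minimal self-contained sketch (the data is hypothetical, not from the SLAyiNG dataset):

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.
    Assumes expected chance agreement is strictly below 1."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical binary annotations: is the term queer slang (1) or not (0)?
human = [1, 1, 0, 1, 0, 0, 1, 0]
model = [1, 0, 0, 1, 0, 1, 1, 0]
print(cohen_kappa(human, model))  # 0.5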
arXiv Detail & Related papers (2025-09-22T07:41:45Z) - PhoniTale: Phonologically Grounded Mnemonic Generation for Typologically Distant Language Pairs [51.745816131869674]
Large language models (LLMs) have been used to generate keyword mnemonics by leveraging similar keywords from a learner's first language (L1) to aid in acquiring L2 vocabulary.<n>We present PhoniTale, a novel cross-lingual mnemonic generation system that performs IPA-based phonological adaptation and syllable-aware alignment to retrieve L1 keyword sequence.<n>Our findings show that PhoniTale consistently outperforms previous automated approaches and achieves quality comparable to human-written mnemonics.
arXiv Detail & Related papers (2025-07-07T19:50:12Z) - SlangDIT: Benchmarking LLMs in Interpretative Slang Translation [89.48208612476068]
This paper introduces the interpretative slang translation task (named SlangDIT)<n>It consists of three sub-tasks: slang detection, cross-lingual slang explanation, and slang translation within the current context.<n>Based on the benchmark, we propose a deep thinking model, named SlangOWL. It firstly identifies whether the sentence contains a slang, and then judges whether the slang is polysemous and analyze its possible meaning.
arXiv Detail & Related papers (2025-05-20T10:37:34Z) - Assessing Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks [68.33068005789116]
We introduce ReDial, a benchmark containing 1.2K+ parallel query pairs in Standardized English and AAVE.<n>We evaluate widely used models, including GPT, Claude, Llama, Mistral, and the Phi model families.<n>Our work establishes a systematic and objective framework for analyzing LLM bias in dialectal queries.
arXiv Detail & Related papers (2024-10-14T18:44:23Z) - Toward Informal Language Processing: Knowledge of Slang in Large Language Models [16.42982896928428]
We construct a dataset that supports evaluation on a diverse set of tasks pertaining to automatic processing of slang.
For both evaluation and finetuning, we show the effectiveness of our dataset on two core applications.
We find that while LLMs such as GPT-4 achieve good performance in a zero-shot setting, smaller BERT-like models finetuned on our dataset achieve comparable performance.
arXiv Detail & Related papers (2024-04-02T21:50:18Z) - Towards Effective Disambiguation for Machine Translation with Large
Language Models [65.80775710657672]
We study the capabilities of large language models to translate "ambiguous sentences"
Experiments show that our methods can match or outperform state-of-the-art systems such as DeepL and NLLB in four out of five language directions.
arXiv Detail & Related papers (2023-09-20T22:22:52Z) - A Study of Slang Representation Methods [3.511369967593153]
We study different combinations of representation learning models and knowledge resources for a variety of downstream tasks that rely on slang understanding.
Our error analysis identifies core challenges for slang representation learning, including out-of-vocabulary words, polysemy, variance, and annotation disagreements.
arXiv Detail & Related papers (2022-12-11T21:56:44Z) - Transparency Helps Reveal When Language Models Learn Meaning [71.96920839263457]
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not well-represent natural language semantics.
arXiv Detail & Related papers (2022-10-14T02:35:19Z) - Machine Reading, Fast and Slow: When Do Models "Understand" Language? [59.897515617661874]
We investigate the behavior of reading comprehension models with respect to two linguistic'skills': coreference resolution and comparison.
We find that for comparison (but not coreference) the systems based on larger encoders are more likely to rely on the 'right' information.
arXiv Detail & Related papers (2022-09-15T16:25:44Z) - Semantically Informed Slang Interpretation [2.9097456604613745]
We propose a semantically informed slang interpretation (SSI) framework that considers jointly the contextual and semantic appropriateness of a candidate interpretation for a query slang.
We show how the same framework can be applied to enhancing machine translation of slang from English to other languages.
arXiv Detail & Related papers (2022-05-02T01:51:49Z) - A Computational Framework for Slang Generation [2.1813490315521773]
We take an initial step toward machine generation of slang by developing a framework that models the speaker's word choice in slang context.
Our framework encodes novel slang meaning by relating the conventional and slang senses of a word.
We perform rigorous evaluations on three slang dictionaries and show that our approach outperforms state-of-the-art language models.
arXiv Detail & Related papers (2021-02-03T01:19:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.