LISTN: Lexicon induction with socio-temporal nuance
- URL: http://arxiv.org/abs/2409.19257v1
- Date: Sat, 28 Sep 2024 06:20:20 GMT
- Title: LISTN: Lexicon induction with socio-temporal nuance
- Authors: Christine de Kock
- Abstract summary: This paper proposes a novel method for inducing in-group lexicons which incorporates their socio-temporal context.
Using dynamic word and user embeddings trained on conversations from online anti-women communities, our approach outperforms prior methods for lexicon induction.
We present novel insights on in-group language which illustrate the utility of this approach.
- Score: 5.384630221560811
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Research on extremist online communities frequently utilizes linguistic analysis to explore group dynamics and behaviour. Existing studies often rely on outdated lexicons that capture neither the evolving nature of in-group language nor the social structure of the community. This paper proposes a novel method for inducing in-group lexicons which incorporates their socio-temporal context. Using dynamic word and user embeddings trained on conversations from online anti-women communities, our approach outperforms prior methods for lexicon induction. We provide a new lexicon of manosphere terms, validated by human experts, which quantifies the relevance of each term to a specific sub-community. We present novel insights on in-group language which illustrate the utility of this approach.
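The abstract does not spell out a scoring procedure, but the general idea of quantifying how relevant a term is to a specific sub-community, given word and user embeddings in a shared space, can be sketched as follows. This is a hypothetical illustration rather than the paper's implementation: the names (`community_relevance`, `word_vecs`, `user_vecs`) and the choice of a mean-of-members centroid with cosine similarity are assumptions.

```python
import numpy as np

def community_relevance(word_vecs, user_vecs, community_members, top_k=10):
    """Rank candidate terms by cosine similarity to a sub-community centroid.

    word_vecs: dict of term -> vector; user_vecs: dict of user id -> vector.
    Both are assumed to live in the same embedding space.
    """
    # Represent the sub-community as the normalised mean of its members' vectors.
    centroid = np.mean([user_vecs[u] for u in community_members], axis=0)
    centroid = centroid / np.linalg.norm(centroid)
    scores = {
        term: float(vec @ centroid / np.linalg.norm(vec))
        for term, vec in word_vecs.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

# Toy usage with random vectors; in practice the inputs would be dynamic
# word and user embeddings trained on the community's conversations.
rng = np.random.default_rng(0)
terms = {f"term_{i}": rng.normal(size=50) for i in range(6)}
users = {f"user_{i}": rng.normal(size=50) for i in range(4)}
print(community_relevance(terms, users, ["user_0", "user_1"], top_k=3))
```

In the paper's setting the embeddings are additionally time-indexed (dynamic); the sketch treats them as already trained.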
Related papers
- Jointly modelling the evolution of community structure and language in online extremist groups [5.384630221560811]
Group interactions take place within a particular socio-temporal context, which should be taken into account when modelling communities.
We propose a method for jointly modelling community structure and language over time, and apply it in the context of extremist anti-women online groups.
arXiv Detail & Related papers (2024-09-28T05:19:51Z)
- Inclusivity in Large Language Models: Personality Traits and Gender Bias in Scientific Abstracts [49.97673761305336]
We evaluate three large language models (LLMs) for their alignment with human narrative styles and potential gender biases.
Our findings indicate that, while these models generally produce text closely resembling human-authored content, variations in stylistic features suggest significant gender biases.
arXiv Detail & Related papers (2024-06-27T19:26:11Z)
- Neural Conversation Models and How to Rein Them in: A Survey of Failures and Fixes [17.489075240435348]
Recent conditional language models are able to continue any kind of text source in an often seemingly fluent way.
From a linguistic perspective, however, contributing to a conversation is a highly constrained task.
Recent approaches try to tame the underlying language models at various intervention points.
arXiv Detail & Related papers (2023-08-11T12:07:45Z)
- BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models [56.93604813379634]
Self-supervised techniques for learning speech representations have been shown to develop linguistic competence from exposure to speech without the need for human labels.
We propose a language-acquisition-friendly benchmark to probe spoken language models at the lexical and syntactic levels.
We highlight two exciting challenges that need to be addressed for further progress: bridging the gap between text and speech and between clean speech and in-the-wild speech.
arXiv Detail & Related papers (2023-06-02T12:54:38Z)
- TalkUp: Paving the Way for Understanding Empowering Language [38.873632974397744]
This work builds from linguistic and social psychology literature to explore what characterizes empowering language.
We crowdsource a novel dataset of Reddit posts labeled for empowerment.
Preliminary analyses show that this dataset can be used to train language models that capture empowering and disempowering language.
arXiv Detail & Related papers (2023-05-23T17:55:34Z)
- Unsupervised Lexical Substitution with Decontextualised Embeddings [48.00929769805882]
We propose a new unsupervised method for lexical substitution using pre-trained language models.
Our method retrieves substitutes based on the similarity of contextualised and decontextualised word embeddings.
We conduct experiments in English and Italian, and show that our method substantially outperforms strong baselines.
arXiv Detail & Related papers (2022-09-17T03:51:47Z)
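As a rough, hypothetical illustration of the comparison described in the entry above (not the paper's actual method): score each candidate substitute by the cosine similarity between the target word's contextualised embedding in the sentence and the candidate's embedding computed in isolation. The model choice (bert-base-uncased) and subtoken mean-pooling are assumptions.

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL = "bert-base-uncased"          # illustrative model choice
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL).eval()

def embed(text, target):
    """Mean-pool the hidden states of `target`'s subtokens within `text`."""
    enc = tok(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    target_ids = tok(target, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    for i in range(len(ids) - len(target_ids) + 1):   # first occurrence
        if ids[i:i + len(target_ids)] == target_ids:
            return hidden[i:i + len(target_ids)].mean(dim=0)
    raise ValueError(f"{target!r} not found in {text!r}")

def rank_substitutes(sentence, target, candidates):
    ctx = embed(sentence, target)                     # contextualised target
    scores = {c: torch.cosine_similarity(ctx, embed(c, c), dim=0).item()
              for c in candidates}                    # decontextualised candidates
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rank_substitutes("the bright student solved the problem quickly",
                       "bright", ["clever", "shiny", "dim"]))
```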
- LexSubCon: Integrating Knowledge from Lexical Resources into Contextual Embeddings for Lexical Substitution [76.615287796753]
We introduce LexSubCon, an end-to-end lexical substitution framework based on contextual embedding models.
This is achieved by combining contextual information with knowledge from structured lexical resources.
Our experiments show that LexSubCon outperforms previous state-of-the-art methods on LS07 and CoInCo benchmark datasets.
arXiv Detail & Related papers (2021-07-11T21:25:56Z)
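The sketch below illustrates the general pattern described in the entry above, combining a structured lexical resource with contextual information, but it is not LexSubCon itself: candidates are drawn from WordNet and ranked by how well they fit the masked slot under a masked language model. The resource, model, and scoring function are all assumptions.

```python
import nltk
from nltk.corpus import wordnet as wn
from transformers import pipeline

nltk.download("wordnet", quiet=True)  # lexical resource used for candidates

def wordnet_candidates(word, pos=wn.ADJ):
    """Candidate substitutes taken from a structured lexical resource."""
    lemmas = {l.name().replace("_", " ")
              for s in wn.synsets(word, pos=pos) for l in s.lemmas()}
    lemmas.discard(word)
    return sorted(lemmas)

# Contextual signal: how well each candidate fills the masked slot.
fill = pipeline("fill-mask", model="bert-base-uncased")

def rank_in_context(masked_sentence, candidates):
    preds = fill(masked_sentence, targets=candidates)  # score only our candidates
    return [(p["token_str"], p["score"]) for p in preds]

cands = wordnet_candidates("bright")
sentence = f"the {fill.tokenizer.mask_token} student solved the problem quickly"
print(rank_in_context(sentence, cands))
```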
- Characterizing English Variation across Social Media Communities with BERT [9.98785450861229]
We analyze two months of English comments in 474 Reddit communities.
The specificity of different sense clusters to a community, combined with the specificity of a community's unique word types, is used to identify cases where a social group's language deviates from the norm.
We find that communities with highly distinctive language are medium-sized, and their loyal and highly engaged users interact in dense networks.
arXiv Detail & Related papers (2021-02-12T23:50:57Z)
- Towards Debiasing Sentence Representations [109.70181221796469]
We show that Sent-Debias is effective in removing biases while preserving performance on sentence-level downstream tasks.
We hope that our work will inspire future research on characterizing and removing social biases from widely adopted sentence representations for fairer NLP.
arXiv Detail & Related papers (2020-07-16T04:22:30Z)
- Analysing Lexical Semantic Change with Contextualised Word Representations [7.071298726856781]
We propose a novel method that exploits the BERT neural language model to obtain representations of word usages.
We create a new evaluation dataset and show that the model representations and the detected semantic shifts are positively correlated with human judgements.
arXiv Detail & Related papers (2020-04-29T12:18:14Z)
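A minimal, hypothetical sketch of the usage-representation idea in the entry above (not the paper's exact pipeline): average a word's contextualised vectors within each time period and report one minus the cosine similarity between periods as a shift score. The model, pooling, and toy sentences are illustrative.

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL = "bert-base-uncased"           # illustrative model choice
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL).eval()

def usage_vector(word, sentences):
    """Average the hidden states of `word`'s subtokens over many usages."""
    word_ids = tok(word, add_special_tokens=False)["input_ids"]
    vecs = []
    for sent in sentences:
        enc = tok(sent, return_tensors="pt", truncation=True)
        with torch.no_grad():
            hidden = model(**enc).last_hidden_state[0]
        ids = enc["input_ids"][0].tolist()
        for i in range(len(ids) - len(word_ids) + 1):  # first occurrence
            if ids[i:i + len(word_ids)] == word_ids:
                vecs.append(hidden[i:i + len(word_ids)].mean(dim=0))
                break
    return torch.stack(vecs).mean(dim=0)

def shift_score(word, period_a, period_b):
    va, vb = usage_vector(word, period_a), usage_vector(word, period_b)
    return 1 - torch.cosine_similarity(va, vb, dim=0).item()

old = ["the farmer broadcast the seed by hand", "seed was broadcast over the field"]
new = ["the station will broadcast the match live", "they broadcast the interview at noon"]
print(shift_score("broadcast", old, new))
```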
- A Benchmark for Systematic Generalization in Grounded Language Understanding [61.432407738682635]
Humans easily interpret expressions that describe unfamiliar situations composed from familiar parts.
Modern neural networks, by contrast, struggle to interpret novel compositions.
We introduce a new benchmark, gSCAN, for evaluating compositional generalization in situated language understanding.
arXiv Detail & Related papers (2020-03-11T08:40:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.