LISTN: Lexicon induction with socio-temporal nuance
- URL: http://arxiv.org/abs/2409.19257v2
- Date: Wed, 11 Dec 2024 21:21:11 GMT
- Title: LISTN: Lexicon induction with socio-temporal nuance
- Authors: Christine de Kock
- Abstract summary: In-group language is an important signifier of group dynamics.
This paper proposes a novel method for inducing lexicons of in-group language.
Using dynamic word and user embeddings trained on conversations from online anti-women communities, our approach outperforms prior methods for lexicon induction.
- Score: 5.384630221560811
- Abstract: In-group language is an important signifier of group dynamics. This paper proposes a novel method for inducing lexicons of in-group language, which incorporates its socio-temporal context. Existing methods for lexicon induction do not capture the evolving nature of in-group language, nor the social structure of the community. Using dynamic word and user embeddings trained on conversations from online anti-women communities, our approach outperforms prior methods for lexicon induction. We develop a test set for the task of lexicon induction and a new lexicon of manosphere language, validated by human experts, which quantifies the relevance of each term to a specific sub-community at a given point in time. Finally, we present novel insights on in-group language which illustrate the utility of this approach.
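To make the approach concrete, the following is a minimal sketch of embedding-based lexicon scoring: candidate terms are ranked, per time step, by cosine similarity between their dynamic word embeddings and a community embedding for that time step. All names (`score_terms`, `community_embs`, the toy vocabulary) are illustrative assumptions, not the LISTN implementation.

```python
# Minimal sketch of socio-temporal lexicon scoring (illustrative only,
# not the LISTN implementation). Assumes dynamic word embeddings are
# available as {time_step: {word: vector}} and one community embedding
# (e.g. a centroid of user embeddings) per time step.
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def score_terms(word_embs, community_embs, candidates):
    """Return {word: {time_step: relevance}} for each candidate term."""
    scores = {}
    for word in candidates:
        scores[word] = {
            t: cosine(vecs[word], community_embs[t])
            for t, vecs in word_embs.items()
            if word in vecs  # a term may be absent in early time steps
        }
    return scores

# Toy usage with random vectors: two time steps, three candidate terms.
rng = np.random.default_rng(0)
time_steps = [2020, 2021]
candidates = ["in_group_term", "borrowed_slang", "common_word"]
word_embs = {t: {w: rng.normal(size=8) for w in candidates} for t in time_steps}
community_embs = {t: rng.normal(size=8) for t in time_steps}
print(score_terms(word_embs, community_embs, candidates))
```

A real pipeline would train the dynamic word and user embeddings on the community's conversations and validate the top-ranked terms against a human-annotated test set, as the paper does.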
Related papers
- Jointly modelling the evolution of community structure and language in online extremist groups [5.384630221560811]
  Group interactions take place within a particular socio-temporal context, which should be taken into account when modelling communities.
  We propose a method for jointly modelling community structure and language over time, and apply it in the context of extremist anti-women online groups.
  arXiv Detail & Related papers (2024-09-28T05:19:51Z)
- How Lexical is Bilingual Lexicon Induction? [1.3610643403050855]
  We argue that incorporating additional lexical information into the recent retrieve-and-rank approach should improve lexicon induction.
  We demonstrate the efficacy of our proposed approach on XLING, improving over the previous state of the art by an average of 2% across all language pairs.
  arXiv Detail & Related papers (2024-04-05T17:10:33Z)
- BabySLM: language-acquisition-friendly benchmark of self-supervised spoken language models [56.93604813379634]
  Self-supervised techniques for learning speech representations have been shown to develop linguistic competence from exposure to speech, without the need for human labels.
  We propose a language-acquisition-friendly benchmark to probe spoken language models at the lexical and syntactic levels.
  We highlight two exciting challenges that need to be addressed for further progress: bridging the gap between text and speech, and between clean speech and in-the-wild speech.
  arXiv Detail & Related papers (2023-06-02T12:54:38Z)
- TalkUp: Paving the Way for Understanding Empowering Language [38.873632974397744]
  This work builds on linguistic and social psychology literature to explore what characterizes empowering language.
  We crowdsource a novel dataset of Reddit posts labeled for empowerment.
  Preliminary analyses show that this dataset can be used to train language models that capture empowering and disempowering language.
  arXiv Detail & Related papers (2023-05-23T17:55:34Z)
- A Comprehensive Empirical Evaluation of Existing Word Embedding Approaches [5.065947993017158]
  We present the characteristics of existing word embedding approaches and analyze them with regard to many classification tasks.
  Traditional approaches mostly use matrix factorization to produce word representations, and they do not capture the semantic and syntactic regularities of language well.
  Neural-network-based approaches, on the other hand, can capture sophisticated regularities of language and preserve word relationships in the generated representations.
  arXiv Detail & Related papers (2023-03-13T15:34:19Z)
- LexSubCon: Integrating Knowledge from Lexical Resources into Contextual Embeddings for Lexical Substitution [76.615287796753]
  We introduce LexSubCon, an end-to-end lexical substitution framework based on contextual embedding models.
  This is achieved by combining contextual information with knowledge from structured lexical resources.
  Our experiments show that LexSubCon outperforms previous state-of-the-art methods on the LS07 and CoInCo benchmark datasets.
  arXiv Detail & Related papers (2021-07-11T21:25:56Z)
- Infusing Finetuning with Semantic Dependencies [62.37697048781823]
  We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
  We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
  arXiv Detail & Related papers (2020-12-10T01:27:24Z)
- Automatically Identifying Language Family from Acoustic Examples in Low Resource Scenarios [48.57072884674938]
  We propose a method to analyze language similarity using deep learning.
  Namely, we train a model on the Wilderness dataset and investigate how its latent space compares with classical language family findings.
  arXiv Detail & Related papers (2020-12-01T22:44:42Z)
- SLM: Learning a Discourse Language Representation with Sentence Unshuffling [53.42814722621715]
  We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation.
  We show that this objective improves the performance of the original BERT by large margins.
  arXiv Detail & Related papers (2020-10-30T13:33:41Z)
- A Benchmark for Systematic Generalization in Grounded Language Understanding [61.432407738682635]
  Humans easily interpret expressions that describe unfamiliar situations composed from familiar parts.
  Modern neural networks, by contrast, struggle to interpret novel compositions.
  We introduce a new benchmark, gSCAN, for evaluating compositional generalization in situated language understanding.
  arXiv Detail & Related papers (2020-03-11T08:40:15Z)
- A Common Semantic Space for Monolingual and Cross-Lingual Meta-Embeddings [10.871587311621974]
  This paper presents a new technique for creating monolingual and cross-lingual meta-embeddings: existing word vectors are projected to a common semantic space using linear transformations and averaging (see the sketch after this list).
  The resulting cross-lingual meta-embeddings also exhibit excellent cross-lingual transfer learning capabilities.
  arXiv Detail & Related papers (2020-01-17T15:42:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.