A comprehensive empirical analysis on cross-domain semantic enrichment
for detection of depressive language
- URL: http://arxiv.org/abs/2106.12797v1
- Date: Thu, 24 Jun 2021 07:15:09 GMT
- Title: A comprehensive empirical analysis on cross-domain semantic enrichment
for detection of depressive language
- Authors: Nawshad Farruque, Randy Goebel and Osmar Zaiane
- Abstract summary: We start with a rich word embedding pre-trained on a large general dataset, which is then augmented with embeddings learned from a much smaller and more specific domain dataset through a simple non-linear mapping mechanism.
We show that our augmented word embedding representations achieve a significantly better F1 score than the others, especially when applied to a high-quality dataset.
- Score: 0.9749560288448115
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We analyze the process of creating word embedding feature representations
designed for a learning task when annotated data is scarce, for example, in
depressive language detection from Tweets. We start with a rich word embedding
pre-trained on a large general dataset, which is then augmented with
embeddings learned from a much smaller and more specific domain dataset through
a simple non-linear mapping mechanism. We also experimented with several other,
more sophisticated mapping methods, including several auto-encoder-based and
custom loss-function-based methods that learn embedding representations by
gradually moving words closer to semantically similar words and farther from
dissimilar ones. Our strengthened representations better capture the semantics
of the depression domain, as they combine the semantics learned from the
specific domain with the word coverage of the general language. We also present
a comparative performance analysis of our word embedding representations
against a simple bag-of-words model, well-known sentiment and psycholinguistic
lexicons, and a general pre-trained word embedding. When used as feature
representations for several different machine learning methods, including deep
learning models, in a depressive Tweets identification task, our augmented word
embedding representations achieve a significantly better F1 score than the
others, especially when applied to a high-quality dataset. We also present
several data ablation tests that confirm the efficacy of our augmentation
techniques.
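The paper ships no code here, but the core idea lends itself to a brief sketch. Below is a minimal, hypothetical PyTorch illustration of a "simple non-linear mapping mechanism": a small MLP is fitted on words shared by both vocabularies to map general pre-trained vectors into the domain embedding space, so that general-vocabulary words missing from the domain corpus can still receive domain-flavored vectors. All names, dimensions, and hyperparameters are assumptions, not the paper's actual configuration.

```python
# Hypothetical sketch of the non-linear mapping idea: learn
# f: general-embedding space -> domain-embedding space on the shared
# vocabulary, then map general vectors for words missing from the
# (small) domain vocabulary. Sizes are illustrative only.
import torch
import torch.nn as nn

GENERAL_DIM, DOMAIN_DIM = 300, 100  # assumed embedding dimensions

class NonLinearMap(nn.Module):
    def __init__(self, d_in=GENERAL_DIM, d_out=DOMAIN_DIM, d_hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_in, d_hidden),
            nn.Tanh(),                  # the non-linearity in the mapping
            nn.Linear(d_hidden, d_out),
        )

    def forward(self, x):
        return self.net(x)

def fit_mapping(general_vecs, domain_vecs, epochs=200, lr=1e-3):
    """Fit the map on vector pairs for words present in BOTH vocabularies."""
    model = NonLinearMap()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(general_vecs), domain_vecs)
        loss.backward()
        opt.step()
    return model

# Example with random stand-ins for the shared-vocabulary vector pairs:
general_vecs = torch.randn(5000, GENERAL_DIM)
domain_vecs = torch.randn(5000, DOMAIN_DIM)
mapper = fit_mapping(general_vecs, domain_vecs)
mapped = mapper(torch.randn(1, GENERAL_DIM))  # domain-space vector for an unseen word
```

Once fitted, passing the general vectors of words outside the domain vocabulary through the map combines the domain's semantics with the general model's much larger word coverage, which is the trade-off the abstract describes.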
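The abstract also mentions custom loss-function-based variants that gradually pull semantically similar words together and push dissimilar ones apart. A generic margin-based contrastive loss captures that intuition; the form below is an illustrative assumption, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def similarity_margin_loss(anchor: torch.Tensor,
                           positive: torch.Tensor,
                           negative: torch.Tensor,
                           margin: float = 0.5) -> torch.Tensor:
    """Move anchors toward semantically similar words (positives) and away
    from dissimilar ones (negatives). Batched (N, dim) inputs assumed."""
    pos_dist = 1.0 - F.cosine_similarity(anchor, positive)  # low if similar
    neg_dist = 1.0 - F.cosine_similarity(anchor, negative)  # high if dissimilar
    return F.relu(pos_dist - neg_dist + margin).mean()
```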
Related papers
- Persian Homograph Disambiguation: Leveraging ParsBERT for Enhanced Sentence Understanding with a Novel Word Disambiguation Dataset [0.0]
We introduce a novel dataset tailored for Persian homograph disambiguation.
Our work encompasses a thorough exploration of various embeddings, evaluated through the cosine similarity method (a minimal sketch of this measure appears after this list).
We scrutinize the models' performance in terms of Accuracy, Recall, and F1 Score.
arXiv Detail & Related papers (2024-05-24T14:56:36Z)
- Spoken Word2Vec: Learning Skipgram Embeddings from Speech [0.8901073744693314]
We show how shallow skipgram-like algorithms fail to encode distributional semantics when the input units are acoustically correlated.
We illustrate the potential of an alternative deep end-to-end variant of the model and examine the effects on the resulting embeddings.
arXiv Detail & Related papers (2023-11-15T19:25:29Z) - Semantic Prompt for Few-Shot Image Recognition [76.68959583129335]
We propose a novel Semantic Prompt (SP) approach for few-shot learning.
The proposed approach achieves promising results, improving the 1-shot learning accuracy by 3.67% on average.
arXiv Detail & Related papers (2023-03-24T16:32:19Z)
- A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
- Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as their informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks.
This study presents an assessment of existing language models in distinguishing the sentiment expressed in tweets, using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
- Understanding Synonymous Referring Expressions via Contrastive Features [105.36814858748285]
We develop an end-to-end trainable framework to learn contrastive features on the image and object instance levels.
We conduct extensive experiments to evaluate the proposed algorithm on several benchmark datasets.
arXiv Detail & Related papers (2021-04-20T17:56:24Z)
- Prototypical Representation Learning for Relation Extraction [56.501332067073065]
This paper aims to learn predictive, interpretable, and robust relation representations from distantly-labeled data.
We learn prototypes for each relation from contextual information to best explore the intrinsic semantics of relations.
Results on several relation learning tasks show that our model significantly outperforms the previous state-of-the-art relational models.
arXiv Detail & Related papers (2021-03-22T08:11:43Z)
- EDS-MEMBED: Multi-sense embeddings based on enhanced distributional semantic structures via a graph walk over word senses [0.0]
We leverage the rich semantic structures in WordNet to enhance the quality of multi-sense embeddings.
We derive new distributional semantic similarity measures for M-SE from prior ones.
We report evaluation results on 11 benchmark datasets involving WSD and Word Similarity tasks.
arXiv Detail & Related papers (2021-02-27T14:36:55Z)
- Multi-sense embeddings through a word sense disambiguation process [2.2344764434954256]
Most Suitable Sense (MSSA) disambiguates and annotates each word by its specific sense, considering the semantic effects of its context.
We test our approach on six different benchmarks for the word similarity task, showing that it can produce state-of-the-art results.
arXiv Detail & Related papers (2021-01-21T16:22:34Z)
- A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)
- Comparative Analysis of Word Embeddings for Capturing Word Similarities [0.0]
Distributed representations have become the most widely used technique for representing language in various natural language processing tasks.
Most of the natural language processing models that are based on deep learning techniques use already pre-trained distributed word representations, commonly called word embeddings.
However, selecting the appropriate word embeddings is a perplexing task, since the projected embedding space is not intuitive to humans.
arXiv Detail & Related papers (2020-05-08T01:16:03Z)
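Several of the related papers above (e.g., the Persian homograph disambiguation work) evaluate embeddings with cosine similarity. For reference, here is the standard measure, sketched with hypothetical inputs:

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """cos(u, v) = (u . v) / (||u|| * ||v||); 1.0 means same direction,
    0.0 orthogonal, -1.0 opposite."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical example: compare two word vectors.
u, v = np.array([0.2, 0.8, 0.1]), np.array([0.3, 0.7, 0.0])
print(cosine_similarity(u, v))  # close to 1.0 -> similar embeddings
```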