Related papers: Gender Recognition in Informal and Formal Language Scenarios via Transfer Learning

URL: http://arxiv.org/abs/2107.02759v1
Date: Wed, 23 Jun 2021 15:32:50 GMT
Title: Gender Recognition in Informal and Formal Language Scenarios via Transfer Learning
Authors: Daniel Escobar-Grisales, Juan Camilo Vasquez-Correa, Juan Rafael Orozco-Arroyave
Abstract summary: Recognition and identification of demographic traits such as gender, age, location, or personality based on text data can help to improve different marketing strategies. This paper proposes the use of recurrent and convolutional neural networks, and a transfer learning strategy for gender recognition in documents written in informal and formal languages.
Score: 11.048994919361034
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The interest in demographic information retrieval based on text data has increased in the research community because applications have shown success in different sectors such as security, marketing, health care, and others. Recognition and identification of demographic traits such as gender, age, location, or personality based on text data can help to improve different marketing strategies. For instance, it makes it possible to segment and personalize offers, so that products and services are exposed to the group of greatest interest. This type of technology has been discussed widely in documents from social media. However, the methods have been poorly studied in data with a more formal structure, where there is no access to emoticons, mentions, and other linguistic phenomena that are only present in social media. This paper proposes the use of recurrent and convolutional neural networks, and a transfer learning strategy, for gender recognition in documents that are written in informal and formal languages. Models are tested on two different databases consisting of tweets and call-center conversations. Accuracies of up to 75% are achieved for both databases. The results also indicate that it is possible to transfer the knowledge from a system trained on a specific type of expressions or idioms, such as those typically used in social media, to a more formal type of text data, where data is scarcer and its structure is completely different.

Related papers

Towards High-Fidelity Synthetic Multi-platform Social Media Datasets via Large Language Models [0.0]
Social media datasets are essential for research on a variety of topics, such as disinformation, influence operations, hate speech detection, or influencer marketing practices. Access to social media datasets is often constrained due to costs and platform restrictions. This paper explores the potential of large language models to create lexically and semantically relevant social media datasets across multiple platforms.
arXiv Detail & Related papers (2025-05-02T18:56:01Z)
Detecting Linguistic Diversity on Social Media [1.3108652488669732]
We use published census data as the ground truth and the social media sub-corpus from the Corpus of Global Language Use as our alternative data source. We identify the language conditions of each tweet in the social media data set and validate our results with two language identification models. The results suggest that social media language data can provide a rich source of spatial and temporal insights on the linguistic profile of a place.
arXiv Detail & Related papers (2025-02-28T16:56:34Z)
The Echoes of the 'I': Tracing Identity with Demographically Enhanced Word Embeddings [0.0]
Identity is one of the most commonly studied constructs in social science. This paper introduces a novel approach to studying identity by enhancing word embeddings with socio-demographic information.
arXiv Detail & Related papers (2024-06-29T06:59:35Z)
MultiSocial: Multilingual Benchmark of Machine-Generated Text Detection of Social-Media Texts [0.6053347262128919]
The MultiSocial dataset contains 472,097 texts, of which about 58k are human-written. We use this benchmark to compare existing detection methods in zero-shot as well as fine-tuned form. Our results indicate that the fine-tuned detectors can be trained on social-media texts without difficulty.
arXiv Detail & Related papers (2024-06-18T12:26:09Z)
Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases. Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding. This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z)
Harnessing the Power of Text-image Contrastive Models for Automatic Detection of Online Misinformation [50.46219766161111]
We develop a self-learning model to explore contrastive learning in the domain of misinformation identification. Our model shows superior performance in detecting non-matched image-text pairs when the training data is insufficient.
arXiv Detail & Related papers (2023-04-19T02:53:59Z)
Exploring Fake News Detection with Heterogeneous Social Media Context Graphs [4.2177790395417745]
Fake news detection has become a research area that goes well beyond purely academic interest, as it has direct implications for our society as a whole. We propose to construct heterogeneous social context graphs around news articles and reformulate the problem as a graph classification task.
arXiv Detail & Related papers (2022-12-13T13:29:47Z)
Language Independent Stance Detection: Social Interaction-based Embeddings and Large Language Models [4.899818550820576]
This paper aims to take on the stance detection task by placing the emphasis not so much on the text itself but on the interaction available on social networks. We propose a new method to leverage social information such as friends and retweets by generating embeddings. Our experiments on seven publicly available datasets and four different languages show that combining our relational embeddings with discriminative textual methods helps to substantially improve performance.
arXiv Detail & Related papers (2022-10-11T18:13:43Z)
Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains. Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods. This review presents approaches for self-supervised speech representation learning and their connection to other research areas.
arXiv Detail & Related papers (2022-05-21T16:52:57Z)
Detecting Text Formality: A Study of Text Classification Approaches [78.11745751651708]
This work proposes what is, to our knowledge, the first systematic study of formality detection methods based on statistical, neural, and Transformer-based machine learning methods. We conducted three types of experiments: monolingual, multilingual, and cross-lingual. The study shows that the Char BiLSTM model outperforms Transformer-based ones on the monolingual and multilingual formality classification tasks.
arXiv Detail & Related papers (2022-04-19T16:23:07Z)
A Multi-input Multi-output Transformer-based Hybrid Neural Network for Multi-class Privacy Disclosure Detection [3.04585143845864]
In this paper, we propose a multi-input, multi-output hybrid neural network which utilizes transfer-learning, linguistics, and metadata to learn the hidden patterns. We trained and evaluated our model on a human-annotated ground truth dataset, containing a total of 5,400 tweets.
arXiv Detail & Related papers (2021-08-19T03:58:49Z)
Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information. Their inherent characteristics, such as the informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks. This study carries out an assessment of existing language models in distinguishing the sentiment expressed in tweets by using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
They, Them, Theirs: Rewriting with Gender-Neutral English [56.14842450974887]
We perform a case study on the singular they, a common way to promote gender inclusion in English. We show how a model can be trained to produce gender-neutral English with a 1% word error rate and no human-labeled data.
arXiv Detail & Related papers (2021-02-12T21:47:48Z)
TopicBERT: A Transformer transfer learning based memory-graph approach for multimodal streaming social media topic detection [8.338441212378587]
Social networks, with their bursty short messages and large data scale spread across a vast variety of topics, are of interest to many researchers. These properties of social networks, known as the 5 Vs of big data, have led to many unique and enlightening algorithms and techniques applied to large social networking datasets and data streams.
arXiv Detail & Related papers (2020-08-16T10:39:50Z)

This list is automatically generated from the titles and abstracts of the papers on this site.