Text2Gender: A Deep Learning Architecture for Analysis of Blogger's Age and Gender
- URL: http://arxiv.org/abs/2305.08633v1
- Date: Mon, 15 May 2023 13:26:50 GMT
- Title: Text2Gender: A Deep Learning Architecture for Analysis of Blogger's Age and Gender
- Authors: Vishesh Thakur and Aneesh Tickoo
- Abstract summary: We propose a supervised BERT-based classification technique in order to predict the age and gender of bloggers.
The accuracy reported for the prediction of age group was 84.2%, while the accuracy for the prediction of gender was 86.32%.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning techniques have gained a lot of traction in the field of NLP
research. The aim of this paper is to predict the age and gender of an
individual by inspecting their written text. We propose a supervised BERT-based
classification technique in order to predict the age and gender of bloggers.
The dataset used contains 681,284 rows, each containing the blogger's age, gender, and the text of a blog post they wrote. We compare our approach to previous work in the same domain and achieve better accuracy and F1 scores. The reported accuracy was 84.2% for age-group prediction and 86.32% for gender prediction. This study relies on the raw capabilities of BERT to classify textual data efficiently. The results show that author demographics can be predicted with high accuracy, a capability with wide applicability across multiple domains.
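The approach described above is standard supervised fine-tuning of BERT for sequence classification. As a rough illustration only, the sketch below shows how such a classifier could be set up with the Hugging Face transformers and datasets libraries; the file name blogs.csv, the column names text and age_group, and all hyperparameters are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of a BERT age-group classifier over blog text.
# Assumptions: a CSV "blogs.csv" with columns "text" and "age_group";
# swap the label column to "gender" for the gender classifier.
# This is not the paper's exact setup.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

data = load_dataset("csv", data_files="blogs.csv")["train"].train_test_split(test_size=0.1)
labels = sorted(set(data["train"]["age_group"]))
label2id = {name: i for i, name in enumerate(labels)}

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def preprocess(batch):
    # Tokenize the blog text and attach integer class labels.
    enc = tokenizer(batch["text"], truncation=True, max_length=256)
    enc["labels"] = [label2id[a] for a in batch["age_group"]]
    return enc

encoded = data.map(preprocess, batched=True,
                   remove_columns=data["train"].column_names)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(labels))

args = TrainingArguments(output_dir="text2gender-out",
                         per_device_train_batch_size=16,
                         num_train_epochs=3,
                         learning_rate=2e-5)

Trainer(model=model, args=args,
        train_dataset=encoded["train"], eval_dataset=encoded["test"],
        data_collator=DataCollatorWithPadding(tokenizer)).train()
```

Reproducing the reported 84.2% and 86.32% accuracies would additionally require a compute_metrics function (e.g., accuracy and macro-F1) passed to the Trainer; none of the figures above should be read as outputs of this sketch.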
Related papers
- Who Are You Behind the Screen? Implicit MBTI and Gender Detection Using Artificial Intelligence [0.0]
This work investigates implicit categorization, inferring personality and gender variables directly from linguistic patterns in Telegram conversation data.
We refine a Transformer-based language model (RoBERTa) to capture complex linguistic cues indicative of personality traits and gender differences.
Incorporating confidence levels raises model accuracy substantially, to 86.16%, demonstrating RoBERTa's capacity to consistently identify implicit personality types from conversational text data.
arXiv Detail & Related papers (2025-03-12T21:24:22Z)
- BN-AuthProf: Benchmarking Machine Learning for Bangla Author Profiling on Social Media Texts [0.0]
This paper aims to extract valuable insights about anonymous authors based on their writing style on social media.
The dataset comprises 30,131 social media posts from 300 authors, labeled by their age and gender.
Various classical machine learning and deep learning techniques were employed to evaluate the dataset.
arXiv Detail & Related papers (2024-12-03T00:32:32Z)
- The Impact of Debiasing on the Performance of Language Models in Downstream Tasks is Underestimated [70.23064111640132]
We compare the impact of debiasing on performance across multiple downstream tasks using a wide range of benchmark datasets.
Experiments show that the effects of debiasing are consistently underestimated across all tasks.
arXiv Detail & Related papers (2023-09-16T20:25:34Z)
- Text2Time: Transformer-based Article Time Period Prediction [0.11470070927586018]
This work investigates the problem of predicting the publication period of a text document, specifically a news article, based on its textual content.
We create our own extensive labeled dataset of over 350,000 news articles published by The New York Times over six decades.
In our approach, we use a pretrained BERT model fine-tuned for the task of text classification, specifically for time period prediction.
arXiv Detail & Related papers (2023-04-21T10:05:03Z)
- ASPEST: Bridging the Gap Between Active Learning and Selective Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv Detail & Related papers (2023-04-07T23:51:07Z)
- A Gold Standard Dataset for the Reviewer Assignment Problem [117.59690218507565]
"Similarity score" is a numerical estimate of the expertise of a reviewer in reviewing a paper.
Our dataset consists of 477 self-reported expertise scores provided by 58 researchers.
For the task of ordering two papers in terms of their relevance for a reviewer, the error rates range from 12%-30% in easy cases to 36%-43% in hard cases.
arXiv Detail & Related papers (2023-03-23T16:15:03Z)
- Predicting article quality scores with machine learning: The UK Research Excellence Framework [6.582887504429817]
Accuracy is highest in the medical and physical sciences Units of Assessment (UoAs) and economics.
Prediction accuracies above the baseline for the social science, mathematics, engineering, arts, and humanities UoAs were much lower or close to zero.
We increased accuracy with an active learning strategy and by selecting articles with higher prediction probabilities, as estimated by the algorithms, but this substantially reduced the number of scores predicted.
arXiv Detail & Related papers (2022-12-11T05:45:12Z)
- Gender Bias in Big Data Analysis [0.0]
It measures gender bias when gender prediction software tools are used in historical big data research.
Gender bias is measured by contrasting personally identified computer science authors in the well-regarded DBLP dataset.
arXiv Detail & Related papers (2022-11-17T20:13:04Z)
- PART: Pre-trained Authorship Representation Transformer [64.78260098263489]
Authors writing documents imprint identifying information within their texts: vocabulary, register, punctuation, misspellings, or even emoji usage.
Previous works use hand-crafted features or classification tasks to train their authorship models, leading to poor performance on out-of-domain authors.
We propose a contrastively trained model fit to learn authorship embeddings instead of semantics.
arXiv Detail & Related papers (2022-09-30T11:08:39Z)
- LeQua@CLEF2022: Learning to Quantify [76.22817970624875]
LeQua 2022 is a new lab for the evaluation of methods for "learning to quantify" in textual datasets.
The goal of this lab is to provide a setting for the comparative evaluation of methods for learning to quantify, both in the binary setting and in the single-label multiclass setting.
arXiv Detail & Related papers (2021-11-22T14:54:20Z)
- Improving Gender Fairness of Pre-Trained Language Models without Catastrophic Forgetting [88.83117372793737]
Forgetting information in the original training data may damage the model's downstream performance by a large margin.
We propose GEnder Equality Prompt (GEEP) to improve gender fairness of pre-trained models with less forgetting.
arXiv Detail & Related papers (2021-10-11T15:52:16Z)
- Gender prediction using limited Twitter Data [0.0]
This paper explores the usability of BERT (a Transformer model for word embedding) for gender prediction on social media.
A Dutch BERT model is fine-tuned on different samples of a Dutch Twitter dataset labeled for gender, varying in the number of tweets used per person.
Results show that even with relatively small amounts of data, BERT can be fine-tuned to accurately help predict the gender of Twitter users.
arXiv Detail & Related papers (2020-09-29T11:46:07Z)
- Investigating Gender Bias in BERT [22.066477991442003]
We analyse the gender-bias it induces in five downstream tasks related to emotion and sentiment intensity prediction.
We propose an algorithm that finds fine-grained gender directions, i.e., one primary direction for each BERT layer.
Experiments show that removing embedding components in such directions achieves great success in reducing BERT-induced bias in the downstream tasks.
arXiv Detail & Related papers (2020-09-10T17:38:32Z)
- Mitigating Gender Bias in Captioning Systems [56.25457065032423]
Most captioning models learn gender bias, leading to high gender prediction errors, especially for women.
We propose a new Guided Attention Image Captioning model (GAIC) which provides self-guidance on visual attention to encourage the model to capture correct gender visual evidence.
arXiv Detail & Related papers (2020-06-15T12:16:19Z)
- Investigating Bias in Deep Face Analysis: The KANFace Dataset and Empirical Study [67.3961439193994]
We introduce the most comprehensive, large-scale dataset of facial images and videos to date.
The data are manually annotated in terms of identity, exact age, gender and kinship.
A method to debias network embeddings is introduced and tested on the proposed benchmarks.
arXiv Detail & Related papers (2020-05-15T00:14:39Z)