Exploring Wikipedia Gender Diversity Over Time $\unicode{x2013}$ The Wikipedia Gender Dashboard (WGD)
- URL: http://arxiv.org/abs/2501.12610v1
- Date: Wed, 22 Jan 2025 03:28:40 GMT
- Title: Exploring Wikipedia Gender Diversity Over Time $\unicode{x2013}$ The Wikipedia Gender Dashboard (WGD)
- Authors: Yahya Yunus, Tianwa Chen, Gianluca Demartini,
- Abstract summary: This paper presents the Wikipedia Gender Dashboard (WGD), a tool designed to enable the interaction with gender distribution data.
The analysis of the data found that female articles only represent around 17% of English Wikipedia, but it has been growing steadily over the last 20 years.
Most subclasses of Person' are male-dominated.
- Score: 4.292453466361998
- License:
- Abstract: The Wikipedia editors' community has been actively pursuing the intent of achieving gender equality. To that end, it is important to explore the historical evolution of underlying gender disparities in Wikipedia articles. This paper presents the Wikipedia Gender Dashboard (WGD), a tool designed to enable the interaction with gender distribution data, including the average age in every subclass of individuals (i.e. Astronauts, Politicians, etc.) over the years. Wikipedia APIs, DBpedia, and Wikidata endpoints were used to query the data to ensure persistent data collection. The WGD was then created with Microsoft Power BI before being embedded on a public website. The analysis of the data available in the WGD found that female articles only represent around 17% of English Wikipedia, but it has been growing steadily over the last 20 years. Meanwhile, the average age across genders decreased over time. WGD also shows that most subclasses of `Person' are male-dominated. Wikipedia editors can make use of WGD to locate areas with marginalized genders in Wikipedia, and increase their efforts to produce more content providing coverage for those genders to achieve better gender equality in Wikipedia.
Related papers
- Survival of the Notable: Gender Asymmetry in Wikipedia Collective Deliberations [0.0]
Articles for Deletion (AfD) discussions on Wikipedia allow editors to gauge the notability of existing articles.
We find that biographies of women are nominated for deletion faster than those of men, despite editors taking longer to reach a consensus for deletion of women.
We find that AfDs about historical figures show a strong tendency to result into the redirecting or merging of the biography under discussion into other encyclopedic entries.
arXiv Detail & Related papers (2024-11-07T00:37:24Z) - Estimating Gender Completeness in Wikipedia [4.292453466361998]
The aim of this paper is to provide the Wikipedia community with instruments to estimate the magnitude of the problem for different entity types.
Our results show not only which gender for different sub-classes of Person is more prevalent in Wikipedia, but also an idea of how complete the coverage is for difference genders and sub-classes of Person.
arXiv Detail & Related papers (2024-01-17T06:03:31Z) - The Gender-GAP Pipeline: A Gender-Aware Polyglot Pipeline for Gender
Characterisation in 55 Languages [51.2321117760104]
This paper describes the Gender-GAP Pipeline, an automatic pipeline to characterize gender representation in large-scale datasets for 55 languages.
The pipeline uses a multilingual lexicon of gendered person-nouns to quantify the gender representation in text.
We showcase it to report gender representation in WMT training data and development data for the News task, confirming that current data is skewed towards masculine representation.
arXiv Detail & Related papers (2023-08-31T17:20:50Z) - Publishing Wikipedia usage data with strong privacy guarantees [6.410779699541235]
For almost 20 years, the Wikimedia Foundation has been publishing statistics about how many people visited each Wikipedia page on each day.
In June 2023, the Wikimedia Foundation started publishing these statistics with finer granularity, including the country of origin in the daily counts of page views.
This paper describes this data publication: its goals, the process followed from its inception to its deployment, and the outcomes of the data release.
arXiv Detail & Related papers (2023-08-30T19:58:56Z) - VisoGender: A dataset for benchmarking gender bias in image-text pronoun
resolution [80.57383975987676]
VisoGender is a novel dataset for benchmarking gender bias in vision-language models.
We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas.
We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes.
arXiv Detail & Related papers (2023-06-21T17:59:51Z) - Mapping Process for the Task: Wikidata Statements to Text as Wikipedia
Sentences [68.8204255655161]
We propose our mapping process for the task of converting Wikidata statements to natural language text (WS2T) for Wikipedia projects at the sentence level.
The main step is to organize statements, represented as a group of quadruples and triples, and then to map them to corresponding sentences in English Wikipedia.
We evaluate the output corpus in various aspects: sentence structure analysis, noise filtering, and relationships between sentence components based on word embedding models.
arXiv Detail & Related papers (2022-10-23T08:34:33Z) - WikiDes: A Wikipedia-Based Dataset for Generating Short Descriptions
from Paragraphs [66.88232442007062]
We introduce WikiDes, a dataset to generate short descriptions of Wikipedia articles.
The dataset consists of over 80k English samples on 6987 topics.
Our paper shows a practical impact on Wikipedia and Wikidata since there are thousands of missing descriptions.
arXiv Detail & Related papers (2022-09-27T01:28:02Z) - Gender Stereotype Reinforcement: Measuring the Gender Bias Conveyed by
Ranking Algorithms [68.85295025020942]
We propose the Gender Stereotype Reinforcement (GSR) measure, which quantifies the tendency of a Search Engines to support gender stereotypes.
GSR is the first specifically tailored measure for Information Retrieval, capable of quantifying representational harms.
arXiv Detail & Related papers (2020-09-02T20:45:04Z) - Global gender differences in Wikipedia readership [14.112831377937107]
We present novel evidence of gender differences in Wikipedia readership and how they manifest in records of user behavior.
More specifically we report that (1) women are underrepresented among readers of Wikipedia, (2) women view fewer pages per reading session than men do, (3) men and women visit Wikipedia for similar reasons, and (4) men and women exhibit specific topical preferences.
arXiv Detail & Related papers (2020-07-20T18:40:32Z) - Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z) - Entity Extraction from Wikipedia List Pages [2.3605348648054463]
We build a large taxonomy from categories and list pages with DBpedia as a backbone.
With distant supervision, we extract training data for the identification of new entities in list pages.
We extend DBpedia with 7.5M new type statements and 3.8M new facts of high precision.
arXiv Detail & Related papers (2020-03-11T07:48:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.