Lessons Learned from a Citizen Science Project for Natural Language
Processing
- URL: http://arxiv.org/abs/2304.12836v1
- Date: Tue, 25 Apr 2023 14:08:53 GMT
- Authors: Jan-Christoph Klie, Ji-Ung Lee, Kevin Stowe, Gözde Gül Şahin,
Nafise Sadat Moosavi, Luke Bates, Dominic Petrak, Richard Eckart de Castilho,
Iryna Gurevych
- Abstract summary: Citizen Science is an alternative to crowdsourcing that is relatively unexplored in the context of NLP.
We conduct an exploratory study into engaging different groups of volunteers in Citizen Science for NLP by re-annotating parts of a pre-existing crowdsourced dataset.
Our results show that this can yield high-quality annotations and attract motivated volunteers, but also requires considering factors such as scalability, participation over time, and legal and ethical issues.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Many Natural Language Processing (NLP) systems use annotated corpora for
training and evaluation. However, labeled data is often costly to obtain and
scaling annotation projects is difficult, which is why annotation tasks are
often outsourced to paid crowdworkers. Citizen Science is an alternative to
crowdsourcing that is relatively unexplored in the context of NLP. To
investigate whether and how well Citizen Science can be applied in this
setting, we conduct an exploratory study into engaging different groups of
volunteers in Citizen Science for NLP by re-annotating parts of a pre-existing
crowdsourced dataset. Our results show that this can yield high-quality
annotations and attract motivated volunteers, but also requires considering
factors such as scalability, participation over time, and legal and ethical
issues. We summarize lessons learned in the form of guidelines and provide our
code and data to aid future work on Citizen Science.
Related papers
- Building Better: Avoiding Pitfalls in Developing Language Resources when Data is Scarce [27.918975040084387]
Data in a given language should be viewed as more than a collection of tokens.
Good data collection and labeling practices are key to building more human-centered and socially aware technologies.
arXiv Detail & Related papers (2024-10-16T15:51:18Z)
- The Nature of NLP: Analyzing Contributions in NLP Papers [77.31665252336157]
We quantitatively investigate what constitutes NLP research by examining research papers.
Our findings reveal a rising involvement of machine learning in NLP since the early nineties.
Post-2020, there has been a resurgence of focus on language and people.
arXiv Detail & Related papers (2024-09-29T01:29:28Z)
- What Can Natural Language Processing Do for Peer Review? [173.8912784451817]
In modern science, peer review is widely used, yet it is hard, time-consuming, and prone to error.
Since the artifacts involved in peer review are largely text-based, Natural Language Processing has great potential to improve reviewing.
We detail each step of the process from manuscript submission to camera-ready revision, and discuss the associated challenges and opportunities for NLP assistance.
arXiv Detail & Related papers (2024-05-10T16:06:43Z)
- Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora and report superlative performance on evaluation datasets.
This survey delves into an important attribute of these datasets: the dialect of a language.
Motivated by the performance degradation of NLP models on dialectal datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z)
- Fairness Certification for Natural Language Processing and Large Language Models [0.0]
We follow a qualitative research approach towards a fairness certification for NLP approaches.
We have systematically devised six fairness criteria for NLP, which can be further refined into 18 sub-categories.
arXiv Detail & Related papers (2024-01-02T16:09:36Z)
- Situated Natural Language Explanations [54.083715161895036]
Natural language explanations (NLEs) are among the most accessible tools for explaining decisions to humans.
Existing NLE research perspectives do not take the audience into account.
Situated NLE provides a perspective and facilitates further research on the generation and evaluation of explanations.
arXiv Detail & Related papers (2023-08-27T14:14:28Z)
- A Survey of Knowledge Enhanced Pre-trained Language Models [78.56931125512295]
We present a comprehensive review of Knowledge Enhanced Pre-trained Language Models (KE-PLMs).
For NLU, we divide the types of knowledge into four categories: linguistic knowledge, text knowledge, knowledge graph (KG) knowledge, and rule knowledge.
The KE-PLMs for NLG are categorized into KG-based and retrieval-based methods.
arXiv Detail & Related papers (2022-11-11T04:29:02Z)
- Dim Wihl Gat Tun: The Case for Linguistic Expertise in NLP for Underdocumented Languages [6.8708103492634836]
Hundreds of underserved languages have available data sources in the form of interlinear glossed text (IGT) from language documentation efforts.
We make the case that IGT data can be leveraged successfully provided that target language expertise is available.
We illustrate each step through a case study on developing a morphological reinflection system for the Tsimshianic language Gitksan.
arXiv Detail & Related papers (2022-03-17T22:02:25Z)
- Natural language processing for achieving sustainable development: the case of neural labelling to enhance community profiling [2.6734009991058794]
This research paper shows the high potential of NLP applications to enhance the sustainability of projects.
We focus on the case of community profiling in developing countries, where, in contrast to the developed world, a notable data gap exists.
We propose the new task of Automatic UPV classification, which is an extreme multi-class multi-label classification problem.
arXiv Detail & Related papers (2020-04-27T16:51:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.