Lessons Learned from a Citizen Science Project for Natural Language
Processing
- URL: http://arxiv.org/abs/2304.12836v1
- Date: Tue, 25 Apr 2023 14:08:53 GMT
- Title: Lessons Learned from a Citizen Science Project for Natural Language
Processing
- Authors: Jan-Christoph Klie, Ji-Ung Lee, Kevin Stowe, Gözde Gül Şahin,
Nafise Sadat Moosavi, Luke Bates, Dominic Petrak, Richard Eckart de Castilho,
Iryna Gurevych
- Abstract summary: Citizen Science is an alternative to crowdsourcing that is relatively unexplored in the context of NLP.
We conduct an exploratory study into engaging different groups of volunteers in Citizen Science for NLP by re-annotating parts of a pre-existing crowdsourced dataset.
Our results show that this can yield high-quality annotations and attract motivated volunteers, but also requires considering factors such as scalability, participation over time, and legal and ethical issues.
- Score: 53.48988266271858
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Many Natural Language Processing (NLP) systems use annotated corpora for
training and evaluation. However, labeled data is often costly to obtain and
scaling annotation projects is difficult, which is why annotation tasks are
often outsourced to paid crowdworkers. Citizen Science is an alternative to
crowdsourcing that is relatively unexplored in the context of NLP. To
investigate whether and how well Citizen Science can be applied in this
setting, we conduct an exploratory study into engaging different groups of
volunteers in Citizen Science for NLP by re-annotating parts of a pre-existing
crowdsourced dataset. Our results show that this can yield high-quality
annotations and attract motivated volunteers, but also requires considering
factors such as scalability, participation over time, and legal and ethical
issues. We summarize lessons learned in the form of guidelines and provide our
code and data to aid future work on Citizen Science.
Related papers
- The Nature of NLP: Analyzing Contributions in NLP Papers [77.31665252336157]
We quantitatively investigate what constitutes NLP research by examining research papers.
Our findings reveal a rising involvement of machine learning in NLP since the early nineties.
Since 2020, there has been a resurgence of focus on language and people.
(2024-09-29T01:29:28Z)
- Towards Systematic Monolingual NLP Surveys: GenA of Greek NLP [2.3499129784547663]
This study fills the gap by introducing a method for creating systematic and comprehensive monolingual NLP surveys.
Characterized by a structured search protocol, it can be used to select publications and organize them through a taxonomy of NLP tasks.
By applying our method, we conducted a systematic literature review of Greek NLP from 2012 to 2022.
(2024-07-13T12:01:52Z)
- What Can Natural Language Processing Do for Peer Review? [173.8912784451817]
In modern science, peer review is widely used, yet it is hard, time-consuming, and prone to error.
Since the artifacts involved in peer review are largely text-based, Natural Language Processing has great potential to improve reviewing.
We detail each step of the process from manuscript submission to camera-ready revision, and discuss the associated challenges and opportunities for NLP assistance.
(2024-05-10T16:06:43Z)
- Fairness Certification for Natural Language Processing and Large Language Models [0.0]
We follow a qualitative research approach towards a fairness certification for NLP approaches.
We have systematically devised six fairness criteria for NLP, which can be further refined into 18 sub-categories.
(2024-01-02T16:09:36Z)
- Situated Natural Language Explanations [54.083715161895036]
Natural language explanations (NLEs) are among the most accessible tools for explaining decisions to humans.
Existing NLE research perspectives do not take the audience into account.
Situated NLE provides a perspective and facilitates further research on the generation and evaluation of explanations.
(2023-08-27T14:14:28Z)
- Collaborating Heterogeneous Natural Language Processing Tasks via Federated Learning [55.99444047920231]
The proposed ATC framework achieves significant improvements compared with various baseline methods.
We conduct extensive experiments on six widely-used datasets covering both Natural Language Understanding (NLU) and Natural Language Generation (NLG) tasks.
(2022-12-12T09:27:50Z)
- A Survey of Knowledge Enhanced Pre-trained Language Models [78.56931125512295]
We present a comprehensive review of Knowledge Enhanced Pre-trained Language Models (KE-PLMs).
For NLU, we divide the types of knowledge into four categories: linguistic knowledge, text knowledge, knowledge graph (KG) and rule knowledge.
The KE-PLMs for NLG are categorized into KG-based and retrieval-based methods.
(2022-11-11T04:29:02Z)
- Dim Wihl Gat Tun: The Case for Linguistic Expertise in NLP for Underdocumented Languages [6.8708103492634836]
Hundreds of underserved languages have available data sources in the form of interlinear glossed text (IGT) from language documentation efforts.
We make the case that IGT data can be leveraged successfully provided that target language expertise is available.
We illustrate each step through a case study on developing a morphological reinflection system for the Tsimchianic language Gitksan.
(2022-03-17T22:02:25Z)
- Natural language processing for achieving sustainable development: the case of neural labelling to enhance community profiling [2.6734009991058794]
This research paper shows the high potential of NLP applications to enhance the sustainability of projects.
We focus on the case of community profiling in developing countries, where, in contrast to the developed world, a notable data gap exists.
We propose the new task of Automatic UPV classification, which is an extreme multi-class multi-label classification problem.
(2020-04-27T16:51:21Z)