Natural language processing for achieving sustainable development: the
case of neural labelling to enhance community profiling
- URL: http://arxiv.org/abs/2004.12935v2
- Date: Tue, 17 Nov 2020 18:28:01 GMT
- Authors: Costanza Conforti, Stephanie Hirmer, David Morgan, Marco Basaldella,
Yau Ben Or
- Abstract summary: This research paper shows the high potential of NLP applications to enhance the sustainability of projects.
We focus on the case of community profiling in developing countries, where, in contrast to the developed world, a notable data gap exists.
We propose the new task of Automatic UPV classification, which is an extreme multi-class multi-label classification problem.
- Score: 2.6734009991058794
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, there has been an increasing interest in the application of
Artificial Intelligence - and especially Machine Learning - to the field of
Sustainable Development (SD). However, until now, NLP has not been applied in
this context. In this research paper, we show the high potential of NLP
applications to enhance the sustainability of projects. In particular, we focus
on the case of community profiling in developing countries, where, in contrast
to the developed world, a notable data gap exists. In this context, NLP could
help to address the cost and time barrier of structuring qualitative data that
prohibits its widespread use and associated benefits. We propose the new task
of Automatic UPV classification, which is an extreme multi-class multi-label
classification problem. We release Stories2Insights, an expert-annotated
dataset, provide a detailed corpus analysis, and implement a number of strong
neural baselines to address the task. Experimental results show that the
problem is challenging, and leave plenty of room for future research at the
intersection of NLP and SD.
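The paper frames Automatic UPV classification as an extreme multi-class multi-label problem: each community story can carry several labels at once. A minimal way to see what "multi-label" means in practice is a one-vs-rest setup, where one binary classifier is trained per label. The sketch below uses a tiny bag-of-words perceptron and invented toy stories and labels; it is not the paper's neural baselines or the Stories2Insights data, only an illustration of the task formulation.

```python
from collections import defaultdict

def featurize(text):
    """Bag-of-words features: word -> count."""
    feats = defaultdict(int)
    for word in text.lower().split():
        feats[word] += 1
    return feats

def train_one_vs_rest(examples, labels, epochs=10):
    """Train one perceptron (weights + bias) per label: one-vs-rest."""
    weights = {lab: defaultdict(float) for lab in labels}
    bias = {lab: 0.0 for lab in labels}
    for _ in range(epochs):
        for text, gold in examples:
            feats = featurize(text)
            for lab in labels:
                score = bias[lab] + sum(weights[lab][w] * c for w, c in feats.items())
                target = 1 if lab in gold else -1
                if score * target <= 0:  # misclassified: perceptron update
                    for w, c in feats.items():
                        weights[lab][w] += target * c
                    bias[lab] += target
    return weights, bias

def predict(weights, bias, text):
    """A story receives every label whose classifier fires (possibly several)."""
    feats = featurize(text)
    return {lab for lab in weights
            if bias[lab] + sum(weights[lab][w] * c for w, c in feats.items()) > 0}

# Hypothetical toy stories, each with more than one value label:
examples = [
    ("the well gives clean water to the village", {"water", "health"}),
    ("solar panels power the school at night", {"energy", "education"}),
    ("clean water reduced sickness in the village", {"water", "health"}),
    ("the school uses solar energy for lessons", {"energy", "education"}),
]
labels = {"water", "health", "energy", "education"}
W, b = train_one_vs_rest(examples, labels)
pred = predict(W, b, "clean water for the village")  # predicts both water and health
```

The real task is "extreme" because the label set is large and examples are scarce per label, which is why the paper turns to neural baselines rather than per-label linear models.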
Related papers
- Self-Supervised Learning for Text Recognition: A Critical Survey [11.599791967838481]
Text Recognition (TR) refers to the research area that focuses on retrieving textual information from images.
Self-Supervised Learning (SSL) has gained attention for utilizing large datasets of unlabeled data to train Deep Neural Networks (DNNs).
This paper seeks to consolidate the use of SSL in the field of TR, offering a critical and comprehensive overview of the current state of the art.
arXiv Detail & Related papers (2024-07-29T11:11:17Z) - Data-Centric AI in the Age of Large Language Models [51.20451986068925]
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs).
We make the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs.
We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization.
arXiv Detail & Related papers (2024-06-20T16:34:07Z) - Can a Multichoice Dataset be Repurposed for Extractive Question Answering? [52.28197971066953]
We repurposed the Belebele dataset (Bandarkar et al., 2023), which was designed for multiple-choice question answering (MCQA), for extractive question answering (EQA).
We present annotation guidelines and a parallel EQA dataset for English and Modern Standard Arabic (MSA).
Our aim is to enable others to adapt our approach for the 120+ other language variants in Belebele, many of which are deemed under-resourced.
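The core of repurposing an MCQA item as extractive QA is locating the gold option as a character span in the passage; items where the answer does not occur verbatim are the ones that need the paper's annotation guidelines. The helper and example item below are hypothetical simplifications, not the actual Belebele conversion procedure.

```python
def mcqa_to_eqa(passage, question, options, correct_idx):
    """Convert a multiple-choice item to an extractive QA item by
    locating the gold option as a character span in the passage.
    Returns None when the answer text does not occur verbatim --
    exactly the case that requires human re-annotation."""
    answer = options[correct_idx]
    start = passage.find(answer)
    if start == -1:
        return None  # span not recoverable automatically
    return {"question": question,
            "answer_text": answer,
            "answer_start": start,
            "answer_end": start + len(answer)}

item = mcqa_to_eqa(
    passage="The Nile flows north through Egypt into the Mediterranean Sea.",
    question="Into which sea does the Nile flow?",
    options=["the Red Sea", "the Mediterranean Sea", "the Black Sea"],
    correct_idx=1,
)
```

Character offsets rather than token indices keep the converted items usable across languages with different tokenization, which matters for the 120+ language variants mentioned above.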
arXiv Detail & Related papers (2024-04-26T11:46:05Z) - Deep Learning Approaches for Improving Question Answering Systems in
Hepatocellular Carcinoma Research [0.0]
In recent years, advancements in natural language processing (NLP) have been fueled by deep learning techniques.
BERT and GPT-3, trained on vast amounts of data, have revolutionized language understanding and generation.
This paper delves into the current landscape and future prospects of large-scale model-based NLP.
arXiv Detail & Related papers (2024-02-25T09:32:17Z) - Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora, and report a superlative performance on evaluation datasets.
This survey delves into an important attribute of these datasets: the dialect of a language.
Motivated by the performance degradation of NLP models for dialectic datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets, and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z) - Lessons Learned from a Citizen Science Project for Natural Language
Processing [53.48988266271858]
Citizen Science is an alternative to crowdsourcing that is relatively unexplored in the context of NLP.
We conduct an exploratory study into engaging different groups of volunteers in Citizen Science for NLP by re-annotating parts of a pre-existing crowdsourced dataset.
Our results show that this can yield high-quality annotations and attract motivated volunteers, but also requires considering factors such as scalability, participation over time, and legal and ethical issues.
arXiv Detail & Related papers (2023-04-25T14:08:53Z) - Robust Natural Language Processing: Recent Advances, Challenges, and
Future Directions [4.409836695738517]
We present a structured overview of NLP robustness research by summarizing the literature in a systemic way across various dimensions.
We then take a deep-dive into the various dimensions of robustness, across techniques, metrics, embeddings, and benchmarks.
arXiv Detail & Related papers (2022-01-03T17:17:11Z) - Few-shot Named Entity Recognition with Cloze Questions [3.561183926088611]
We propose a simple and intuitive adaptation of Pattern-Exploiting Training (PET), a recent approach which combines the cloze-questions mechanism and fine-tuning for few-shot learning.
Our approach achieves considerably better performance than standard fine-tuning and comparable or improved results with respect to other few-shot baselines.
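The cloze-question mechanism behind PET rewrites a classification decision as a fill-in-the-blank statement, with a verbalizer mapping each label to a word a masked language model would rank at the mask position. The template, label words, and stand-in scores below are illustrative assumptions; no model is loaded, and this is not the paper's exact pattern set.

```python
# Hypothetical verbalizer: entity label -> natural-language word.
VERBALIZER = {"PER": "person", "LOC": "location", "ORG": "organization"}

def build_cloze(sentence, span):
    """Pattern (assumed): '<sentence> <span> is a [MASK] .'"""
    return f"{sentence} {span} is a [MASK] ."

def classify(mask_scores):
    """Pick the label whose verbalized word scores highest at [MASK].
    `mask_scores` stands in for a masked LM's word probabilities."""
    return max(VERBALIZER, key=lambda lab: mask_scores.get(VERBALIZER[lab], 0.0))

cloze = build_cloze("Marie Curie worked in Paris.", "Paris")
# Stand-in scores an MLM might assign at the mask position:
pred = classify({"location": 0.71, "person": 0.15, "organization": 0.05})
```

The few-shot appeal is that the pretrained model already knows which word fits the blank, so only the pattern and verbalizer need to be tuned on a handful of examples.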
arXiv Detail & Related papers (2021-11-24T11:08:59Z) - An Empirical Survey of Data Augmentation for Limited Data Learning in
NLP [88.65488361532158]
The dependence on abundant data prevents NLP models from being applied to low-resource settings or novel tasks.
Data augmentation methods have been explored as a means of improving data efficiency in NLP.
We provide an empirical survey of recent progress on data augmentation for NLP in the limited labeled data setting.
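One of the simplest augmentation recipes such surveys cover is synonym replacement: swap words for synonyms to multiply a small labeled set. The sketch below uses a tiny hand-written synonym table as a stand-in for a real lexical resource (e.g. WordNet), and a seeded generator so the augmentations are reproducible; it illustrates the idea, not any specific method from the survey.

```python
import random

# Toy synonym table; a real setup would draw from a lexical resource.
SYNONYMS = {"quick": ["fast", "swift"], "happy": ["glad", "joyful"]}

def synonym_replace(sentence, rng):
    """Replace each word that has synonyms with a randomly chosen one;
    all other words pass through unchanged."""
    out = []
    for w in sentence.split():
        out.append(rng.choice(SYNONYMS[w]) if w in SYNONYMS else w)
    return " ".join(out)

rng = random.Random(0)  # fixed seed for reproducible augmentations
augmented = [synonym_replace("the quick dog is happy", rng) for _ in range(3)]
```

Label-preserving transformations like this are attractive in the limited-data setting precisely because they need no extra annotation, only the assumption that the swap does not change the label.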
arXiv Detail & Related papers (2021-06-14T15:27:22Z) - FedNLP: A Research Platform for Federated Learning in Natural Language
Processing [55.01246123092445]
We present FedNLP, a research platform for federated learning in NLP.
FedNLP supports various popular task formulations in NLP such as text classification, sequence tagging, question answering, seq2seq generation, and language modeling.
Preliminary experiments with FedNLP reveal that there exists a large performance gap between learning on decentralized and centralized datasets.
arXiv Detail & Related papers (2021-04-18T11:04:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.