We Need to Talk About Data: The Importance of Data Readiness in Natural
Language Processing
- URL: http://arxiv.org/abs/2110.05464v1
- Date: Mon, 11 Oct 2021 17:55:07 GMT
- Authors: Fredrik Olsson and Magnus Sahlgren
- Abstract summary: We argue that there is a gap between academic research in NLP and its application to problems outside academia.
We propose a method for improving the communication between researchers and external stakeholders regarding the accessibility, validity, and utility of data.
- Score: 3.096615629099618
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we identify the state of data as being an important reason for
failure in applied Natural Language Processing (NLP) projects. We argue that
there is a gap between academic research in NLP and its application to problems
outside academia, and that this gap is rooted in poor mutual understanding
between academic researchers and their non-academic peers who seek to apply
research results to their operations. To foster transfer of research results
from academia to non-academic settings, and the corresponding influx of
requirements back to academia, we propose a method for improving the
communication between researchers and external stakeholders regarding the
accessibility, validity, and utility of data based on Data Readiness Levels
(Lawrence, 2017). While still in its infancy, the method has been
iterated on and applied in multiple innovation and research projects carried
out with stakeholders in both the private and public sectors. Finally, we
invite researchers and practitioners to share their experiences, and thus
contribute to a body of work aimed at raising awareness of the importance of
data readiness for NLP.
Related papers
- The Nature of NLP: Analyzing Contributions in NLP Papers [77.31665252336157]
We quantitatively investigate what constitutes NLP research by examining research papers.
Our findings reveal a rising involvement of machine learning in NLP since the early nineties.
Since 2020, there has been a resurgence of focus on language and people.
arXiv Detail & Related papers (2024-09-29T01:29:28Z)
- Data-Centric AI in the Age of Large Language Models [51.20451986068925]
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs).
We make the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs.
We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization.
arXiv Detail & Related papers (2024-06-20T16:34:07Z)
- What Can Natural Language Processing Do for Peer Review? [173.8912784451817]
In modern science, peer review is widely used, yet it is hard, time-consuming, and prone to error.
Since the artifacts involved in peer review are largely text-based, Natural Language Processing has great potential to improve reviewing.
We detail each step of the process from manuscript submission to camera-ready revision, and discuss the associated challenges and opportunities for NLP assistance.
arXiv Detail & Related papers (2024-05-10T16:06:43Z)
- Research information in the light of artificial intelligence: quality and data ecologies [0.0]
This paper presents multi- and interdisciplinary approaches for finding the appropriate AI technologies for research information.
Professional research information management (RIM) is becoming increasingly important as an expressly data-driven tool for researchers.
arXiv Detail & Related papers (2024-05-06T16:07:56Z)
- Context Retrieval via Normalized Contextual Latent Interaction for Conversational Agent [3.9635467316436133]
We present a novel method, PK-NCLI, that is able to accurately and efficiently identify relevant auxiliary information to improve the quality of conversational responses.
Our experimental results indicate that PK-NCLI outperforms the state-of-the-art method, PK-FoCus, in terms of perplexity, knowledge grounding, and training efficiency.
arXiv Detail & Related papers (2023-12-01T18:53:51Z)
- A Diachronic Analysis of Paradigm Shifts in NLP Research: When, How, and Why? [84.46288849132634]
We propose a systematic framework for analyzing the evolution of research topics in a scientific field using causal discovery and inference techniques.
We define three variables to encompass diverse facets of the evolution of research topics within NLP.
We utilize a causal discovery algorithm to unveil the causal connections among these variables using observational data.
arXiv Detail & Related papers (2023-05-22T11:08:00Z)
- Investigating Fairness Disparities in Peer Review: A Language Model Enhanced Approach [77.61131357420201]
We conduct a thorough and rigorous study on fairness disparities in peer review with the help of large language models (LMs).
We collect, assemble, and maintain a comprehensive relational database for the International Conference on Learning Representations (ICLR) conference from 2017 to date.
We postulate and study fairness disparities on multiple protective attributes of interest, including author gender, geography, and author and institutional prestige.
arXiv Detail & Related papers (2022-11-07T16:19:42Z)
- A Decade of Knowledge Graphs in Natural Language Processing: A Survey [3.3358633215849927]
Knowledge graphs (KGs) have attracted a surge of interest from both academia and industry.
As a representation of semantic relations between entities, KGs have proven to be particularly relevant for natural language processing.
arXiv Detail & Related papers (2022-09-30T21:53:57Z)
- Causal Inference in Natural Language Processing: Estimation, Prediction, Interpretation and Beyond [38.055142444836925]
We consolidate research across academic areas and situate it in the broader Natural Language Processing landscape.
We introduce the statistical challenge of estimating causal effects, encompassing settings where text is used as an outcome, treatment, or as a means to address confounding.
In addition, we explore potential uses of causal inference to improve the performance, robustness, fairness, and interpretability of NLP models.
arXiv Detail & Related papers (2021-09-02T05:40:08Z)
- Learnings from Frontier Development Lab and SpaceML -- AI Accelerators for NASA and ESA [57.06643156253045]
Research with AI and ML technologies lives in a variety of settings with often asynchronous goals and timelines.
We perform a case study of the Frontier Development Lab (FDL), an AI accelerator under a public-private partnership from NASA and ESA.
FDL research follows principled practices that are grounded in responsible development, conduct, and dissemination of AI research.
arXiv Detail & Related papers (2020-11-09T21:23:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.