The Elephant in the Room: Analyzing the Presence of Big Tech in Natural Language Processing Research
- URL: http://arxiv.org/abs/2305.02797v4
- Date: Tue, 16 Jul 2024 08:53:19 GMT
- Title: The Elephant in the Room: Analyzing the Presence of Big Tech in Natural Language Processing Research
- Authors: Mohamed Abdalla, Jan Philip Wahle, Terry Ruas, Aurélie Névéol, Fanny Ducel, Saif M. Mohammad, Karën Fort
- Abstract summary: We use a corpus with comprehensive metadata of 78,187 NLP publications and 701 resumes of NLP publication authors.
We find that industry presence among NLP authors held steady before a steep increase over the past five years.
A few companies account for most of the publications and provide funding to academic researchers through grants and internships.
- Score: 28.382353702576314
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Recent advances in deep learning methods for natural language processing (NLP) have created new business opportunities and made NLP research critical for industry development. Since industry is one of the big players in the field of NLP, alongside governments and universities, it is important to track its influence on research. In this study, we seek to quantify and characterize industry presence in the NLP community over time. Using a corpus with comprehensive metadata of 78,187 NLP publications and 701 resumes of NLP publication authors, we explore industry presence in the field since the early 1990s. We find that industry presence among NLP authors held steady before a steep increase over the past five years (180% growth from 2017 to 2022). A few companies account for most of the publications and provide funding to academic researchers through grants and internships. Our study shows that the presence and impact of industry on natural language processing research are significant and fast-growing. This work calls for increased transparency of industry influence in the field.
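As a rough illustration of the kind of bibliometric analysis the abstract describes, the following is a minimal sketch that counts, per year, papers with at least one industry-affiliated author and computes relative growth between two years. The record schema (`year`, `affiliations`), the keyword heuristic for flagging industry affiliations, and the toy data are hypothetical illustrations, not the authors' actual pipeline or corpus.

```python
from collections import Counter

# Toy per-paper metadata; the real study uses a curated corpus of 78,187
# NLP publications with author-affiliation information.
papers = [
    {"year": 2017, "affiliations": ["University A", "Big Tech Corp"]},
    {"year": 2017, "affiliations": ["University B"]},
    {"year": 2022, "affiliations": ["Big Tech Corp"]},
    {"year": 2022, "affiliations": ["University C", "Startup Inc"]},
]

# Assumed keyword heuristic for spotting industry affiliations (illustrative only).
INDUSTRY_KEYWORDS = ("corp", "inc", "labs", "llc", "ltd")

def has_industry_author(paper):
    """Return True if any affiliation string looks like a company."""
    return any(
        any(keyword in affiliation.lower() for keyword in INDUSTRY_KEYWORDS)
        for affiliation in paper["affiliations"]
    )

industry_counts = Counter(p["year"] for p in papers if has_industry_author(p))
total_counts = Counter(p["year"] for p in papers)

# Yearly share of papers with at least one industry-affiliated author.
industry_share = {
    year: industry_counts[year] / total_counts[year] for year in sorted(total_counts)
}

# Relative growth in the number of industry papers between two years;
# the paper reports 180% growth from 2017 to 2022 on the real corpus.
def relative_growth(counts, start_year, end_year):
    return (counts[end_year] - counts[start_year]) / counts[start_year] * 100

print(industry_share)
print(f"{relative_growth(industry_counts, 2017, 2022):.0f}% growth from 2017 to 2022")
```

On real data one would rely on manually verified affiliation labels (such as the resumes mentioned in the abstract) rather than a keyword heuristic, but the aggregation step would look much like the above.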
Related papers
- The Nature of NLP: Analyzing Contributions in NLP Papers [77.31665252336157]
We quantitatively investigate what constitutes NLP research by examining research papers.
Our findings reveal a rising involvement of machine learning in NLP since the early nineties.
After 2020, there has been a resurgence of focus on language and people.
arXiv Detail & Related papers (2024-09-29T01:29:28Z)
- What Can Natural Language Processing Do for Peer Review? [173.8912784451817]
In modern science, peer review is widely used, yet it is hard, time-consuming, and prone to error.
Since the artifacts involved in peer review are largely text-based, Natural Language Processing has great potential to improve reviewing.
We detail each step of the process from manuscript submission to camera-ready revision, and discuss the associated challenges and opportunities for NLP assistance.
arXiv Detail & Related papers (2024-05-10T16:06:43Z)
- Mapping the Increasing Use of LLMs in Scientific Papers [99.67983375899719]
We conduct the first systematic, large-scale analysis across 950,965 papers published between January 2020 and February 2024 on the arXiv, bioRxiv, and Nature portfolio journals.
Our findings reveal a steady increase in LLM usage, with the largest and fastest growth observed in Computer Science papers.
arXiv Detail & Related papers (2024-04-01T17:45:15Z)
- Collaboration or Corporate Capture? Quantifying NLP's Reliance on Industry Artifacts and Contributions [2.6746207141044582]
We surveyed 100 papers published at EMNLP 2022 to determine the degree to which researchers rely on industry models.
Our work serves as a scaffold to enable future researchers to more accurately address whether collaboration with industry is still collaboration in the absence of an alternative.
arXiv Detail & Related papers (2023-12-06T21:12:22Z)
- Defining a New NLP Playground [85.41973504055588]
The recent explosion of performance of large language models has changed the field of Natural Language Processing more abruptly and seismically than any other shift in the field's 80-year history.
This paper proposes 20+ PhD-dissertation-worthy research directions, covering theoretical analysis, new and challenging problems, learning paradigms, and interdisciplinary applications.
arXiv Detail & Related papers (2023-10-31T17:02:33Z)
- We are Who We Cite: Bridges of Influence Between Natural Language Processing and Other Academic Fields [30.550895983110806]
Cross-field engagement of Natural Language Processing has declined.
Less than 8% of NLP citations are to linguistics.
Less than 3% of NLP citations are to math and psychology (a minimal sketch of how such citation shares might be computed appears after this list).
arXiv Detail & Related papers (2023-10-23T12:42:06Z)
- Natural Language Processing in the Legal Domain [3.0223880754806505]
We construct and analyze a nearly complete corpus of more than six hundred NLP & Law related papers published over the past decade.
We observe an increase in the sophistication of the methods which researchers deployed in this applied context.
We believe all of these trends bode well for the future of the field, but many questions in both the academic and commercial sphere still remain open.
arXiv Detail & Related papers (2023-02-23T14:02:47Z)
- A Major Obstacle for NLP Research: Let's Talk about Time Allocation! [25.820755718678786]
This paper argues that we have been less successful than we should have been in the field of natural language processing.
We demonstrate that, in recent years, subpar time allocation has been a major obstacle for NLP research.
arXiv Detail & Related papers (2022-11-30T10:00:12Z)
- Systematic Inequalities in Language Technology Performance across the World's Languages [94.65681336393425]
We introduce a framework for estimating the global utility of language technologies.
Our analyses involve the field at large, but also more in-depth studies on both user-facing technologies and more linguistic NLP tasks.
arXiv Detail & Related papers (2021-10-13T14:03:07Z)
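The citation percentages quoted for "We are Who We Cite" above amount to a simple share computation over outgoing citations. Below is a minimal sketch, assuming each citation record is labeled with the cited work's field; the `cited_field` key and the toy records are hypothetical and not the authors' dataset.

```python
from collections import Counter

# Toy outgoing citations from NLP papers, each labeled with the field
# of the cited work (hypothetical records for illustration).
citations = [
    {"cited_field": "NLP"},
    {"cited_field": "NLP"},
    {"cited_field": "NLP"},
    {"cited_field": "Linguistics"},
    {"cited_field": "Machine Learning"},
    {"cited_field": "Psychology"},
]

field_counts = Counter(c["cited_field"] for c in citations)
total_citations = sum(field_counts.values())

# Percentage of NLP's outgoing citations going to each field,
# e.g. the "<8% to linguistics" figure reported in the related paper.
citation_shares = {
    field: count / total_citations * 100 for field, count in field_counts.items()
}

for field, share in sorted(citation_shares.items(), key=lambda item: -item[1]):
    print(f"{field}: {share:.1f}%")
```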