Topics, Authors, and Institutions in Large Language Model Research: Trends from 17K arXiv Papers
- URL: http://arxiv.org/abs/2307.10700v4
- Date: Sun, 28 Apr 2024 23:13:16 GMT
- Title: Topics, Authors, and Institutions in Large Language Model Research: Trends from 17K arXiv Papers
- Authors: Rajiv Movva, Sidhika Balachandar, Kenny Peng, Gabriel Agostini, Nikhil Garg, Emma Pierson,
- Abstract summary: Large language models (LLMs) are dramatically influencing AI research, spurring discussions on what has changed so far and how to shape the field's future.
To clarify such questions, we analyze a new dataset of 16,979 LLM-related arXiv papers, focusing on recent trends in 2023 vs. 2018-2022.
An influx of new authors -- half of all first authors in 2023 -- are entering from non-NLP fields of AI, driving disciplinary expansion.
Surprisingly, industry accounts for a smaller publication share in 2023, largely due to reduced output from Google and other Big Tech companies.
- Score: 1.5362868418787874
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large language models (LLMs) are dramatically influencing AI research, spurring discussions on what has changed so far and how to shape the field's future. To clarify such questions, we analyze a new dataset of 16,979 LLM-related arXiv papers, focusing on recent trends in 2023 vs. 2018-2022. First, we study disciplinary shifts: LLM research increasingly considers societal impacts, evidenced by 20x growth in LLM submissions to the Computers and Society sub-arXiv. An influx of new authors -- half of all first authors in 2023 -- are entering from non-NLP fields of CS, driving disciplinary expansion. Second, we study industry and academic publishing trends. Surprisingly, industry accounts for a smaller publication share in 2023, largely due to reduced output from Google and other Big Tech companies; universities in Asia are publishing more. Third, we study institutional collaboration: while industry-academic collaborations are common, they tend to focus on the same topics that industry focuses on rather than bridging differences. The most prolific institutions are all US- or China-based, but there is very little cross-country collaboration. We discuss implications around (1) how to support the influx of new authors, (2) how industry trends may affect academics, and (3) possible effects of (the lack of) collaboration.
Related papers
- Mapping the Increasing Use of LLMs in Scientific Papers [99.67983375899719]
We conduct the first systematic, large-scale analysis across 950,965 papers published between January 2020 and February 2024 on the arXiv, bioRxiv, and Nature portfolio journals.
Our findings reveal a steady increase in LLM usage, with the largest and fastest growth observed in Computer Science papers.
arXiv Detail & Related papers (2024-04-01T17:45:15Z) - Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews [51.453135368388686]
We present an approach for estimating the fraction of text in a large corpus which is likely to be substantially modified or produced by a large language model (LLM)
Our maximum likelihood model leverages expert-written and AI-generated reference texts to accurately and efficiently examine real-world LLM-use at the corpus level.
arXiv Detail & Related papers (2024-03-11T21:51:39Z) - Position: AI/ML Influencers Have a Place in the Academic Process [82.2069685579588]
We investigate the role of social media influencers in enhancing the visibility of machine learning research.
We have compiled a comprehensive dataset of over 8,000 papers, spanning tweets from December 2018 to October 2023.
Our statistical and causal inference analysis reveals a significant increase in citations for papers endorsed by these influencers.
arXiv Detail & Related papers (2024-01-24T20:05:49Z) - The complementary contributions of academia and industry to AI research [0.0]
We characterize the impact and type of AI produced by industry and academia over the last 25 years.
We find that articles published by industry teams tend to get greater attention, with a higher chance of being highly cited and citation-disruptive.
We find that academic-industry collaborations struggle to replicate the novelty of academic teams and tend to look similar to industry teams.
arXiv Detail & Related papers (2024-01-04T03:08:13Z) - Artificial intelligence in social science: A study based on
bibliometrics analysis [0.0]
This article presents the results of a bibliometric analysis of AI-related publications in the social sciences over the last ten years (2013-2022).
More than 19,408 articles have been published, 85% from 2008 to 2022, showing that research in this field is increasing significantly year on year.
The United States is the country that publishes the most (20%), followed by China (13%)
arXiv Detail & Related papers (2023-12-09T15:16:44Z) - Responsible AI Considerations in Text Summarization Research: A Review
of Current Practices [89.85174013619883]
We focus on text summarization, a common NLP task largely overlooked by the responsible AI community.
We conduct a multi-round qualitative analysis of 333 summarization papers from the ACL Anthology published between 2020-2022.
We focus on how, which, and when responsible AI issues are covered, which relevant stakeholders are considered, and mismatches between stated and realized research goals.
arXiv Detail & Related papers (2023-11-18T15:35:36Z) - Findings of the WMT 2023 Shared Task on Discourse-Level Literary
Translation: A Fresh Orb in the Cosmos of LLMs [80.05205710881789]
We release a copyrighted and document-level Chinese-English web novel corpus.
This year, we totally received 14 submissions from 7 academia and industry teams.
The official ranking of the systems is based on the overall human judgments.
arXiv Detail & Related papers (2023-11-06T14:23:49Z) - Analyzing the Impact of Companies on AI Research Based on Publications [1.450405446885067]
We compare academic- and company-authored AI publications published in the last decade.
We find that the citation count an individual publication receives is significantly higher when it is (co-authored) by a company.
arXiv Detail & Related papers (2023-10-31T13:27:04Z) - Artificial intelligence adoption in the physical sciences, natural
sciences, life sciences, social sciences and the arts and humanities: A
bibliometric analysis of research publications from 1960-2021 [73.06361680847708]
In 1960 14% of 333 research fields were related to AI, but this increased to over half of all research fields by 1972, over 80% by 1986 and over 98% in current times.
In 1960 14% of 333 research fields were related to AI (many in computer science), but this increased to over half of all research fields by 1972, over 80% by 1986 and over 98% in current times.
We conclude that the context of the current surge appears different, and that interdisciplinary AI application is likely to be sustained.
arXiv Detail & Related papers (2023-06-15T14:08:07Z) - The Elephant in the Room: Analyzing the Presence of Big Tech in Natural Language Processing Research [28.382353702576314]
We use a corpus with comprehensive metadata of 78,187 NLP publications and 701 resumes of NLP publication authors.
We find that industry presence among NLP authors has been steady before a steep increase over the past five years.
A few companies account for most of the publications and provide funding to academic researchers through grants and internships.
arXiv Detail & Related papers (2023-05-04T12:57:18Z) - Ethical Considerations and Statistical Analysis of Industry Involvement
in Machine Learning Research [3.4773470589069473]
We have examined all papers of the main ML conferences NeurIPS, CVPR, and ICML of the last 5 years.
Our statistical approach focuses on conflicts of interest, innovation and gender equality.
arXiv Detail & Related papers (2020-06-08T12:51:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.