Studying the characteristics of scientific communities using
individual-level bibliometrics: the case of Big Data research
- URL: http://arxiv.org/abs/2106.05581v1
- Date: Thu, 10 Jun 2021 08:17:09 GMT
- Title: Studying the characteristics of scientific communities using
individual-level bibliometrics: the case of Big Data research
- Authors: Xiaozan Lyu and Rodrigo Costas
- Abstract summary: We study the academic age, production, and research focus of the community of authors active in Big Data research.
Results show that the academic realm of "Big Data" is a growing topic with an expanding community of authors.
- Score: 2.208242292882514
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unlike most bibliometric studies, which focus on publications, we take Big Data research as a case study and introduce a novel bibliometric approach to unfold the status of a given scientific community from an individual-level perspective. We study the academic age, production, and research focus of the
community of authors active in Big Data research. Artificial Intelligence (AI)
is selected as a reference area for comparative purposes. Results show that the
academic realm of "Big Data" is a growing topic with an expanding community of
authors, with new authors joining every year. Compared to AI, Big Data attracts authors with a longer academic age, who can be regarded as having accumulated some publishing experience before entering the community. Despite the highly skewed distribution of productivity amongst researchers in both communities, Big Data authors have higher values of both research focus and production than AI authors. Considering the community size, overall academic
age, and persistence of publishing on the topic, our results support the idea
of Big Data as a research topic with attractiveness for researchers. We argue
that the community-focused indicators proposed in this study could be
generalized to investigate the development and dynamics of other research
fields and topics.
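The individual-level indicators named in the abstract (academic age, production, and research focus) correspond to simple computations over author-level publication records. The sketch below is one plausible, minimal way to derive them; the record format, the reference year, and the toy data are hypothetical, and the paper's exact operationalizations may differ.

```python
from collections import defaultdict

# Hypothetical publication records: (author, year, publication on the focal topic?)
records = [
    ("alice", 2012, False), ("alice", 2016, True), ("alice", 2018, True),
    ("bob",   2017, True),  ("bob",   2019, True),
    ("carol", 2005, False), ("carol", 2019, True),
]

REFERENCE_YEAR = 2019  # assumed snapshot year for the community

pubs = defaultdict(list)
for author, year, on_topic in records:
    pubs[author].append((year, on_topic))

for author, items in pubs.items():
    first_year = min(year for year, _ in items)
    academic_age = REFERENCE_YEAR - first_year        # years since first recorded publication
    production = len(items)                           # total publications by the author
    on_topic_count = sum(1 for _, on_topic in items if on_topic)
    research_focus = on_topic_count / production      # share of output devoted to the topic
    print(f"{author}: academic_age={academic_age}, production={production}, focus={research_focus:.2f}")
```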
Related papers
- Data-Centric AI in the Age of Large Language Models [51.20451986068925]
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs)
We make the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs.
We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization.
arXiv Detail & Related papers (2024-06-20T16:34:07Z)
- ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models [56.08917291606421]
ResearchAgent is a large language model-powered research idea writing agent.
It generates problems, methods, and experiment designs while iteratively refining them based on scientific literature.
We experimentally validate our ResearchAgent on scientific publications across multiple disciplines.
arXiv Detail & Related papers (2024-04-11T13:36:29Z)
- Mapping the Increasing Use of LLMs in Scientific Papers [99.67983375899719]
We conduct the first systematic, large-scale analysis across 950,965 papers published between January 2020 and February 2024 on the arXiv, bioRxiv, and Nature portfolio journals.
Our findings reveal a steady increase in LLM usage, with the largest and fastest growth observed in Computer Science papers.
arXiv Detail & Related papers (2024-04-01T17:45:15Z)
- A Survey on Data Selection for Language Models [148.300726396877]
Data selection methods aim to determine which data points to include in a training dataset.
Deep learning is mostly driven by empirical evidence, and experimentation on large-scale data is expensive.
Few organizations have the resources for extensive data selection research.
arXiv Detail & Related papers (2024-02-26T18:54:35Z)
- Analyzing the Impact of Companies on AI Research Based on Publications [1.450405446885067]
We compare academic- and company-authored AI publications published in the last decade.
We find that the citation count an individual publication receives is significantly higher when it is authored or co-authored by a company.
arXiv Detail & Related papers (2023-10-31T13:27:04Z)
- A Comprehensive Study of Groundbreaking Machine Learning Research: Analyzing highly cited and impactful publications across six decades [1.6442870218029522]
Machine learning (ML) has emerged as a prominent field of research in computer science and other related fields.
It is crucial to understand the landscape of highly cited publications to identify key trends, influential authors, and significant contributions made thus far.
arXiv Detail & Related papers (2023-08-01T21:43:22Z)
- Tag-Aware Document Representation for Research Paper Recommendation [68.8204255655161]
We propose a hybrid approach that leverages deep semantic representation of research papers based on social tags assigned by users.
The proposed model is effective in recommending research papers even when the rating data is very sparse.
arXiv Detail & Related papers (2022-09-08T09:13:07Z)
- Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature.
We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
arXiv Detail & Related papers (2022-07-18T11:38:32Z)
- D3: A Massive Dataset of Scholarly Metadata for Analyzing the State of Computer Science Research [27.882505456528243]
DBLP is the largest open-access repository of scientific articles on computer science.
We retrieved more than 6 million publications from DBLP and extracted metadata.
D3 can be used to identify trends in research activity, productivity, focus, bias, accessibility, and impact of computer science research.
arXiv Detail & Related papers (2022-04-28T09:59:52Z)
- Research Scholar Interest Mining Method based on Load Centrality [15.265191824669555]
This paper proposes a research scholar interest mining algorithm based on load centrality.
The regional structure of each topic is used to calculate the node weights in the load-centrality model.
The load-centrality-based analysis of scientific research collaboration proposed in this paper can effectively extract the interests of research scholars (a minimal sketch of the load-centrality idea appears after this list).
arXiv Detail & Related papers (2022-03-21T04:16:46Z)
- Industry and Academic Research in Computer Vision [5.634825161148484]
This work aims to study the dynamic between research in the industry and academia in computer vision.
The results are demonstrated on a set of top-5 vision conferences that are representative of the field.
arXiv Detail & Related papers (2021-07-10T20:09:52Z)
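Picking up the forward reference in the load-centrality entry above, the sketch below shows one generic way load centrality over a co-authorship graph could be used to weight scholars' topic interests with NetworkX. The graph, the topic labels, and the weighting scheme are illustrative assumptions and do not reproduce the paper's actual model.

```python
import networkx as nx
from collections import Counter

# Hypothetical co-authorship graph: nodes are scholars, edges indicate joint papers.
G = nx.Graph()
G.add_edges_from([
    ("alice", "bob"), ("alice", "carol"), ("bob", "carol"),
    ("carol", "dave"), ("dave", "erin"),
])

# Hypothetical topic labels drawn from each scholar's publications.
topics = {
    "alice": ["big data", "ai"],
    "bob":   ["big data"],
    "carol": ["ai", "bibliometrics"],
    "dave":  ["bibliometrics"],
    "erin":  ["ai"],
}

# Load centrality: the fraction of shortest paths that pass through each node.
centrality = nx.load_centrality(G)

# Weight each topic by the load centrality of the scholars working on it,
# yielding a crude centrality-weighted interest profile for the community.
interest = Counter()
for scholar, topic_list in topics.items():
    for topic in topic_list:
        interest[topic] += centrality[scholar]

for topic, weight in interest.most_common():
    print(f"{topic}: {weight:.3f}")
```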
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.