A Systematic Mapping Study of Crowd Knowledge Enhanced Software Engineering Research Using Stack Overflow
- URL: http://arxiv.org/abs/2408.07913v1
- Date: Thu, 15 Aug 2024 03:40:44 GMT
- Title: A Systematic Mapping Study of Crowd Knowledge Enhanced Software Engineering Research Using Stack Overflow
- Authors: Minaoar Tanzil, Shaiful Chowdhury, Somayeh Modaberi, Gias Uddin, Hadi Hemmati
- Abstract summary: 30% of all software professionals visit the most popular Q&A site StackOverflow (SO) every day.
To find out the trend, implication, impact, and future research potential utilizing SO data, a systematic mapping study needs to be conducted.
We collected 384 SO-based research articles and categorized them into 10 facets (i.e., themes).
We found that SO contributes to 85% of SE research compared with other popular Q&A sites such as Quora and Reddit.
- Score: 0.8621608193534838
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Developers continuously interact in crowd-sourced, community-based question-answer (Q&A) sites. Reportedly, 30% of all software professionals visit the most popular Q&A site, StackOverflow (SO), every day. Software engineering (SE) research studies are also increasingly using SO data. To find out the trend, implication, impact, and future research potential of utilizing SO data, a systematic mapping study needs to be conducted. Following a rigorous, reproducible mapping study approach, we collected 384 SO-based research articles from 18 reputed SE journals and conferences and categorized them into 10 facets (i.e., themes). We found that SO contributes to 85% of SE research compared with other popular Q&A sites such as Quora and Reddit. We found that 18 SE domains directly benefited from SO data, and that the Recommender Systems domain and the API Design and Evolution domain use SO data the most (15% and 16% of all SO-based research studies, respectively). The API Design and Evolution domain and the Machine Learning with/for SE domain show a consistent upward publication trend. Deep Learning Bug Analysis and Code Cloning are the research areas that have recently shown the highest potential research impact. With the insights, recommendations, and facet-based categorized paper list from this mapping study, SE researchers can identify potential research areas that match their interests and in which to utilize large-scale SO data.
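The abstract's headline numbers (a domain's share of all SO-based studies, and whether a domain's yearly publication count keeps rising) come down to counting over the categorized paper list. A minimal sketch of that bookkeeping follows, under stated assumptions: the paper records, the function names `domain_shares`, `yearly_counts`, and `is_non_decreasing`, and the rule that each paper carries exactly one domain label are all hypothetical, and "non-decreasing yearly counts" is only one possible way to operationalize a "consistent upward" trend, not necessarily the criterion the authors used.

```python
from collections import Counter

# Hypothetical records: (paper_id, publication_year, domain).
# These are NOT the study's data; they only illustrate the bookkeeping.
# Simplifying assumption: each paper carries exactly one domain label.
PAPERS = [
    ("p1", 2019, "API Design and Evolution"),
    ("p2", 2020, "API Design and Evolution"),
    ("p3", 2020, "Recommender Systems"),
    ("p4", 2021, "API Design and Evolution"),
    ("p5", 2021, "Code Cloning"),
    ("p6", 2022, "API Design and Evolution"),
    ("p7", 2022, "Recommender Systems"),
    ("p8", 2022, "API Design and Evolution"),
]


def domain_shares(records):
    """Percentage of all papers that fall into each domain."""
    counts = Counter(domain for _, _, domain in records)
    total = len(records)
    return {domain: 100.0 * n / total for domain, n in counts.items()}


def yearly_counts(records, domain):
    """Papers per year for one domain over the whole period, with 0 for empty years."""
    per_year = Counter(year for _, year, d in records if d == domain)
    years = [year for _, year, _ in records]
    # Counter returns 0 for missing keys, so gap years are filled with 0.
    return {y: per_year[y] for y in range(min(years), max(years) + 1)}


def is_non_decreasing(series):
    """One possible proxy for a 'consistent upward' trend: no year drops below the previous one."""
    values = [series[y] for y in sorted(series)]
    return all(a <= b for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    print(domain_shares(PAPERS)["API Design and Evolution"])  # 62.5
    print(yearly_counts(PAPERS, "API Design and Evolution"))  # {2019: 1, 2020: 1, 2021: 1, 2022: 2}
    print(is_non_decreasing(yearly_counts(PAPERS, "Recommender Systems")))  # False
```

Under a multi-label scheme, where one paper can belong to several domains, the shares would no longer sum to 100%, so the denominator (papers or labels) would need to be stated explicitly.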
Related papers
- A Survey on Data Selection for Language Models [148.300726396877]
Data selection methods aim to determine which data points to include in a training dataset.
Deep learning is mostly driven by empirical evidence, and experimentation on large-scale data is expensive.
Few organizations have the resources for extensive data selection research.
(2024-02-26T18:54:35Z)
- Conflating point of interest (POI) data: A systematic review of matching methods [5.439489511940086]
Point of interest (POI) data provide digital representations of places in the real world.
Many POI datasets have been developed, which often have different geographic coverages, attribute focuses, and data quality.
Researchers may need to conflate two or more POI datasets in order to build a better representation of the places in the study areas.
(2023-10-23T19:38:31Z)
- The Quantum Frontier of Software Engineering: A Systematic Mapping Study [16.93115872272979]
Quantum software engineering (QSE) is emerging as a new discipline to enable developers to design and develop quantum programs.
This paper presents a systematic mapping study of the current state of QSE research.
(2023-05-31T09:26:10Z)
- Deep learning for table detection and structure recognition: A survey [49.09628624903334]
The goal of this survey is to provide a thorough understanding of the major developments in the field of Table Detection.
We provide an analysis of both classic and new applications in the field.
The datasets and source code of the existing models are organized to guide the reader through this vast literature.
(2022-11-15T19:42:27Z)
- A Comparative Study of Question Answering over Knowledge Bases [2.6135123648293717]
Question answering over knowledge bases (KBQA) has become a popular approach to help users extract information from knowledge bases.
We provide a comparative study of six representative KBQA systems on eight benchmark datasets.
We propose an advanced mapping algorithm to aid existing models in achieving superior results.
(2022-11-15T14:23:47Z)
- Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature.
We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
(2022-07-18T11:38:32Z)
- DeepShovel: An Online Collaborative Platform for Data Extraction in Geoscience Literature with AI Assistance [48.55345030503826]
Geoscientists need to read a huge amount of literature to locate, extract, and aggregate relevant results and data.
DeepShovel is a publicly-available AI-assisted data extraction system to support their needs.
A follow-up user evaluation with 14 researchers suggested DeepShovel improved users' efficiency of data extraction for building scientific databases.
(2022-02-21T12:18:08Z)
- Studying the characteristics of scientific communities using individual-level bibliometrics: the case of Big Data research [2.208242292882514]
We study the academic age, production, and research focus of the community of authors active in Big Data research.
Results show that the academic realm of "Big Data" is a growing topic with an expanding community of authors.
(2021-06-10T08:17:09Z)
- A Survey of Knowledge Tracing: Models, Variants, and Applications [70.69281873057619]
Knowledge Tracing is one of the fundamental tasks for student behavioral data analysis.
We present three types of fundamental KT models with distinct technical routes.
We discuss potential directions for future research in this rapidly growing field.
(2021-05-06T13:05:55Z)
- Domain Generalization: A Survey [146.68420112164577]
Domain generalization (DG) aims to achieve out-of-distribution (OOD) generalization by only using source domain data for model learning.
For the first time, a comprehensive literature review is provided to summarize the ten-year development in DG.
(2021-03-03T16:12:22Z)