"We do not appreciate being experimented on": Developer and Researcher
Views on the Ethics of Experiments on Open-Source Projects
- URL: http://arxiv.org/abs/2112.13217v2
- Date: Fri, 2 Jun 2023 13:20:41 GMT
- Title: "We do not appreciate being experimented on": Developer and Researcher
Views on the Ethics of Experiments on Open-Source Projects
- Authors: Dror G. Feitelson
- Abstract summary: We conduct a survey among open source developers and empirical software engineering researchers to see what behaviors they think are acceptable.
Results indicate that open-source developers are largely open to research, provided it is done transparently.
It is recommended that open source repositories and projects address use for research in their access guidelines.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: A tenet of open source software development is to accept contributions from
users-developers (typically after appropriate vetting). But should this also
include interventions done as part of research on open source development?
Following an incident in which buggy code was submitted to the Linux kernel to
see whether it would be caught, we conduct a survey among open source
developers and empirical software engineering researchers to see what behaviors
they think are acceptable. This covers two main issues: the use of publicly
accessible information, and conducting active experimentation. The survey had
224 respondents. The results indicate that open-source developers are largely
open to research, provided it is done transparently. In other words, many would
agree to experiments on open-source projects if the subjects were notified and
provided informed consent, and in special cases also if only the project
leaders agree. While researchers generally hold similar opinions, they
sometimes fail to appreciate certain nuances that are important to developers.
Examples include observing license restrictions on publishing open-source code
and safeguarding the code. Conversely, researchers seem to be more concerned
than developers about privacy issues. Based on these results, it is recommended
that open source repositories and projects address use for research in their
access guidelines, and that researchers take care to ask permission also when
not formally required to do so. We note too that the open source community
wants to be heard, so professional societies and IRBs should consult with them
when formulating ethics codes.
Related papers
- The New Dynamics of Open Source: Relicensing, Forks, & Community Impact [0.0]
Vendors are relicensing popular open source projects to more restrictive licenses in the hopes of generating more revenue.
This research compares organizational affiliation data from three case studies based on license changes that resulted in forks.
Research indicates that the forks resulting from these relicensing events have more organizational diversity than the original projects.
arXiv Detail & Related papers (2024-11-07T14:21:45Z)
- Understanding Code Understandability Improvements in Code Reviews [79.16476505761582]
We analyzed 2,401 code review comments from Java open-source projects on GitHub.
83.9% of suggestions for improvement were accepted and integrated, with fewer than 1% later reverted.
arXiv Detail & Related papers (2024-10-29T12:21:23Z)
- On the modification and revocation of open source licences [0.14843690728081999]
This paper argues for the creation of a subset of rights that allows open source contributors to force users to update to the most recent version of a model.
Legal, reputational and moral risks related to open-sourcing AI models could justify contributors having more control over downstream uses.
arXiv Detail & Related papers (2024-05-29T00:00:25Z)
- What Can Natural Language Processing Do for Peer Review? [173.8912784451817]
In modern science, peer review is widely used, yet it is hard, time-consuming, and prone to error.
Since the artifacts involved in peer review are largely text-based, Natural Language Processing has great potential to improve reviewing.
We detail each step of the process from manuscript submission to camera-ready revision, and discuss the associated challenges and opportunities for NLP assistance.
arXiv Detail & Related papers (2024-05-10T16:06:43Z)
- On the Consideration of AI Openness: Can Good Intent Be Abused? [11.117214240906678]
We build a dataset consisting of 200 examples of questions and corresponding answers about criminal activities based on 200 Korean precedents.
We find that a widely accepted open-source LLM can be easily tuned with EVE to provide unethical and informative answers about criminal activities.
This implies that although open-source technologies contribute to scientific progress, some care must be taken to mitigate possible malicious use cases.
arXiv Detail & Related papers (2024-03-11T09:24:06Z)
- How is Software Reuse Discussed in Stack Overflow? [12.586676749644342]
We present an empirical study of 1,409 posts to better understand the challenges developers face when reusing code.
Our findings show that 'visual studio' is the top-occurring bigram in question posts, and that developers frequently employ design patterns for the purpose of reuse.
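For illustration, a minimal sketch of how such bigram counts could be computed over post titles or bodies; this is plain Python, not the authors' actual pipeline, and the sample posts are invented:

```python
from collections import Counter
import re

def top_bigrams(posts, n=10):
    """Count the most frequent word bigrams across a list of post texts."""
    counts = Counter()
    for text in posts:
        tokens = re.findall(r"[a-z0-9+#.]+", text.lower())
        counts.update(zip(tokens, tokens[1:]))
    return counts.most_common(n)

# Invented sample posts for demonstration only.
posts = [
    "How do I reuse code from a Visual Studio project?",
    "Visual Studio refuses to load my reused library.",
]
print(top_bigrams(posts))  # e.g. [(('visual', 'studio'), 2), ...]
```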
arXiv Detail & Related papers (2023-11-01T03:13:36Z)
- How do Software Engineering Researchers Use GitHub? An Empirical Study of Artifacts & Impact [0.2209921757303168]
We ask whether and how authors engage in social coding related to their research.
We examine ten thousand papers from top SE research venues, hand-annotate their GitHub links, and study 309 paper-related repositories.
We find a wide distribution in popularity and impact, some strongly correlated with publication venue.
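As a hedged illustration of one preprocessing step such a study might use, the snippet below pulls candidate GitHub repository URLs out of raw paper text so they can be hand-checked afterwards; the regular expression and example text are assumptions, not the authors' tooling:

```python
import re

# Hypothetical helper: extract GitHub repository URLs from raw paper text
# so that links can be hand-annotated later (not the authors' actual pipeline).
GITHUB_URL = re.compile(r"https?://github\.com/[\w.-]+/[\w.-]+")

def github_links(paper_text: str) -> list[str]:
    """Return the unique GitHub repository URLs mentioned in a paper."""
    return sorted(set(GITHUB_URL.findall(paper_text)))

# Invented example text for demonstration only.
print(github_links(
    "Our replication package is at https://github.com/example/artifact-study "
    "and the dataset lives in https://github.com/example/artifact-data."
))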
arXiv Detail & Related papers (2023-10-02T18:56:33Z)
- Rethinking People Analytics With Inverse Transparency by Design [57.67333075002697]
We propose a new design approach for workforce analytics, which we refer to as inverse transparency by design.
We find that architectural changes are made without inhibiting core functionality.
We conclude that inverse transparency by design is a promising approach to realize accepted and responsible people analytics.
arXiv Detail & Related papers (2023-05-16T21:37:35Z)
- Repro: An Open-Source Library for Improving the Reproducibility and Usability of Publicly Available Research Code [74.28810048824519]
Repro is an open-source library which aims at improving the usability of research code.
It provides a lightweight Python API for running software released by researchers within Docker containers.
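As a rough sketch of the general idea (running released research code inside a container from Python), the example below uses the Docker SDK for Python rather than Repro's own API, whose exact interface is not shown here; the image name and command are placeholders:

```python
import docker  # Docker SDK for Python (pip install docker)

# Hypothetical example: run a researcher-released image and capture its output.
# This is NOT Repro's actual API; it only illustrates containerized execution.
client = docker.from_env()
logs = client.containers.run(
    image="python:3.11-slim",  # placeholder for a research image
    command=["python", "-c", "print('hello from the container')"],
    remove=True,               # clean up the container afterwards
)
print(logs.decode())
```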
arXiv Detail & Related papers (2022-04-29T01:54:54Z)
- Yes-Yes-Yes: Donation-based Peer Reviewing Data Collection for ACL Rolling Review and Beyond [58.71736531356398]
We present an in-depth discussion of peer reviewing data, outline the ethical and legal desiderata for peer reviewing data collection, and propose the first continuous, donation-based data collection workflow.
We report on the ongoing implementation of this workflow at the ACL Rolling Review and deliver the first insights obtained with the newly collected data.
arXiv Detail & Related papers (2022-01-27T11:02:43Z)
- Differentiable Open-Ended Commonsense Reasoning [80.94997942571838]
We study open-ended commonsense reasoning (OpenCSR) using as a resource only a corpus of commonsense facts written in natural language.
As an approach to OpenCSR, we propose DrFact, an efficient Differentiable model for multi-hop Reasoning over knowledge Facts.
arXiv Detail & Related papers (2020-10-24T10:07:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.