Ethical Classification of Non-Coding Contributions in Open-Source Projects via Large Language Models
- URL: http://arxiv.org/abs/2507.21583v1
- Date: Tue, 29 Jul 2025 08:34:46 GMT
- Title: Ethical Classification of Non-Coding Contributions in Open-Source Projects via Large Language Models
- Authors: Sergio Cobos, Javier Luis Cánovas Izquierdo
- Abstract summary: We propose an approach to classify the ethical quality of non-coding contributions in OSS projects by relying on Large Language Models (LLM). We defined a set of ethical metrics based on the Contributor Covenant and developed a classification approach to assess ethical behavior in OSS non-coding contributions.
- Score: 0.3222802562733786
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The development of Open-Source Software (OSS) is not only a technical challenge, but also a social one due to the diverse mixture of contributors. To this aim, social-coding platforms, such as GitHub, provide the infrastructure needed to host and develop the code, but also the support for enabling the community's collaboration, which is driven by non-coding contributions, such as issues (i.e., change proposals or bug reports) or comments to existing contributions. As with any other social endeavor, this development process faces ethical challenges, which may put at risk the project's sustainability. To foster a productive and positive environment, OSS projects are increasingly deploying codes of conduct, which define rules to ensure a respectful and inclusive participatory environment, with the Contributor Covenant being the main model to follow. However, monitoring and enforcing these codes of conduct is a challenging task, due to the limitations of current approaches. In this paper, we propose an approach to classify the ethical quality of non-coding contributions in OSS projects by relying on Large Language Models (LLM), a promising technology for text classification tasks. We defined a set of ethical metrics based on the Contributor Covenant and developed a classification approach to assess ethical behavior in OSS non-coding contributions, using prompt engineering to guide the model's output.
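The prompt-engineering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the metric labels in `METRICS` are hypothetical names that loosely paraphrase behaviours listed in the Contributor Covenant, and `build_prompt` and `parse_response` are illustrative helpers. No model is called; the sketch only shows how a prompt might be assembled and how a JSON reply might be validated.

```python
import json

# Hypothetical metric labels, loosely paraphrasing behaviours named in the
# Contributor Covenant. The paper's actual metric set is not given here.
METRICS = [
    "empathy_and_kindness",
    "respect_for_differing_opinions",
    "insulting_or_derogatory_language",
    "harassment",
]


def build_prompt(comment: str) -> str:
    """Assemble a classification prompt for one issue or comment."""
    metric_list = "\n".join(f"- {m}" for m in METRICS)
    return (
        "You are reviewing a comment from an open-source project.\n"
        "For each metric below, answer true if the comment exhibits it, "
        "otherwise false.\n"
        f"{metric_list}\n"
        "Reply with a single JSON object keyed by metric name.\n\n"
        f"Comment:\n{comment}"
    )


def parse_response(raw: str) -> dict[str, bool]:
    """Parse a model reply, keeping only known metrics with boolean values."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        return {}
    return {m: data[m] for m in METRICS if isinstance(data.get(m), bool)}
```

The text returned by `build_prompt` would be sent to whichever LLM is in use, and the reply passed to `parse_response`. Discarding unknown keys and non-boolean values keeps a malformed reply from silently producing a label.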
Related papers
- Charting Uncertain Waters: A Socio-Technical Framework for Navigating GenAI's Impact on Open Source Communities [53.812795099349295]
We conduct a scenario-driven, conceptual exploration using a socio-technical framework inspired by McLuhan's Tetrad to surface both risks and opportunities for community resilience amid GenAI-driven disruption of OSS development across four domains: software practices, documentation, community engagement, and governance. By adopting this lens, OSS leaders and researchers can proactively shape the future of their ecosystems, rather than simply reacting to technological upheaval.
arXiv Detail & Related papers (2025-08-06T22:54:15Z)
- MERA Code: A Unified Framework for Evaluating Code Generation Across Tasks [56.34018316319873]
We propose MERA Code, a benchmark for evaluating code for the latest code generation LLMs in Russian. This benchmark includes 11 evaluation tasks that span 8 programming languages. We evaluate open LLMs and frontier API models, analyzing their limitations in terms of practical coding tasks in non-English languages.
arXiv Detail & Related papers (2025-07-16T14:31:33Z)
- Are We on the Same Page? Examining Developer Perception Alignment in Open Source Code Reviews [2.66269503676104]
Code reviews are a critical aspect of open-source software (OSS) development, ensuring quality and fostering collaboration. This study examines perceptions, challenges, and biases in OSS code review processes, focusing on the perspectives of contributors and maintainers.
arXiv Detail & Related papers (2025-04-25T15:03:39Z)
- A Bot-based Approach to Manage Codes of Conduct in Open-Source Projects [0.3222802562733786]
We propose an approach to effectively manage codes of conduct in OSS projects based on the Contributor Covenant proposal. Our solution has been implemented as a bot-based solution where bots help in the definition of codes of conduct, the monitoring of OSS projects, and the enforcement of ethical rules.
arXiv Detail & Related papers (2025-03-07T14:50:02Z)
- CROSS: A Contributor-Project Interaction Lifecycle Model for Open Source Software [2.9631016562930546]
The CROSS model is a novel contributor-project interaction lifecycle model for open source software.
It explains a range of archetypal cases of contributor engagement and highlights research gaps, especially in EoS/offboarding scenarios.
arXiv Detail & Related papers (2024-09-12T17:57:12Z)
- How to Gain Commit Rights in Modern Top Open Source Communities? [14.72524623433377]
We study the policies and practical implementations of committer qualifications in modern top OSS communities.
We construct a taxonomy of committer qualifications, consisting of 26 codes categorized into nine themes.
We find that the probability of gaining commit rights decreases as participation time passes.
arXiv Detail & Related papers (2024-05-03T01:23:06Z)
- Ethical-Lens: Curbing Malicious Usages of Open-Source Text-to-Image Models [51.69735366140249]
We introduce Ethical-Lens, a framework designed to facilitate the value-aligned usage of text-to-image tools. Ethical-Lens ensures value alignment in text-to-image models across toxicity and bias dimensions. Our experiments reveal that Ethical-Lens enhances alignment capabilities to levels comparable with or superior to commercial models.
arXiv Detail & Related papers (2024-04-18T11:38:25Z)
- Towards Responsible AI in Banking: Addressing Bias for Fair Decision-Making [69.44075077934914]
"Responsible AI" emphasizes the critical nature of addressing biases within the development of a corporate culture.
This thesis is structured around three fundamental pillars: understanding bias, mitigating bias, and accounting for bias.
In line with open-source principles, we have released Bias On Demand and FairView as accessible Python packages.
arXiv Detail & Related papers (2024-01-13T14:07:09Z)
- Semantic Communications for Artificial Intelligence Generated Content (AIGC) Toward Effective Content Creation [75.73229320559996]
This paper develops a conceptual model for the integration of AIGC and SemCom.
A novel framework that employs AIGC technology is proposed as an encoder and decoder for semantic information.
The framework can adapt to different types of content generated, the required quality, and the semantic information utilized.
arXiv Detail & Related papers (2023-08-09T13:17:21Z)
- Ethical Considerations and Policy Implications for Large Language Models: Guiding Responsible Development and Deployment [48.72819550642584]
This paper examines the ethical considerations and implications of large language models (LLMs) in generating content.
It highlights the potential for both positive and negative uses of generative AI programs and explores the challenges in assigning responsibility for their outputs.
arXiv Detail & Related papers (2023-08-01T07:21:25Z)
- Applying Standards to Advance Upstream & Downstream Ethics in Large Language Models [0.0]
This paper explores how AI-owners can develop safeguards for AI-generated content.
It draws from established codes of conduct and ethical standards in other content-creation industries.
arXiv Detail & Related papers (2023-06-06T08:47:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.