Development of Application-Specific Large Language Models to Facilitate Research Ethics Review
- URL: http://arxiv.org/abs/2501.10741v2
- Date: Tue, 18 Feb 2025 08:48:25 GMT
- Title: Development of Application-Specific Large Language Models to Facilitate Research Ethics Review
- Authors: Sebastian Porsdam Mann, Joel Seah Jiehao, Stephen R. Latham, Julian Savulescu, Mateo Aboy, Brian D. Earp
- Abstract summary: We propose application-specific large language models (LLMs) to facilitate IRB review processes.
These IRB-specific LLMs would be fine-tuned on IRB-specific literature and institutional datasets.
We outline potential applications, including pre-review screening, preliminary analysis, consistency checking, and decision support.
- Score: 0.0
- Abstract: Institutional review boards (IRBs) play a crucial role in ensuring the ethical conduct of human subjects research, but face challenges including inconsistency, delays, and inefficiencies. We propose the development and implementation of application-specific large language models (LLMs) to facilitate IRB review processes. These IRB-specific LLMs would be fine-tuned on IRB-specific literature and institutional datasets, and equipped with retrieval capabilities to access up-to-date, context-relevant information. We outline potential applications, including pre-review screening, preliminary analysis, consistency checking, and decision support. While addressing concerns about accuracy, context sensitivity, and human oversight, we acknowledge remaining challenges such as over-reliance on AI and the need for transparency. By enhancing the efficiency and quality of ethical review while maintaining human judgment in critical decisions, IRB-specific LLMs offer a promising tool to improve research oversight. We call for pilot studies to evaluate the feasibility and impact of this approach.
Related papers
- ReviewEval: An Evaluation Framework for AI-Generated Reviews [9.35023998408983]
This research introduces a comprehensive evaluation framework for AI-generated reviews.
It measures alignment with human evaluations, verifies factual accuracy, assesses analytical depth, and identifies actionable insights.
Our framework establishes standardized metrics for evaluating AI-based review systems.
arXiv Detail & Related papers (2025-02-17T12:22:11Z)
- Human services organizations and the responsible integration of AI: Considering ethics and contextualizing risk(s) [0.0]
Authors argue that ethical concerns about AI deployment vary significantly based on implementation context and specific use cases.
They propose a dimensional risk assessment approach that considers factors like data sensitivity, professional oversight requirements, and potential impact on client wellbeing.
arXiv Detail & Related papers (2025-01-20T19:38:21Z)
- Enabling Scalable Oversight via Self-Evolving Critic [59.861013614500024]
SCRIT (Self-evolving CRITic) is a framework that enables genuine self-evolution of critique abilities.
It self-improves by training on synthetic data, generated by a contrastive-based self-critic.
It achieves up to a 10.3% improvement on critique-correction and error identification benchmarks.
arXiv Detail & Related papers (2025-01-10T05:51:52Z)
- A Comprehensive Survey of Direct Preference Optimization: Datasets, Theories, Variants, and Applications [52.42860559005861]
Direct Preference Optimization (DPO) has emerged as a promising approach for alignment.
Despite DPO's various advancements and inherent limitations, an in-depth review of these aspects is currently lacking in the literature.
arXiv Detail & Related papers (2024-10-21T02:27:24Z)
- An evidence-based methodology for human rights impact assessment (HRIA) in the development of AI data-intensive systems [49.1574468325115]
We show that human rights already underpin the decisions in the field of data use.
This work presents a methodology and a model for a Human Rights Impact Assessment (HRIA).
The proposed methodology is tested in concrete case-studies to prove its feasibility and effectiveness.
arXiv Detail & Related papers (2024-07-30T16:27:52Z)
- Evaluating Interventional Reasoning Capabilities of Large Language Models [58.52919374786108]
Large language models (LLMs) are used to automate decision-making tasks.
In this paper, we evaluate whether LLMs can accurately update their knowledge of a data-generating process in response to an intervention.
We create benchmarks that span diverse causal graphs (e.g., confounding, mediation) and variable types.
These benchmarks allow us to separate the ability of LLMs to accurately predict changes resulting from an intervention from their ability to memorize facts or find other shortcuts.
arXiv Detail & Related papers (2024-04-08T14:15:56Z)
- Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial Intelligence [5.147767778946168]
We critically assess 23 state-of-the-art Large Language Model (LLM) benchmarks.
Our research uncovered significant limitations, including biases, difficulties in measuring genuine reasoning, adaptability, implementation inconsistencies, prompt engineering complexity, diversity, and the overlooking of cultural and ideological norms.
arXiv Detail & Related papers (2024-02-15T11:08:10Z)
- Word-Level ASR Quality Estimation for Efficient Corpus Sampling and Post-Editing through Analyzing Attentions of a Reference-Free Metric [5.592917884093537]
The potential of quality estimation (QE) metrics is introduced and evaluated as a novel tool to enhance explainable artificial intelligence (XAI) in ASR systems.
The capabilities of the NoRefER metric are explored in identifying word-level errors to aid post-editors in refining ASR hypotheses.
arXiv Detail & Related papers (2024-01-20T16:48:55Z)
- Perspectives on Large Language Models for Relevance Judgment [56.935731584323996]
Large language models (LLMs) claim that they can assist with relevance judgments.
It is not clear whether automated judgments can reliably be used in evaluations of retrieval systems.
arXiv Detail & Related papers (2023-04-13T13:08:38Z)
- Methodological reflections for AI alignment research using human feedback [0.0]
AI alignment aims to investigate whether AI technologies align with human interests and values and function in a safe and ethical manner.
LLMs have the potential to exhibit unintended behavior due to their ability to learn and adapt in ways that are difficult to predict.
arXiv Detail & Related papers (2022-12-22T14:27:33Z)
- Investigating Fairness Disparities in Peer Review: A Language Model Enhanced Approach [77.61131357420201]
We conduct a thorough and rigorous study on fairness disparities in peer review with the help of large language models (LMs).
We collect, assemble, and maintain a comprehensive relational database for the International Conference on Learning Representations (ICLR) from 2017 to date.
We postulate and study fairness disparities on multiple protective attributes of interest, including author gender, geography, and author and institutional prestige.
arXiv Detail & Related papers (2022-11-07T16:19:42Z)