Malicious Repurposing of Open Science Artefacts by Using Large Language Models
- URL: http://arxiv.org/abs/2601.18998v1
- Date: Mon, 26 Jan 2026 22:20:24 GMT
- Title: Malicious Repurposing of Open Science Artefacts by Using Large Language Models
- Authors: Zahra Hashemi, Zhiqiang Zhong, Jun Pang, Wei Zhao
- Abstract summary: We show that large language models can generate harmful proposals by repurposing ethically designed open artefacts. However, we find that LLMs acting as evaluators strongly disagree with one another on evaluation outcomes.
- Score: 15.553448471983783
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The rapid evolution of large language models (LLMs) has fuelled enthusiasm about their role in advancing scientific discovery, with studies exploring LLMs that autonomously generate and evaluate novel research ideas. However, little attention has been given to the possibility that such models could be exploited to produce harmful research by repurposing open science artefacts for malicious ends. We fill this gap by introducing an end-to-end pipeline that first bypasses LLM safeguards through persuasion-based jailbreaking, then reinterprets NLP papers to identify and repurpose their artefacts (datasets, methods, and tools) by exploiting their vulnerabilities, and finally assesses the safety of these proposals using our evaluation framework across three dimensions: harmfulness, feasibility of misuse, and technical soundness. Overall, our findings demonstrate that LLMs can generate harmful proposals by repurposing ethically designed open artefacts; however, we find that LLMs acting as evaluators strongly disagree with one another on evaluation outcomes: GPT-4.1 assigns higher scores (indicating greater potential harms, higher soundness and feasibility of misuse), Gemini-2.5-pro is markedly stricter, and Grok-3 falls between these extremes. This indicates that LLMs cannot yet serve as reliable judges in a malicious evaluation setup, making human evaluation essential for credible dual-use risk assessment.
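The evaluation setup described above (three dimensions scored independently by GPT-4.1, Gemini-2.5-pro, and Grok-3, with the judges then compared for agreement) can be pictured with a minimal sketch. Everything below is an illustrative assumption rather than the authors' pipeline: the function names, the 1-5 rating scale, and the stubbed `query_judge` call stand in for real rubric prompts and API requests.

```python
# Minimal sketch (not the paper's code) of a multi-judge safety evaluation:
# each repurposing proposal is scored by several LLM judges along three
# dimensions (harmfulness, feasibility of misuse, technical soundness),
# and a simple spread statistic surfaces inter-judge disagreement.
# The judge list, the 1-5 scale, and query_judge are assumptions.
from statistics import mean
from typing import Dict

DIMENSIONS = ["harmfulness", "feasibility_of_misuse", "technical_soundness"]
JUDGES = ["gpt-4.1", "gemini-2.5-pro", "grok-3"]


def query_judge(judge: str, proposal: str, dimension: str) -> int:
    """Placeholder for a real LLM call; returns a 1-5 rating.

    A real pipeline would prompt `judge` with a rubric for `dimension`
    and parse the numeric rating from its response.
    """
    toy_scores = {"gpt-4.1": 4, "gemini-2.5-pro": 2, "grok-3": 3}  # keeps the sketch runnable offline
    return toy_scores[judge]


def evaluate_proposal(proposal: str) -> Dict[str, Dict[str, int]]:
    """Collect every judge's rating for each evaluation dimension."""
    return {
        dim: {judge: query_judge(judge, proposal, dim) for judge in JUDGES}
        for dim in DIMENSIONS
    }


def spread(ratings: Dict[str, int]) -> int:
    """Max-min gap across judges: a coarse signal of disagreement."""
    return max(ratings.values()) - min(ratings.values())


if __name__ == "__main__":
    proposal = "Repurpose a public toxicity dataset to train a filter-evasion model."
    for dim, ratings in evaluate_proposal(proposal).items():
        print(f"{dim}: mean={mean(ratings.values()):.1f} spread={spread(ratings)} {ratings}")
```

The spread statistic mirrors the qualitative finding in the abstract: a large gap between the most lenient and the strictest judge signals that the scores cannot be trusted without human adjudication.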
Related papers
- Everything You Wanted to Know About LLM-based Vulnerability Detection But Were Afraid to Ask [30.819697001992154]
Large Language Models are a promising tool for automated vulnerability detection. Despite widespread adoption, a critical question remains: Are LLMs truly effective at detecting real-world vulnerabilities? This paper challenges three widely held community beliefs: that LLMs are (i) unreliable, (ii) insensitive to code patches, and (iii) performance-plateaued across model scales.
arXiv Detail & Related papers (2025-04-18T05:32:47Z) - LLM-Safety Evaluations Lack Robustness [58.334290876531036]
We argue that current safety alignment research efforts for large language models are hindered by many intertwined sources of noise. We propose a set of guidelines for reducing noise and bias in evaluations of future attack and defense papers.
arXiv Detail & Related papers (2025-03-04T12:55:07Z) - On Evaluating the Durability of Safeguards for Open-Weight LLMs [80.36750298080275]
We discuss whether technical safeguards can impede the misuse of large language models (LLMs). We show that even evaluating these defenses is exceedingly difficult and can easily mislead audiences into thinking that safeguards are more durable than they really are. We suggest future research carefully cabin claims to more constrained, well-defined, and rigorously examined threat models.
arXiv Detail & Related papers (2024-12-10T01:30:32Z) - Are We There Yet? Revealing the Risks of Utilizing Large Language Models in Scholarly Peer Review [66.73247554182376]
The rise of large language models (LLMs) has led to their integration into peer review. The unchecked adoption of LLMs poses significant risks to the integrity of the peer review system. We show that manipulating 5% of the reviews could potentially cause 12% of the papers to lose their position in the top 30% rankings.
arXiv Detail & Related papers (2024-12-02T16:55:03Z) - Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-Oasis [78.07225438556203]
We introduce LLM-Oasis, the largest resource for training end-to-end factuality evaluators. It is constructed by extracting claims from Wikipedia, falsifying a subset of these claims, and generating pairs of factual and unfactual texts. We then rely on human annotators both to validate the quality of our dataset and to create a gold standard test set for factuality evaluation systems.
arXiv Detail & Related papers (2024-11-29T12:21:15Z) - Characterizing and Evaluating the Reliability of LLMs against Jailbreak Attacks [23.782566331783134]
We focus on 10 cutting-edge jailbreak strategies across three categories, 1525 questions from 61 specific harmful categories, and 13 popular LLMs.
We adopt multi-dimensional metrics such as Attack Success Rate (ASR), Toxicity Score, Fluency, Token Length, and Grammatical Errors to thoroughly assess the LLMs' outputs under jailbreak.
We explore the relationships among the models, attack strategies, and types of harmful content, as well as the correlations between the evaluation metrics, which supports the validity of our multifaceted evaluation framework.
arXiv Detail & Related papers (2024-08-18T01:58:03Z) - SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal [64.9938658716425]
SORRY-Bench is a proposed benchmark for evaluating large language models' (LLMs) ability to recognize and reject unsafe user requests. First, existing methods often use a coarse-grained taxonomy of unsafe topics and over-represent some fine-grained topics. Second, the linguistic characteristics and formatting of prompts, such as different languages and dialects, are often overlooked and only implicitly considered in many evaluations.
arXiv Detail & Related papers (2024-06-20T17:56:07Z) - Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning [61.2224355547598]
Open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress, often under the assumption that unaligned base models are too limited at following instructions to be misused. Our investigation exposes a critical oversight in this belief. By deploying carefully designed demonstrations, our research demonstrates that base LLMs can effectively interpret and execute malicious instructions.
arXiv Detail & Related papers (2024-04-16T13:22:54Z) - FFT: Towards Harmlessness Evaluation and Analysis for LLMs with Factuality, Fairness, Toxicity [20.510512358961517]
The widespread use of generative artificial intelligence has heightened concerns about the potential harms posed by AI-generated texts. Previous researchers have invested much effort in assessing the harmlessness of generative language models.
arXiv Detail & Related papers (2023-11-30T14:18:47Z) - Look Before You Leap: An Exploratory Study of Uncertainty Measurement for Large Language Models [15.735715641327836]
We study the risk assessment of Large Language Models (LLMs) through the lens of uncertainty. Our findings validate the effectiveness of uncertainty estimation for revealing LLMs' uncertain/non-factual predictions. Insights from our study shed light on the future design and development of reliable LLMs.
arXiv Detail & Related papers (2023-07-16T08:28:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.