LLMs on Drugs: Language Models Are Few-Shot Consumers
- URL: http://arxiv.org/abs/2512.18546v1
- Date: Sun, 21 Dec 2025 00:19:02 GMT
- Title: LLMs on Drugs: Language Models Are Few-Shot Consumers
- Authors: Alexander Doudkin
- Abstract summary: We present the first controlled study of psychoactive framings on GPT-5-mini using ARC-Challenge. Four single-sentence prompts -- LSD, cocaine, alcohol, and cannabis -- are compared against a sober control. Persona text behaves like a "few-shot consumable" that can destroy reliability without touching model weights.
- Score: 51.736723807086385
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) are sensitive to the personas imposed on them at inference time, yet prompt-level "drug" interventions have never been benchmarked rigorously. We present the first controlled study of psychoactive framings on GPT-5-mini using ARC-Challenge. Four single-sentence prompts -- LSD, cocaine, alcohol, and cannabis -- are compared against a sober control across 100 validation items per condition, with deterministic decoding, full logging, Wilson confidence intervals, and Fisher exact tests. Control accuracy is 0.45; alcohol collapses to 0.10 (p = 3.2e-8), cocaine to 0.21 (p = 4.9e-4), LSD to 0.19 (p = 1.3e-4), and cannabis to 0.30 (p = 0.041), largely because persona prompts disrupt the mandated "Answer: <LETTER>" template. Persona text therefore behaves like a "few-shot consumable" that can destroy reliability without touching model weights. All experimental code, raw results, and analysis scripts are available at https://github.com/lexdoudkin/llms-on-drugs.
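As a sanity check on those numbers, here is a minimal sketch of the reported statistics, assuming 100 items per condition and the accuracies quoted in the abstract (the correct/incorrect counts are inferred from the accuracies, not taken from the repository):

```python
# Wilson intervals and Fisher exact test on the reported counts (inferred).
from math import sqrt
from scipy.stats import fisher_exact

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

n = 100
control, alcohol = 45, 10  # correct answers out of n, per the abstract

print("control 95% CI:", wilson_ci(control, n))
print("alcohol 95% CI:", wilson_ci(alcohol, n))

# 2x2 contingency table: [correct, incorrect] for control vs. alcohol persona
table = [[control, n - control], [alcohol, n - alcohol]]
_, p_value = fisher_exact(table, alternative="two-sided")
print(f"Fisher exact p = {p_value:.2g}")  # abstract reports 3.2e-8
```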
Related papers
- LLMs Can Get "Brain Rot"! [68.08198331505695]
Continual exposure to junk web text induces lasting cognitive decline in large language models (LLMs). We run controlled experiments on real Twitter/X corpora, constructing junk and reverse-controlled datasets. Results provide significant, multi-perspective evidence that data quality is a causal driver of LLM capability decay.
arXiv Detail & Related papers (2025-10-15T13:28:49Z)
- HalluDetect: Detecting, Mitigating, and Benchmarking Hallucinations in Conversational Systems in the Legal Domain [28.691566712713808]
Large Language Models (LLMs) are widely used in industry but remain prone to hallucinations, limiting their reliability in critical applications. This work addresses hallucination reduction in consumer grievance chatbots built using LLaMA 3.1 8B Instruct, a compact model frequently used in industry. We develop HalluDetect, an LLM-based hallucination detection system that achieves an F1 score of 68.92%, outperforming baseline detectors by 22.47%.
arXiv Detail & Related papers (2025-09-15T06:23:36Z)
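For context on the metric, a detector of this kind is typically scored as in the following sketch; the labels and verdicts are toy placeholders, not HalluDetect's pipeline or data:

```python
# Scoring a binary hallucination detector against labeled outputs.
from sklearn.metrics import precision_recall_fscore_support

labels = [1, 0, 1, 1, 0, 0, 1, 0]  # 1 = response contains a hallucination
preds  = [1, 0, 0, 1, 0, 1, 1, 0]  # detector verdicts

precision, recall, f1, _ = precision_recall_fscore_support(
    labels, preds, average="binary"
)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```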
- Can LLMs Infer Personality from Real World Conversations? [5.705775078773656]
Large Language Models (LLMs) offer a promising approach for scalable personality assessment from open-ended language. Three state-of-the-art LLMs were tested using zero-shot prompting for BFI-10 item prediction and both zero-shot and chain-of-thought prompting for Big Five trait inference. All models showed high test-retest reliability, but construct validity was limited.
arXiv Detail & Related papers (2025-07-18T20:22:47Z)
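A minimal sketch of the test-retest check described above, assuming trait scores collected from two identical runs (the values are placeholders, not the paper's data):

```python
# Test-retest reliability: correlate trait scores from two identical passes.
from scipy.stats import pearsonr

run1 = [3.2, 4.1, 2.8, 3.9, 2.5]  # Big Five scores, first pass (O, C, E, A, N)
run2 = [3.4, 4.0, 2.7, 4.1, 2.6]  # same prompts, second pass

r, p = pearsonr(run1, run2)
print(f"test-retest r = {r:.2f} (p = {p:.3f})")
```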
- Hallucination Detection in Large Language Models with Metamorphic Relations [7.411154122932113]
Large Language Models (LLMs) are prone to hallucinations, e.g., factually incorrect information, in their responses. This paper presents MetaQA, a self-contained hallucination detection approach that leverages metamorphic relations and prompt mutation. We compare MetaQA with the state-of-the-art zero-resource hallucination detection method, SelfCheckGPT, across multiple datasets.
arXiv Detail & Related papers (2025-02-20T19:44:33Z)
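A rough sketch of the metamorphic idea, with a hypothetical ask_model stub standing in for a real LLM call:

```python
# Metamorphic consistency check: paraphrases of a question should preserve
# the answer; self-disagreement is treated as a hallucination signal.

def ask_model(prompt: str) -> str:
    # Hypothetical stub; replace with your LLM of choice.
    return "Mercury" if "Sun" in prompt else "unknown"

def looks_hallucinated(prompt: str, variants: list[str]) -> bool:
    baseline = ask_model(prompt).strip().lower()
    return any(ask_model(v).strip().lower() != baseline for v in variants)

variants = [
    "Which planet is closest to the Sun?",
    "Name the planet nearest to the Sun.",
]
print(looks_hallucinated("What planet is nearest the Sun?", variants))  # False
```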
- Learning to Describe for Predicting Zero-shot Drug-Drug Interactions [54.172575323610175]
Adverse drug-drug interactions can compromise the effectiveness of concurrent drug administration.
Traditional computational methods for DDI prediction may fail to capture interactions for new drugs due to the lack of knowledge.
We propose TextDDI, which pairs a language-model-based DDI predictor with a reinforcement learning (RL)-based information selector.
arXiv Detail & Related papers (2024-03-13T09:42:46Z)
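A loose sketch of the text-based prediction idea; an off-the-shelf zero-shot NLI classifier stands in for the paper's trained predictor, and a fixed truncation stands in for the RL-based information selector:

```python
# Classify a drug pair from textual descriptions alone (toy substitute).
from transformers import pipeline

classify = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

drug_a = "Warfarin is an anticoagulant that inhibits vitamin K epoxide reductase."
drug_b = "Aspirin is an NSAID with antiplatelet effects."

result = classify(
    f"Drug A: {drug_a[:200]} Drug B: {drug_b[:200]}",
    candidate_labels=["increased bleeding risk", "no interaction"],
)
print(result["labels"][0])  # highest-scoring label first
```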
- Neural Bandits for Data Mining: Searching for Dangerous Polypharmacy [63.135687276599114]
Some polypharmacies, deemed inappropriate, may be associated with adverse health outcomes such as death or hospitalization.
We propose the OptimNeuralTS strategy to efficiently mine claims datasets and build a predictive model of the association between drug combinations and health outcomes.
Our method can detect up to 72% of potentially inappropriate polypharmacies (PIPs) while maintaining an average precision score of 99% within 30,000 time steps.
arXiv Detail & Related papers (2022-12-10T03:43:23Z)
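To illustrate the bandit mechanics, a simplified Thompson-sampling mining loop; it uses Beta-Bernoulli arms rather than the paper's neural bandit, and the drug combinations and risks are toy values:

```python
# Thompson sampling over candidate drug combinations: sample a plausible risk
# per combo, probe the riskiest-looking one, and update its posterior.
import random

combos = ["A+B", "A+C", "B+C", "A+B+C"]                          # toy candidates
true_risk = {"A+B": 0.05, "A+C": 0.4, "B+C": 0.1, "A+B+C": 0.6}  # hidden

alpha = {c: 1 for c in combos}  # adverse outcomes observed (+ prior)
beta = {c: 1 for c in combos}   # benign outcomes observed (+ prior)

for _ in range(30_000):
    sampled = {c: random.betavariate(alpha[c], beta[c]) for c in combos}
    pick = max(sampled, key=sampled.get)          # explore/exploit in one step
    outcome = random.random() < true_risk[pick]   # toy stand-in for claims data
    alpha[pick] += outcome
    beta[pick] += 1 - outcome

# Flag combos whose posterior mean risk is high.
flagged = [c for c in combos if alpha[c] / (alpha[c] + beta[c]) > 0.3]
print("flagged as potentially inappropriate:", flagged)
```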
- HyGNN: Drug-Drug Interaction Prediction via Hypergraph Neural Network [0.0]
Drug-Drug Interactions (DDIs) may hamper the intended effects of drugs and, in the worst case, lead to adverse drug reactions (ADRs).
This paper proposes a novel Hypergraph Neural Network (HyGNN) model based on only the SMILES string of drugs, available for any drug, for the DDI prediction problem.
Our proposed HyGNN model effectively predicts DDIs and impressively outperforms the baselines with a maximum ROC-AUC and PR-AUC of 97.9% and 98.1%, respectively.
arXiv Detail & Related papers (2022-06-25T22:48:27Z)
- SafeDrug: Dual Molecular Graph Encoders for Safe Drug Recommendations [59.590084937600764]
We propose a DDI-controllable drug recommendation model named SafeDrug to leverage drugs' molecule structures and model DDIs explicitly.
On a benchmark dataset, SafeDrug reduces DDI by 19.43% and improves Jaccard similarity between recommended and actually prescribed drug combinations by 2.88%, relative to previous approaches.
arXiv Detail & Related papers (2021-05-05T00:20:48Z)
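The Jaccard metric quoted above, sketched on toy drug sets:

```python
# Jaccard similarity: intersection over union of two drug sets.
def jaccard(recommended: set[str], prescribed: set[str]) -> float:
    if not recommended and not prescribed:
        return 1.0
    return len(recommended & prescribed) / len(recommended | prescribed)

print(jaccard({"metformin", "lisinopril", "atorvastatin"},
              {"metformin", "lisinopril", "amlodipine"}))  # 2/4 = 0.5
```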
- Robustness to Spurious Correlations via Human Annotations [100.63051542531171]
We present a framework for making models robust to spurious correlations by leveraging humans' common sense knowledge of causality.
Specifically, we use human annotation to augment each training example with a potential unmeasured variable.
We then introduce a new distributionally robust optimization objective over unmeasured variables (UV-DRO) to control the worst-case loss over possible test-time shifts.
arXiv Detail & Related papers (2020-07-13T20:05:19Z)
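A group-DRO-style simplification of that worst-case objective (not the paper's exact formulation), assuming per-example losses and human-annotated group ids:

```python
# Worst-group loss: partition examples by the annotated unmeasured variable
# and minimize the loss of the worst-off group.
import torch

def worst_group_loss(losses: torch.Tensor, groups: torch.Tensor) -> torch.Tensor:
    """losses: per-example losses; groups: annotated unmeasured-variable ids."""
    group_means = [losses[groups == g].mean() for g in groups.unique()]
    return torch.stack(group_means).max()

losses = torch.tensor([0.2, 0.9, 0.3, 1.1])
groups = torch.tensor([0, 1, 0, 1])
print(worst_group_loss(losses, groups))  # max of per-group means -> tensor(1.)
```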
- A Minimal-Input Multilayer Perceptron for Predicting Drug-Drug Interactions Without Knowledge of Drug Structure [0.0]
We propose a minimal-input multi-layer perceptron that predicts the interactions between two drugs.
Using a set of known drug-drug interactions, and associated properties of the drugs involved, we trained our model on a dataset of about 650,000 entries.
We report an accuracy of 0.968 on unseen samples of interactions between drugs on which the model was trained, and an accuracy of 0.942 on unseen samples of interactions between unseen drugs.
arXiv Detail & Related papers (2020-05-20T17:15:19Z)
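A minimal sketch of the two-drug perceptron described above; the feature set and layer sizes are illustrative, not the paper's:

```python
# Two-drug MLP: concatenated per-drug property vectors in, P(interaction) out.
import torch
import torch.nn as nn

N_FEATURES = 8  # simple per-drug properties (e.g. molecular weight, logP, ...)

model = nn.Sequential(
    nn.Linear(2 * N_FEATURES, 64),
    nn.ReLU(),
    nn.Linear(64, 32),
    nn.ReLU(),
    nn.Linear(32, 1),
    nn.Sigmoid(),  # interaction probability
)

drug_a = torch.randn(1, N_FEATURES)
drug_b = torch.randn(1, N_FEATURES)
print(model(torch.cat([drug_a, drug_b], dim=1)))
```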