Just as Humans Need Vaccines, So Do Models: Model Immunization to Combat Falsehoods
- URL: http://arxiv.org/abs/2505.17870v1
- Date: Fri, 23 May 2025 13:20:23 GMT
- Title: Just as Humans Need Vaccines, So Do Models: Model Immunization to Combat Falsehoods
- Authors: Shaina Raza, Rizwan Qureshi, Marcelo Lotif, Aman Chadha, Deval Pandya, Christos Emmanouilidis
- Abstract summary: Generative AI models often learn and reproduce false information present in their training corpora. This paper argues that AI models should be fine-tuned on small, quarantined sets of explicitly labeled falsehoods as a "vaccine" against misinformation.
- Score: 4.6697477379475005
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative AI models often learn and reproduce false information present in their training corpora. This position paper argues that, analogous to biological immunization, where controlled exposure to a weakened pathogen builds immunity, AI models should be fine-tuned on small, quarantined sets of explicitly labeled falsehoods as a "vaccine" against misinformation. These curated false examples are periodically injected during fine-tuning, strengthening the model's ability to recognize and reject misleading claims while preserving accuracy on truthful inputs. An illustrative case study shows that immunized models generate substantially less misinformation than baselines. To our knowledge, this is the first training framework that treats fact-checked falsehoods themselves as a supervised vaccine, rather than relying on input perturbations or generic human feedback signals, to harden models against future misinformation. We also outline ethical safeguards and governance controls to ensure the safe use of false data. Model immunization offers a proactive paradigm for aligning AI systems with factuality.
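The paper does not release code, but the periodic-injection scheme the abstract describes can be sketched in a few lines. The following is a minimal, hypothetical illustration of building a fine-tuning example stream in which every Nth example is drawn from a small, quarantined set of fact-checked falsehoods labeled for rejection; all function and variable names are assumptions, not the authors' implementation.

```python
import random

def build_finetuning_stream(truthful, falsehoods, inject_every=5, seed=0):
    """Interleave a quarantined "vaccine" set into a fine-tuning stream.

    `truthful` and `falsehoods` are lists of (text, label) pairs; the
    labels mark whether the model should affirm ("true") or refute
    ("false") the claim. After every `inject_every` truthful examples,
    one explicitly labeled falsehood is injected, so the loss can teach
    the model to recognize and reject misleading claims.
    """
    rng = random.Random(seed)
    stream = []
    for i, example in enumerate(truthful, start=1):
        stream.append(example)
        if i % inject_every == 0 and falsehoods:
            # Periodic injection of a fact-checked, labeled falsehood.
            stream.append(rng.choice(falsehoods))
    return stream

truthful = [(f"fact {i}", "true") for i in range(10)]
vaccine = [("the moon is made of cheese", "false"),
           ("drinking seawater cures colds", "false")]
stream = build_finetuning_stream(truthful, vaccine, inject_every=5)
```

With 10 truthful examples and `inject_every=5`, the resulting stream has 12 entries, with a labeled falsehood at positions 5 and 11; in practice the stream would feed a standard supervised fine-tuning loop.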
Related papers
- Enhancing Vaccine Safety Surveillance: Extracting Vaccine Mentions from Emergency Department Triage Notes Using Fine-Tuned Large Language Models [0.5025737475817937]
The performance of prompt-engineered models, fine-tuned models, and a rule-based approach was compared.
The fine-tuned Llama 3 billion parameter model outperformed other models in its accuracy of extracting vaccine names.
arXiv Detail & Related papers (2025-07-10T09:57:08Z)
- Model Immunization from a Condition Number Perspective [14.84123611635938]
We propose a framework, based on the condition number of a Hessian matrix, to analyze model immunization for linear models.
We design an algorithm with regularization terms to control the resulting condition numbers after pre-training.
Empirical results on linear models and non-linear deep-nets demonstrate the effectiveness of the proposed algorithm.
arXiv Detail & Related papers (2025-05-29T17:59:48Z)
- Immunogenicity Prediction with Dual Attention Enables Vaccine Target Selection [6.949493332885247]
We introduce VenusVaccine, a novel deep learning solution for predicting immunogenicity in vaccines.
We also compile the most comprehensive immunogenicity dataset to date, encompassing over 7000 antigen sequences, structures, and immunogenicity labels from bacteria, viruses, and tumors.
Our work provides an effective tool for vaccine design and sets valuable benchmarks for future research.
arXiv Detail & Related papers (2024-10-03T16:33:35Z)
- Missci: Reconstructing Fallacies in Misrepresented Science [84.32990746227385]
Health-related misinformation on social networks can lead to poor decision-making and real-world dangers.
Missci is a novel argumentation theoretical model for fallacious reasoning.
We present Missci as a dataset to test the critical reasoning abilities of large language models.
arXiv Detail & Related papers (2024-06-05T12:11:10Z)
- Nutrition Facts, Drug Facts, and Model Facts: Putting AI Ethics into Practice in Gun Violence Research [0.0]
We propose a Model Facts template that is easily extendable and decomposes accuracy and demographics into standardized and minimally complex values.
We apply the Model Facts template on two previously published models, a violence risk identification model and a suicide risk prediction model.
arXiv Detail & Related papers (2024-02-14T16:19:09Z)
- Isolation and Induction: Training Robust Deep Neural Networks against Model Stealing Attacks [51.51023951695014]
Existing model stealing defenses add deceptive perturbations to the victim's posterior probabilities to mislead the attackers.
This paper proposes Isolation and Induction (InI), a novel and effective training framework for model stealing defenses.
In contrast to adding perturbations over model predictions that harm the benign accuracy, we train models to produce uninformative outputs against stealing queries.
arXiv Detail & Related papers (2023-08-02T05:54:01Z)
- Exploring Model Dynamics for Accumulative Poisoning Discovery [62.08553134316483]
We propose a novel information measure, namely, Memorization Discrepancy, to explore the defense via the model-level information.
By implicitly transferring the changes in the data manipulation to that in the model outputs, Memorization Discrepancy can discover the imperceptible poison samples.
We thoroughly explore its properties and propose Discrepancy-aware Sample Correction (DSC) to defend against accumulative poisoning attacks.
arXiv Detail & Related papers (2023-06-06T14:45:24Z)
- Raising the Cost of Malicious AI-Powered Image Editing [82.71990330465115]
We present an approach to mitigating the risks of malicious image editing posed by large diffusion models.
The key idea is to immunize images so as to make them resistant to manipulation by these models.
arXiv Detail & Related papers (2023-02-13T18:38:42Z)
- Dense Feature Memory Augmented Transformers for COVID-19 Vaccination Search Classification [60.49594822215981]
This paper presents a classification model for detecting COVID-19 vaccination related search queries.
We propose a novel approach of considering dense features as memory tokens that the model can attend to.
We show that this new modeling approach enables a significant improvement to the Vaccine Search Insights (VSI) task.
arXiv Detail & Related papers (2022-12-16T13:57:41Z)
- MOVE: Effective and Harmless Ownership Verification via Embedded External Features [104.97541464349581]
We propose an effective and harmless model ownership verification (MOVE) to defend against different types of model stealing simultaneously.
We conduct the ownership verification by verifying whether a suspicious model contains the knowledge of defender-specified external features.
We then train a meta-classifier to determine whether a model is stolen from the victim.
arXiv Detail & Related papers (2022-08-04T02:22:29Z)
- Disentangled Learning of Stance and Aspect Topics for Vaccine Attitude Detection in Social Media [40.61499595293957]
We propose a novel semi-supervised approach for vaccine attitude detection, called VADet.
VADet is able to learn disentangled stance and aspect topics, and outperforms existing aspect-based sentiment analysis models on both stance detection and tweet clustering.
arXiv Detail & Related papers (2022-05-06T15:24:33Z)
- Amnesiac Machine Learning [15.680008735220785]
The recently enacted General Data Protection Regulation affects any data holder that has data on European Union residents.
Models are vulnerable to information leaking attacks such as model inversion attacks.
We present two data removal methods, namely Unlearning and Amnesiac Unlearning, that enable model owners to protect themselves against such attacks while being compliant with regulations.
arXiv Detail & Related papers (2020-10-21T13:14:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.