Plausibility Vaccine: Injecting LLM Knowledge for Event Plausibility
- URL: http://arxiv.org/abs/2503.12667v1
- Date: Sun, 16 Mar 2025 21:55:17 GMT
- Title: Plausibility Vaccine: Injecting LLM Knowledge for Event Plausibility
- Authors: Jacob Chmura, Jonah Dauvet, Sebastian Sabry
- Abstract summary: We train 12 task adapters to learn various physical properties and association measures. We perform adapter fusion to compose latent semantic knowledge from each task on top of pre-trained ALBERT embeddings.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite advances in language modelling, distributional methods that build semantic representations from co-occurrences fail to discriminate between plausible and implausible events. In this work, we investigate how plausibility prediction can be improved by injecting latent knowledge prompted from large language models using parameter-efficient fine-tuning. We train 12 task adapters to learn various physical properties and association measures and perform adapter fusion to compose latent semantic knowledge from each task on top of pre-trained ALBERT embeddings. We automate auxiliary task data generation, which enables us to scale our approach and fine-tune our learned representations across two plausibility datasets. Our code is available at https://github.com/Jacob-Chmura/plausibility-vaccine.
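The abstract combines two components: per-task bottleneck adapters and an AdapterFusion layer that composes their outputs via attention. The paper's implementation (see the linked repository) builds on ALBERT with the AdapterHub stack; the snippet below is only an illustrative, self-contained NumPy sketch of the mechanism, with hypothetical names and toy dimensions (`d`, `bottleneck`, `n_tasks`) chosen for clarity rather than taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def adapter(hidden, w_down, w_up):
    # Bottleneck adapter: down-project, non-linearity, up-project,
    # then a residual connection back to the original hidden state.
    return hidden + np.maximum(hidden @ w_down, 0.0) @ w_up

def adapter_fusion(hidden, adapter_outputs, w_q, w_k):
    # Attention over the per-task adapter outputs: the hidden state
    # forms the query; each adapter output forms a key and also acts
    # as its own value. The fused vector is the weighted combination.
    q = hidden @ w_q                          # (d,)
    keys = adapter_outputs @ w_k              # (n_tasks, d)
    scores = keys @ q / np.sqrt(len(q))       # (n_tasks,)
    weights = np.exp(scores - scores.max())   # stable softmax
    weights /= weights.sum()
    return weights @ adapter_outputs          # (d,)

d, bottleneck, n_tasks = 8, 2, 3              # toy sizes, not the paper's
hidden = rng.normal(size=d)                   # stand-in for an ALBERT embedding

# One frozen-backbone adapter per auxiliary task (12 in the paper).
outputs = np.stack([
    adapter(hidden,
            rng.normal(size=(d, bottleneck)),
            rng.normal(size=(bottleneck, d)))
    for _ in range(n_tasks)
])
fused = adapter_fusion(hidden, outputs,
                       rng.normal(size=(d, d)), rng.normal(size=(d, d)))
```

In training, only the adapter and fusion weights are updated while the pre-trained backbone stays frozen, which is what makes the approach parameter-efficient.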
Related papers
- What if Agents Could Imagine? Reinforcing Open-Vocabulary HOI Comprehension through Generation [35.62323084880028]
We propose ImagineAgent, an agentic framework that harmonizes cognitive reasoning with generative imagination for robust visual understanding. Our method innovatively constructs cognitive maps that explicitly model plausible relationships between detected entities and candidate actions. It dynamically invokes tools including retrieval augmentation, image cropping, and diffusion models to gather domain-specific knowledge and enriched visual evidence.
arXiv Detail & Related papers (2026-02-12T02:51:59Z) - Stable Diffusion Models are Secretly Good at Visual In-Context Learning [9.829303881652548]
We show that off-the-shelf Stable Diffusion models can be repurposed for visual in-context learning (V-ICL). We formulate an in-place attention re-computation within the self-attention layers of the Stable Diffusion architecture. We show that this repurposed Stable Diffusion model is able to adapt to six different tasks.
arXiv Detail & Related papers (2025-08-13T17:08:22Z) - Multi-Stage Knowledge Integration of Vision-Language Models for Continual Learning [79.46570165281084]
We propose a Multi-Stage Knowledge Integration network (MulKI) to emulate the human learning process in distillation methods.
MulKI achieves this through four stages, including Eliciting Ideas, Adding New Ideas, Distinguishing Ideas, and Making Connections.
Our method demonstrates significant improvements in maintaining zero-shot capabilities while supporting continual learning across diverse downstream tasks.
arXiv Detail & Related papers (2024-11-11T07:36:19Z) - Plausible-Parrots @ MSP2023: Enhancing Semantic Plausibility Modeling using Entity and Event Knowledge [1.6233244703352492]
We enhance the large language model (LLM) with fine-grained entity types, event types and their definitions extracted from an external knowledge base.
The experimental results show the effectiveness of the injected knowledge on modeling semantic plausibility of events.
arXiv Detail & Related papers (2024-08-29T23:13:45Z) - Make Prompts Adaptable: Bayesian Modeling for Vision-Language Prompt Learning with Data-Dependent Prior [14.232144691524528]
Recent Vision-Language Pretrained models have become the backbone for many downstream tasks.
MLE training can lead the context vector to over-fit dominant image features in the training data.
This paper presents a Bayesian-based framework of prompt learning, which could alleviate the overfitting issues on few-shot learning application.
arXiv Detail & Related papers (2024-01-09T10:15:59Z) - Reinforcement Learning from Passive Data via Latent Intentions [86.4969514480008]
We show that passive data can still be used to learn features that accelerate downstream RL.
Our approach learns from passive data by modeling intentions.
Our experiments demonstrate the ability to learn from many forms of passive data, including cross-embodiment video data and YouTube videos.
arXiv Detail & Related papers (2023-04-10T17:59:05Z) - Diagnosing and Rectifying Vision Models using Language [31.588965563961573]
Recent contrastive learning models have demonstrated the ability to learn an embedding space suitable for building strong vision classifiers.
Our work highlights a distinct advantage of this multi-modal embedding space: the ability to diagnose vision classifiers through natural language.
Our proposed method can discover high-error data slices, identify influential attributes and further rectify undesirable model behaviors.
arXiv Detail & Related papers (2023-02-08T18:59:42Z) - A Cohesive Distillation Architecture for Neural Language Models [0.0]
A recent trend in Natural Language Processing is the exponential growth in Language Model (LM) size.
This study investigates methods for Knowledge Distillation (KD) to provide efficient alternatives to large-scale models.
arXiv Detail & Related papers (2023-01-12T08:01:53Z) - Tyger: Task-Type-Generic Active Learning for Molecular Property Prediction [121.97742787439546]
How to accurately predict the properties of molecules is an essential problem in AI-driven drug discovery.
To reduce annotation cost, deep Active Learning methods are developed to select only the most representative and informative data for annotating.
We propose a Task-type-generic active learning framework (termed Tyger) that is able to handle different types of learning tasks in a unified manner.
arXiv Detail & Related papers (2022-05-23T12:56:12Z) - BERT WEAVER: Using WEight AVERaging to enable lifelong learning for transformer-based models in biomedical semantic search engines [49.75878234192369]
We present WEAVER, a simple, yet efficient post-processing method that infuses old knowledge into the new model.
We show that applying WEAVER in a sequential manner results in similar word embedding distributions as doing a combined training on all data at once.
arXiv Detail & Related papers (2022-02-21T10:34:41Z) - Generative Conversational Networks [67.13144697969501]
We propose a framework called Generative Conversational Networks, in which conversational agents learn to generate their own labelled training data.
We show an average improvement of 35% in intent detection and 21% in slot tagging over a baseline model trained from the seed data.
arXiv Detail & Related papers (2021-06-15T23:19:37Z) - Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages [112.65994041398481]
We propose a Bayesian generative model for the space of neural parameters.
We infer the posteriors over such latent variables based on data from seen task-language combinations.
Our model yields comparable or better results than state-of-the-art, zero-shot cross-lingual transfer methods.
arXiv Detail & Related papers (2020-01-30T16:58:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.