Improving Pre-trained Language Model Sensitivity via Mask Specific losses: A case study on Biomedical NER
- URL: http://arxiv.org/abs/2403.18025v2
- Date: Thu, 28 Mar 2024 11:01:21 GMT
- Title: Improving Pre-trained Language Model Sensitivity via Mask Specific losses: A case study on Biomedical NER
- Authors: Micheal Abaho, Danushka Bollegala, Gary Leeming, Dan Joyce, Iain E Buchan
- Abstract summary: Mask Specific Language Modeling (MSLM) is an approach that efficiently acquires target domain knowledge.
MSLM jointly masks DS-terms and generic words, then learns mask-specific losses.
Results of our analysis show that MSLM improves LMs' sensitivity to and detection of DS-terms.
- Score: 21.560012335091287
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Adapting language models (LMs) to novel domains is often achieved by fine-tuning a pre-trained LM (PLM) on domain-specific data. Fine-tuning introduces new knowledge into an LM, enabling it to comprehend and efficiently perform a target domain task. Fine-tuning can, however, be inadvertently insensitive if it ignores the wide array of disparities (e.g., in word meaning) between source and target domains. For instance, words such as chronic and pressure may be treated lightly in social conversations; clinically, however, they usually express concern. To address insensitive fine-tuning, we propose Mask Specific Language Modeling (MSLM), an approach that efficiently acquires target domain knowledge by appropriately weighting the importance of domain-specific terms (DS-terms) during fine-tuning. MSLM jointly masks DS-terms and generic words, then learns mask-specific losses by ensuring that LMs incur larger penalties for inaccurately predicting DS-terms than for generic words. Our analysis shows that MSLM improves LMs' sensitivity to and detection of DS-terms. We also show empirically that the optimal masking rate depends not only on the LM but also on the dataset and the length of the sequences. Our proposed masking strategy outperforms advanced masking strategies such as span- and PMI-based masking.
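The weighting idea at the heart of MSLM, penalizing a mispredicted DS-term more heavily than a mispredicted generic word, can be pictured with a short sketch. The snippet below is a minimal illustration under assumptions, not the authors' implementation: the per-token weighted cross-entropy, the `ds_weight`/`generic_weight` values, and the `is_ds_term` flag are placeholders introduced here for clarity.

```python
import torch
import torch.nn.functional as F


def mask_specific_loss(logits, labels, is_ds_term,
                       ds_weight=2.0, generic_weight=1.0):
    """Weighted MLM loss: a wrongly predicted masked DS-term costs more
    than a wrongly predicted masked generic word.

    logits:     (batch, seq_len, vocab_size) MLM prediction scores
    labels:     (batch, seq_len) original token ids, -100 at unmasked positions
    is_ds_term: (batch, seq_len) bool, True where the token is a DS-term
    """
    vocab_size = logits.size(-1)
    # Per-token cross-entropy; ignore_index yields zero loss at unmasked positions.
    per_token = F.cross_entropy(
        logits.view(-1, vocab_size), labels.view(-1),
        ignore_index=-100, reduction="none",
    ).view(labels.shape)

    # Heavier penalty on domain-specific terms than on generic words.
    weights = torch.where(
        is_ds_term,
        torch.full_like(per_token, ds_weight),
        torch.full_like(per_token, generic_weight),
    )

    masked = labels != -100
    return (per_token * weights)[masked].sum() / weights[masked].sum()
```

In this form, a standard masked-LM fine-tuning loop can reuse the function: masked clinical terms (e.g., chronic, pressure) simply contribute a larger share of the gradient than masked generic words.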
Related papers
- Do We Really Need Curated Malicious Data for Safety Alignment in Multi-modal Large Language Models? [83.53005932513155]
Multi-modal large language models (MLLMs) have made significant progress, yet their safety alignment remains limited.
We propose fine-tuning MLLMs on a small set of benign instruction-following data with responses replaced by simple, clear rejection sentences.
arXiv Detail & Related papers (2025-04-14T09:03:51Z) - Exploring Gradient-Guided Masked Language Model to Detect Textual Adversarial Attacks [50.53590930588431]
Adversarial examples pose serious threats to natural language processing systems.
Recent studies suggest that adversarial texts deviate from the underlying manifold of normal texts, whereas masked language models can approximate the manifold of normal data.
We first introduce Masked Language Model-based Detection (MLMD), leveraging the mask-and-unmask operations of the masked language modeling (MLM) objective.
arXiv Detail & Related papers (2025-04-08T14:10:57Z) - Leveraging Domain Knowledge at Inference Time for LLM Translation: Retrieval versus Generation [36.41708236431343]
Large language models (LLMs) have been increasingly adopted for machine translation (MT).
Our work studies domain-adapted MT with LLMs through a careful prompting setup.
We find that demonstrations consistently outperform terminology, and retrieval consistently outperforms generation.
arXiv Detail & Related papers (2025-03-06T22:23:07Z) - Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs [56.74916151916208]
Large language models (LLMs) exhibit hallucinations (i.e., unfaithful or nonsensical information) when serving as AI assistants in various domains.
Previous factuality alignment methods that conduct response-level preference learning inevitably introduce noise during training.
This paper proposes a fine-grained factuality alignment method based on Direct Preference Optimization (DPO), called Mask-DPO.
arXiv Detail & Related papers (2025-03-04T18:20:24Z) - Refining Translations with LLMs: A Constraint-Aware Iterative Prompting Approach [7.5069214839655345]
Large language models (LLMs) have demonstrated remarkable proficiency in machine translation (MT).
We propose a multi-step prompt chain that enhances translation faithfulness by prioritizing key terms crucial for semantic accuracy.
Experiments using Llama and Qwen as base models on the FLORES-200 and WMT datasets demonstrate significant improvements over baselines.
arXiv Detail & Related papers (2024-11-13T05:40:24Z) - SCA: Highly Efficient Semantic-Consistent Unrestricted Adversarial Attack [29.744970741737376]
We propose a novel framework called Semantic-Consistent Unrestricted Adversarial Attacks (SCA).
SCA employs an inversion method to extract edit-friendly noise maps and utilizes a Multimodal Large Language Model (MLLM) to provide semantic guidance.
Our framework enables the efficient generation of adversarial examples that exhibit minimal discernible semantic changes.
arXiv Detail & Related papers (2024-10-03T06:25:53Z) - Exploring Language Model Generalization in Low-Resource Extractive QA [57.14068405860034]
We investigate Extractive Question Answering (EQA) with Large Language Models (LLMs) under domain drift.
We devise a series of experiments to empirically explain the performance gap.
arXiv Detail & Related papers (2024-09-27T05:06:43Z) - Gradient-Mask Tuning Elevates the Upper Limits of LLM Performance [51.36243421001282]
Gradient-Mask Tuning (GMT) is a method that selectively updates parameters during training based on their gradient information (a rough sketch of this idea appears after this list).
Our empirical results across various tasks demonstrate that GMT not only outperforms traditional fine-tuning methods but also elevates the upper limits of LLM performance.
arXiv Detail & Related papers (2024-06-21T17:42:52Z) - The Strong Pull of Prior Knowledge in Large Language Models and Its Impact on Emotion Recognition [74.04775677110179]
In-context Learning (ICL) has emerged as a powerful paradigm for performing natural language tasks with Large Language Models (LLMs).
We show that LLMs have strong yet inconsistent priors in emotion recognition that ossify their predictions.
Our results suggest that caution is needed when using ICL with larger LLMs for affect-centered tasks outside their pre-training domain.
arXiv Detail & Related papers (2024-03-25T19:07:32Z) - Fine-tuning Large Language Models for Domain-specific Machine Translation [8.439661191792897]
Large language models (LLMs) have made significant progress in machine translation (MT).
However, their potential in domain-specific MT remains under-explored.
This paper proposes a prompt-oriented fine-tuning method, denoted as LlamaIT, to effectively and efficiently fine-tune a general-purpose LLM for domain-specific MT tasks.
arXiv Detail & Related papers (2024-02-23T02:24:15Z) - Mixture of Soft Prompts for Controllable Data Generation [21.84489422361048]
Mixture of Soft Prompts (MSP) is proposed as a tool for data augmentation rather than direct prediction.
Our method achieves state-of-the-art results on three benchmarks when compared against strong baselines.
arXiv Detail & Related papers (2023-03-02T21:13:56Z) - Generalizing through Forgetting -- Domain Generalization for Symptom Event Extraction in Clinical Notes [0.0]
We present domain generalization for symptom extraction using pretraining and fine-tuning data.
We propose a domain generalization method that dynamically masks frequent symptom words in the source domain.
Our experiments indicate that masking and adaptive pretraining methods can significantly improve performance when the source domain is more distant from the target domain.
arXiv Detail & Related papers (2022-09-20T05:53:22Z) - KALA: Knowledge-Augmented Language Model Adaptation [65.92457495576141]
We propose a novel domain adaptation framework for pre-trained language models (PLMs).
Knowledge-Augmented Language model Adaptation (KALA) modulates the intermediate hidden representations of PLMs with domain knowledge.
Results show that, despite being computationally efficient, our KALA largely outperforms adaptive pre-training.
arXiv Detail & Related papers (2022-04-22T08:11:59Z) - Context-Aware Mixup for Domain Adaptive Semantic Segmentation [52.1935168534351]
Unsupervised domain adaptation (UDA) aims to adapt a model of the labeled source domain to an unlabeled target domain.
We propose end-to-end Context-Aware Mixup (CAMix) for domain adaptive semantic segmentation.
Experimental results show that the proposed method outperforms the state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2021-08-08T03:00:22Z) - Effective Unsupervised Domain Adaptation with Adversarially Trained Language Models [54.569004548170824]
We show that careful masking strategies can bridge the knowledge gap of masked language models.
We propose an effective training strategy that adversarially masks out those tokens which are harder for the underlying masked language model to reconstruct.
arXiv Detail & Related papers (2020-10-05T01:49:47Z)
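As a rough illustration of the gradient-based parameter selection described in the Gradient-Mask Tuning entry above, the sketch below keeps only the largest-magnitude gradient entries in each parameter tensor before the optimizer step. This is a hedged sketch under assumptions, not the paper's method; the `keep_ratio` parameter and the per-tensor top-k threshold rule are introduced here purely for illustration.

```python
import torch


def gradient_masked_step(model, optimizer, loss, keep_ratio=0.2):
    """Illustrative 'gradient-masked' update: zero all but the top
    `keep_ratio` fraction of gradient entries (by magnitude) in each
    parameter tensor, then take a normal optimizer step, so only the
    most gradient-relevant parameters are updated."""
    optimizer.zero_grad()
    loss.backward()
    for param in model.parameters():
        if param.grad is None:
            continue
        flat = param.grad.abs().flatten()
        k = max(1, int(keep_ratio * flat.numel()))
        threshold = torch.topk(flat, k).values.min()
        # Keep only gradient entries at or above the per-tensor threshold.
        param.grad.mul_((param.grad.abs() >= threshold).to(param.grad.dtype))
    optimizer.step()
```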
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.