Unified Low-Resource Sequence Labeling by Sample-Aware Dynamic Sparse
Finetuning
- URL: http://arxiv.org/abs/2311.03748v1
- Date: Tue, 7 Nov 2023 06:19:37 GMT
- Title: Unified Low-Resource Sequence Labeling by Sample-Aware Dynamic Sparse
Finetuning
- Authors: Sarkar Snigdha Sarathi Das, Ranran Haoran Zhang, Peng Shi, Wenpeng
Yin, Rui Zhang
- Abstract summary: FISH-DIP is a sample-aware dynamic sparse finetuning strategy that selectively focuses on a fraction of parameters.
We demonstrate that FISH-DIP can smoothly optimize the model in low-resource settings, offering up to 40% performance improvements.
- Score: 24.765911297156855
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Unified Sequence Labeling, which casts different sequence labeling
problems such as Named Entity Recognition, Relation Extraction, and Semantic
Role Labeling into a generalized sequence-to-sequence format, opens up the
opportunity to make maximum use of large language model knowledge for
structured prediction. Unfortunately, this requires formatting the tasks into a
specialized augmented format unknown to the base pretrained language models
(PLMs), necessitating finetuning to the target format. This significantly
limits its usefulness in data-limited settings, where finetuned large models
cannot properly generalize to the target format. To address this challenge and
leverage PLM knowledge effectively, we propose FISH-DIP, a sample-aware dynamic
sparse finetuning strategy that selectively updates a fraction of the
parameters, informed by feedback from highly regressing examples, during the
finetuning process. By leveraging the dynamism of sparsity, our approach
mitigates the impact of well-learned samples and prioritizes underperforming
instances for improved generalization. Across five sequence labeling tasks, we
demonstrate that FISH-DIP can smoothly optimize the model in low-resource
settings, offering up to 40% performance improvement over full finetuning
depending on the target evaluation setting. Moreover, compared to in-context
learning and other parameter-efficient finetuning approaches, FISH-DIP performs
comparably or better, notably in extreme low-resource settings.
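For concreteness, below is a minimal PyTorch sketch of the general recipe the abstract describes: per-sample losses identify the currently underperforming examples, their gradients provide a Fisher-style importance score for each parameter, and only the top-scoring fraction of parameters receives gradient updates until the sparsity mask is refreshed. Everything in the sketch is an illustrative assumption rather than the paper's released implementation, including the function names, the squared-gradient importance score, the highest-loss proxy for "highly regressing" samples, the refresh schedule, and all hyperparameter values.

```python
# Illustrative sketch of sample-aware dynamic sparse finetuning (not the
# authors' code). Assumptions: "highly regressing" samples are approximated by
# the highest-loss fraction of the batch, parameter importance is the squared
# gradient (a Fisher-style estimate), and the mask is refreshed on a fixed
# schedule.
import torch


def build_sparse_mask(model, per_sample_losses, sample_frac=0.25, keep_ratio=0.005):
    """Score parameters by squared gradients of the worst samples; keep the top fraction."""
    # Feedback comes only from the most underperforming (highest-loss) samples.
    k = max(1, int(sample_frac * per_sample_losses.numel()))
    hard_loss = per_sample_losses.topk(k).values.mean()

    model.zero_grad()
    hard_loss.backward(retain_graph=True)  # graph is reused for the actual update step

    # Global threshold over all parameters' squared-gradient scores.
    scores = torch.cat([(p.grad.detach() ** 2).flatten()
                        for p in model.parameters() if p.grad is not None])
    n_keep = max(1, int(keep_ratio * scores.numel()))
    threshold = scores.topk(n_keep).values.min()

    masks = {name: (p.grad.detach() ** 2) >= threshold
             for name, p in model.named_parameters() if p.grad is not None}
    model.zero_grad()
    return masks


def masked_step(model, optimizer, per_sample_losses, masks):
    """One update step in which only the masked parameter coordinates move."""
    optimizer.zero_grad()
    per_sample_losses.mean().backward()
    for name, p in model.named_parameters():
        if p.grad is None:
            continue
        if name in masks:
            p.grad.mul_(masks[name].to(p.grad.dtype))  # keep only selected coordinates
        else:
            p.grad.zero_()                             # parameter frozen this step
    optimizer.step()  # note: momentum/weight decay may still nudge zeroed coordinates


def finetune(model, loader, optimizer, refresh_every=50):
    """Hypothetical loop: per-sample token-level losses, periodic mask refresh."""
    masks = {}
    for step, (inputs, targets) in enumerate(loader):
        logits = model(inputs)                            # (batch, seq_len, num_labels)
        losses = torch.nn.functional.cross_entropy(
            logits.transpose(1, 2), targets, reduction="none"
        ).mean(dim=-1)                                    # one loss per sample
        if step % refresh_every == 0:
            masks = build_sparse_mask(model, losses)
        masked_step(model, optimizer, losses, masks)
```

The point of the dynamism, as the abstract frames it, is that the small budget of trainable parameters keeps shifting toward wherever the model is currently underperforming, rather than being fixed once before finetuning begins.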
Related papers
- Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining [55.262510814326035]
Existing reweighting strategies primarily focus on group-level data importance.
We introduce novel algorithms for dynamic, instance-level data reweighting.
Our framework allows us to devise reweighting strategies that deprioritize redundant or uninformative data.
arXiv: 2025-02-10
- Dynamic Noise Preference Optimization for LLM Self-Improvement via Synthetic Data [51.62162460809116]
We introduce Dynamic Noise Preference Optimization (DNPO) to ensure consistent improvements across iterations.
In experiments with Zephyr-7B, DNPO consistently outperforms existing methods, showing an average performance boost of 2.6%.
DNPO shows a significant improvement in model-generated data quality, with a 29.4% win-loss rate gap compared to the baseline in GPT-4 evaluations.
arXiv: 2025-02-08
- Few-Shot Optimized Framework for Hallucination Detection in Resource-Limited NLP Systems [1.0124625066746595]
We introduce DeepSeek Few-shot optimization to enhance weak label generation through iterative prompt engineering.
We achieve high-quality annotations that considerably enhance the performance of downstream models.
We further fine-tuned the Mistral-7B-Instruct-v0.3 model on these optimized annotations, enabling it to accurately detect hallucinations in resource-limited settings.
arXiv: 2025-01-28
- ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by the Kronecker product to Aggregate Low Rank Experts.
Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv: 2024-12-11
- Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods [69.36397993451742]
This work introduces Context-aware Prompt Tuning (CPT), a method inspired by ICL, PT, and adversarial attacks.
We modify specific context tokens, considering the unique structure of input and output formats.
Inspired by adversarial attacks, we adjust the input based on the labels present in the context, focusing on minimizing, rather than maximizing, the loss.
arXiv: 2024-10-22
- Auto-GDA: Automatic Domain Adaptation for Efficient Grounding Verification in Retrieval Augmented Generation [13.120801609024147]
Retrieval augmented generation (RAG) has been shown to enhance the factuality of large language model (LLM) outputs.
RAG inputs are more complex than most datasets used for training NLI models.
We introduce Automatic Generative Domain Adaptation (Auto-GDA) to enable unsupervised domain adaptation.
arXiv: 2024-10-04
- Functional Graphical Models: Structure Enables Offline Data-Driven Optimization [111.28605744661638]
We show how structure can enable sample-efficient data-driven optimization.
We also present a data-driven optimization algorithm that infers the FGM structure itself.
arXiv: 2024-01-08
- Uncertainty-aware Parameter-Efficient Self-training for Semi-supervised Language Understanding [38.11411155621616]
We study self-training as one of the predominant semi-supervised learning approaches.
We present UPET, a novel Uncertainty-aware self-Training framework.
We show that UPET achieves a substantial improvement in terms of performance and efficiency.
arXiv: 2023-10-19
- Prototypical Fine-tuning: Towards Robust Performance Under Varying Data Sizes [47.880781811936345]
We propose a novel framework for fine-tuning pretrained language models (LMs).
Our prototypical fine-tuning approach can automatically adjust the model capacity according to the number of data points and the model's inherent attributes.
arXiv: 2022-11-24
- Partial sequence labeling with structured Gaussian Processes [8.239028141030621]
We propose structured Gaussian Processes for partial sequence labeling.
It encodes uncertainty in the prediction and does not require extra effort for model selection and hyperparameter learning.
It is evaluated on several sequence labeling tasks and the experimental results show the effectiveness of the proposed model.
arXiv: 2022-09-20
- Fine-grained Retrieval Prompt Tuning [149.9071858259279]
Fine-grained Retrieval Prompt Tuning steers a frozen pre-trained model to perform the fine-grained retrieval task from the perspectives of sample prompt and feature adaptation.
Our FRPT with fewer learnable parameters achieves the state-of-the-art performance on three widely-used fine-grained datasets.
arXiv: 2022-07-29