Improving Transfer Learning for Sequence Labeling Tasks by Adapting Pre-trained Neural Language Models
- URL: http://arxiv.org/abs/2510.20033v1
- Date: Wed, 22 Oct 2025 21:23:53 GMT
- Title: Improving Transfer Learning for Sequence Labeling Tasks by Adapting Pre-trained Neural Language Models
- Authors: David Dukić
- Abstract summary: This thesis improves transfer learning for sequence labeling tasks by adapting pre-trained neural language models. The proposed improvements involve a multi-task model that incorporates an additional signal, architectural modifications to autoregressive large language models, and the use of autoregressive large language models as text generators through a generative supervised in-context fine-tuning framework.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This doctoral thesis improves transfer learning for sequence labeling tasks by adapting pre-trained neural language models. The proposed improvements involve a multi-task model that incorporates an additional signal, a method based on architectural modifications of autoregressive large language models, and a sequence labeling framework for autoregressive large language models that combines supervised in-context fine-tuning with response-oriented adaptation strategies. The first improvement concerns domain transfer for the event trigger detection task: domain transfer can be improved by feeding an additional signal, obtained from a domain-independent text processing system, into a multi-task model. The second improvement modifies the model's architecture: a method is proposed to enable bidirectional information flow across the layers of autoregressive large language models. The third improvement uses autoregressive large language models as text generators through a generative supervised in-context fine-tuning framework. The proposed model, method, and framework demonstrate that pre-trained neural language models achieve their best performance on sequence labeling tasks when adapted through targeted transfer learning paradigms.
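To make the second improvement concrete, here is a minimal PyTorch sketch, entirely under our own assumptions (a small randomly initialized nn.TransformerEncoder stands in for a pre-trained autoregressive model), of how dropping the causal attention mask gives a per-token classifier access to right-side context. It illustrates the idea of bidirectional information flow for sequence labeling, not the thesis's actual method.

```python
# Minimal sketch (our assumptions, not the thesis's code): the same Transformer
# stack used as a token classifier with and without a causal mask. Removing the
# mask lets every position attend to its right context, which is what
# "bidirectional information flow" provides for sequence labeling.
import torch
import torch.nn as nn

vocab_size, d_model, num_labels, seq_len = 1000, 64, 5, 16

embed = nn.Embedding(vocab_size, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
    num_layers=2,
)
classifier = nn.Linear(d_model, num_labels)  # per-token label logits

tokens = torch.randint(0, vocab_size, (2, seq_len))  # toy batch of token ids
# Upper-triangular -inf mask = causal (left-to-right) attention.
causal_mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)

hidden_causal = encoder(embed(tokens), mask=causal_mask)  # left context only
hidden_bidir = encoder(embed(tokens))                     # no mask: full context

print(classifier(hidden_causal).shape)  # torch.Size([2, 16, 5])
print(classifier(hidden_bidir).shape)   # torch.Size([2, 16, 5])
```

In practice the comparison would be run on the layers of a pre-trained decoder-only model rather than a toy encoder, but the masking difference is the same.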
Related papers
- Model-Aware Tokenizer Transfer [46.13517417540154]
Model-Aware Tokenizer Transfer (MATT) is a method that incorporates model internals into the tokenizer transfer process. MATT introduces an Attention Influence Modeling (AIM) objective that distills inter-token communication patterns from a source model into a target model (a generic attention-distillation sketch is given after this list). Experiments show that MATT recovers a large fraction of the original model's performance within a few GPU hours, outperforming baselines.
arXiv Detail & Related papers (2025-10-24T18:27:36Z)
- Contextually Guided Transformers via Low-Rank Adaptation [14.702057924366345]
Large Language Models (LLMs) based on Transformers excel at text processing, but their reliance on prompts for specialized behavior introduces computational overhead. We propose a modification to the Transformer architecture that eliminates the need for explicit prompts by learning to encode context into the model's weights via low-rank adaptation (a plain LoRA-layer sketch is given after this list).
arXiv Detail & Related papers (2025-06-06T01:34:39Z)
- Pointer-Guided Pre-Training: Infusing Large Language Models with Paragraph-Level Contextual Awareness [3.2925222641796554]
"pointer-guided segment ordering" (SO) is a novel pre-training technique aimed at enhancing the contextual understanding of paragraph-level text representations.
Our experiments show that pointer-guided pre-training significantly enhances the model's ability to understand complex document structures.
arXiv Detail & Related papers (2024-06-06T15:17:51Z)
- BatGPT: A Bidirectional Autoregressive Talker from Generative Pre-trained Transformer [77.28871523946418]
BatGPT is a large-scale language model designed and trained jointly by Wuhan University and Shanghai Jiao Tong University.
It is capable of generating highly natural and fluent text in response to various types of input, including text prompts, images, and audio.
arXiv Detail & Related papers (2023-07-01T15:10:01Z)
- Pre-Training a Graph Recurrent Network for Language Representation [34.4554387894105]
We consider a graph recurrent network for language model pre-training, which builds a graph structure for each sequence with local token-level communications.
We find that our model can generate more diverse outputs with less contextualized feature redundancy than existing attention-based models.
arXiv Detail & Related papers (2022-09-08T14:12:15Z)
- Towards Generalized Models for Task-oriented Dialogue Modeling on Spoken Conversations [22.894541507068933]
This paper presents our approach to building generalized models for the Knowledge-grounded Task-oriented Dialogue Modeling on Spoken Conversations Challenge of DSTC-10.
We employ extensive data augmentation strategies on written data, including artificial error injection and round-trip text-speech transformation (a simple error-injection sketch is given after this list).
Our approach ranks third on the objective evaluation and second on the final official human evaluation.
arXiv Detail & Related papers (2022-03-08T12:26:57Z)
- Factorized Neural Transducer for Efficient Language Model Adaptation [51.81097243306204]
We propose a novel model, factorized neural Transducer, by factorizing the blank and vocabulary prediction.
It is expected that this factorization can transfer the improvement of the standalone language model to the Transducer for speech recognition.
We demonstrate that the proposed factorized neural Transducer yields 15% to 20% WER improvements when out-of-domain text data is used for language model adaptation.
arXiv Detail & Related papers (2021-09-27T15:04:00Z)
- Lightweight Cross-Lingual Sentence Representation Learning [57.9365829513914]
We introduce a lightweight dual-transformer architecture with just 2 layers for generating memory-efficient cross-lingual sentence representations.
We propose a novel cross-lingual language model, which combines the existing single-word masked language model with the newly proposed cross-lingual token-level reconstruction task.
arXiv Detail & Related papers (2021-05-28T14:10:48Z)
- SLM: Learning a Discourse Language Representation with Sentence Unshuffling [53.42814722621715]
We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation (a data-preparation sketch for the underlying sentence-unshuffling objective is given after this list).
We show that this pre-training objective improves the performance of the original BERT by large margins.
arXiv Detail & Related papers (2020-10-30T13:33:41Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm which directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps (a first-order meta-update sketch is given after this list).
arXiv Detail & Related papers (2020-04-12T09:05:47Z)
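The sketches below illustrate, in generic form, some of the mechanisms mentioned in the list above; each is a minimal reconstruction under stated assumptions, not the respective paper's implementation. For Model-Aware Tokenizer Transfer, the summary only says that the AIM objective distills inter-token communication patterns from a source model into a target model; one common way to realize such distillation (our assumption, not MATT's published loss) is a KL divergence between teacher and student attention maps:

```python
# Generic attention-map distillation loss (an illustration, not MATT's AIM
# objective): push the student's attention distributions toward the teacher's
# for every head and query position.
import torch

def attention_distillation_loss(teacher_attn, student_attn, eps=1e-9):
    """Both tensors: (batch, heads, query_len, key_len); rows sum to 1."""
    # KL(teacher || student), summed over keys, averaged over batch/heads/queries.
    kl = teacher_attn * (torch.log(teacher_attn + eps) - torch.log(student_attn + eps))
    return kl.sum(dim=-1).mean()

teacher = torch.softmax(torch.randn(2, 4, 8, 8), dim=-1)  # frozen source model
student = torch.softmax(torch.randn(2, 4, 8, 8), dim=-1)  # target model being trained
print(attention_distillation_loss(teacher, student))       # scalar loss
```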
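For Contextually Guided Transformers via Low-Rank Adaptation, the mechanism rests on low-rank weight updates; below is a plain LoRA-style linear layer (a deliberately simplified sketch of standard LoRA; the paper's context-dependent generation of the update is omitted):

```python
# Standard LoRA-style linear layer (simplified sketch): a frozen base projection
# plus a trainable low-rank update B @ A, scaled by alpha / rank.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=8, alpha=16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        for p in self.base.parameters():      # pre-trained weights stay frozen
            p.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(64, 64)
print(layer(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```

Only lora_a and lora_b are trained, so specialized behavior is encoded into a small fraction of the parameters.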
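For the DSTC-10 spoken-conversations entry, artificial error injection on written data can be pictured with a simple character-level noising function (an illustrative assumption of ours; the authors additionally use round-trip text-speech transformation, which is not shown):

```python
# Character-level error injection (illustrative only): randomly delete,
# substitute, or duplicate characters at a total rate p to mimic ASR noise.
import random

def inject_errors(text, p=0.05, seed=0):
    rng = random.Random(seed)
    out = []
    for c in text:
        r = rng.random()
        if r < p / 3:                 # deletion
            continue
        elif r < 2 * p / 3:           # substitution with a random lowercase letter
            out.append(rng.choice("abcdefghijklmnopqrstuvwxyz"))
        elif r < p:                   # duplication (crude insertion noise)
            out.extend([c, c])
        else:                         # keep the character unchanged
            out.append(c)
    return "".join(out)

print(inject_errors("book a table for two at seven", p=0.2))
```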
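For SLM, the sentence-unshuffling idea behind Sentence-level Language Modeling can be sketched at the data-preparation level (one training example: a shuffled document and the permutation that restores its order; the model and loss are not shown):

```python
# Building one sentence-unshuffling example (data preparation only).
import random

def make_unshuffling_example(sentences, seed=0):
    rng = random.Random(seed)
    order = list(range(len(sentences)))
    rng.shuffle(order)
    shuffled = [sentences[i] for i in order]
    # target[j] = position in `shuffled` that holds the j-th original sentence
    target = [order.index(j) for j in range(len(sentences))]
    return shuffled, target

doc = ["The meeting started late.", "Everyone had arrived by nine.", "It ended at noon."]
print(make_unshuffling_example(doc))
```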
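For Pre-training Text Representations as Meta Learning, the connection to model-agnostic meta-learning can be pictured with a first-order (Reptile-style) meta-update, which is a generic stand-in of ours rather than the paper's algorithm:

```python
# First-order meta-learning step (generic sketch): adapt a copy of the encoder
# on one task's support batch, then move the shared weights toward the adapted ones.
import copy
import torch
import torch.nn as nn

encoder = nn.Linear(32, 32)            # stand-in for a text encoder
meta_lr, inner_lr = 0.1, 0.01

def inner_adapt(model, x, y, steps=3):
    """Fine-tune a copy of the model on one task's support batch."""
    adapted = copy.deepcopy(model)
    opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
    for _ in range(steps):
        opt.zero_grad()
        nn.functional.mse_loss(adapted(x), y).backward()
        opt.step()
    return adapted

x, y = torch.randn(16, 32), torch.randn(16, 32)   # one pre-training "task"
adapted = inner_adapt(encoder, x, y)

with torch.no_grad():                  # Reptile-style outer (meta) update
    for p, p_adapted in zip(encoder.parameters(), adapted.parameters()):
        p += meta_lr * (p_adapted - p)
```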