Merging Language and Domain Specific Models: The Impact on Technical Vocabulary Acquisition
- URL: http://arxiv.org/abs/2502.12001v1
- Date: Mon, 17 Feb 2025 16:39:28 GMT
- Title: Merging Language and Domain Specific Models: The Impact on Technical Vocabulary Acquisition
- Authors: Thibault Rousset, Taisei Kakibuchi, Yusuke Sasaki, Yoshihide Nomura
- Abstract summary: We explore the knowledge transfer mechanisms involved when combining a general-purpose language-specific model with a domain-specific model.
Our experiments analyze the impact of this merging process on the target model's proficiency in handling specialized terminology.
- Abstract: This paper investigates the integration of technical vocabulary in merged language models. We explore the knowledge transfer mechanisms involved when combining a general-purpose language-specific model with a domain-specific model, focusing on the resulting model's comprehension of technical jargon. Our experiments analyze the impact of this merging process on the target model's proficiency in handling specialized terminology. We present a quantitative evaluation of the performance of the merged model, comparing it with that of the individual constituent models. The findings offer insights into the effectiveness of different model merging methods for enhancing domain-specific knowledge and highlight potential challenges and future directions in leveraging these methods for cross-lingual knowledge transfer in Natural Language Processing.
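The abstract refers to different model merging methods without detailing them here. As a rough illustration only, the simplest such method, linear weight interpolation between two checkpoints of the same architecture, can be sketched as below; the model names, interpolation weight, and output path are placeholders for illustration, not details from the paper.
```python
# Minimal sketch of linear weight merging between two checkpoints that share the
# same architecture and tokenizer. Model names, alpha, and the output path are
# illustrative placeholders, not taken from the paper.
import torch
from transformers import AutoModelForCausalLM

GENERAL_NAME = "example/general-language-model"  # hypothetical general-purpose, language-specific model
DOMAIN_NAME = "example/domain-specific-model"    # hypothetical domain-specific model
ALPHA = 0.5                                      # weight given to the general model

general = AutoModelForCausalLM.from_pretrained(GENERAL_NAME, torch_dtype=torch.float32)
domain = AutoModelForCausalLM.from_pretrained(DOMAIN_NAME, torch_dtype=torch.float32)

domain_state = domain.state_dict()
merged_state = {
    # Element-wise convex combination of each pair of matching parameter tensors.
    name: ALPHA * tensor + (1.0 - ALPHA) * domain_state[name]
    for name, tensor in general.state_dict().items()
}

general.load_state_dict(merged_state)  # reuse one model object to hold the merged weights
general.save_pretrained("merged-model")
```
More elaborate merging methods (e.g., task arithmetic or TIES-style merging) follow the same pattern of combining parameter tensors, differing mainly in how the per-parameter weights are chosen.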
Related papers
- Linguistically Grounded Analysis of Language Models using Shapley Head Values [2.914115079173979]
We investigate the processing of morphosyntactic phenomena by leveraging a recently proposed method for probing language models via Shapley Head Values (SHVs).
Using the English language BLiMP dataset, we test our approach on two widely used models, BERT and RoBERTa, and compare how linguistic constructions are handled.
Our results show that SHV-based attributions reveal distinct patterns across both models, providing insights into how language models organize and process linguistic information.
arXiv Detail & Related papers (2024-10-17T09:48:08Z)
- Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z)
- RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models [57.12888828853409]
RAVEN is a model that combines retrieval-augmented masked language modeling and prefix language modeling.
Fusion-in-Context Learning enables the model to leverage more in-context examples without requiring additional training.
Our work underscores the potential of retrieval-augmented encoder-decoder language models for in-context learning.
arXiv Detail & Related papers (2023-08-15T17:59:18Z)
- Fine-Tune Language Models as Multi-Modal Differential Equation Solvers [14.181842691371935]
We present a transformation of in-context operator learning into a multi-modal paradigm.
In particular, we take inspiration from the recent success of large language models, and propose using "captions" to integrate human knowledge about the operator.
arXiv Detail & Related papers (2023-08-09T16:44:25Z)
- Feature Interactions Reveal Linguistic Structure in Language Models [2.0178765779788495]
We study feature interactions in the context of feature attribution methods for post-hoc interpretability.
We work out a grey box methodology, in which we train models to perfection on a formal language classification task.
We show that under specific configurations, some methods are indeed able to uncover the grammatical rules acquired by a model.
arXiv Detail & Related papers (2023-06-21T11:24:41Z)
- Commonsense Knowledge Transfer for Pre-trained Language Models [83.01121484432801]
We introduce commonsense knowledge transfer, a framework to transfer the commonsense knowledge stored in a neural commonsense knowledge model to a general-purpose pre-trained language model.
It first exploits general texts to form queries for extracting commonsense knowledge from the neural commonsense knowledge model.
It then refines the language model with two self-supervised objectives: commonsense mask infilling and commonsense relation prediction.
arXiv Detail & Related papers (2023-06-04T15:44:51Z)
- Scaling Vision-Language Models with Sparse Mixture of Experts [128.0882767889029]
We show that mixture-of-experts (MoE) techniques can achieve state-of-the-art performance on a range of benchmarks over dense models of equivalent computational cost.
Our research offers valuable insights into stabilizing the training of MoE models, understanding the impact of MoE on model interpretability, and balancing the trade-offs between compute cost and performance when scaling vision-language models.
arXiv Detail & Related papers (2023-03-13T16:00:31Z)
- Learning Semantic Textual Similarity via Topic-informed Discrete Latent Variables [17.57873577962635]
We develop a topic-informed discrete latent variable model for semantic textual similarity.
Our model learns a shared latent space for sentence-pair representation via vector quantization.
We show that our model is able to surpass several strong neural baselines in semantic textual similarity tasks.
arXiv Detail & Related papers (2022-11-07T15:09:58Z)
- Testing Pre-trained Language Models' Understanding of Distributivity via Causal Mediation Analysis [13.07356367140208]
We introduce DistNLI, a new diagnostic dataset for natural language inference.
We find that the extent of models' understanding is associated with model size and vocabulary size.
arXiv Detail & Related papers (2022-09-11T00:33:28Z)
- Introducing Syntactic Structures into Target Opinion Word Extraction with Deep Learning [89.64620296557177]
We propose to incorporate the syntactic structures of the sentences into the deep learning models for targeted opinion word extraction.
We also introduce a novel regularization technique to improve the performance of the deep learning models.
The proposed model is extensively analyzed and achieves the state-of-the-art performance on four benchmark datasets.
arXiv Detail & Related papers (2020-10-26T07:13:17Z)
- A Comparative Study of Lexical Substitution Approaches based on Neural Language Models [117.96628873753123]
We present a large-scale comparative study of popular neural language and masked language models.
We show that the already competitive results achieved by SOTA LMs/MLMs can be further improved if information about the target word is injected properly.
arXiv Detail & Related papers (2020-05-29T18:43:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.