Merging Language and Domain Specific Models: The Impact on Technical Vocabulary Acquisition
- URL: http://arxiv.org/abs/2502.12001v2
- Date: Wed, 12 Mar 2025 02:00:49 GMT
- Title: Merging Language and Domain Specific Models: The Impact on Technical Vocabulary Acquisition
- Authors: Thibault Rousset, Taisei Kakibuchi, Yusuke Sasaki, Yoshihide Nomura,
- Abstract summary: This paper investigates the integration of technical vocabulary in merged language models.<n>Our experiments analyze the impact of this merging process on the target model's proficiency in handling specialized terminology.<n>The findings offer insights into the effectiveness of different model merging methods for enhancing domain-specific knowledge.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Advancements in Natural Language Processing have enabled specialized language models, but integrating domain-specific knowledge into general-purpose models in multilingual settings remains challenging, particularly for technical vocabulary. This paper investigates the integration of technical vocabulary in merged language models and explores the knowledge transfer mechanisms involved when combining a general-purpose language-specific model with a domain-specific model, focusing on the resulting model's comprehension of technical jargon. Our experiments analyze the impact of this merging process on the target model's proficiency in handling specialized terminology. We present a quantitative evaluation of the performance of the merged model, comparing it with that of the individual constituent models. The findings offer insights into the effectiveness of different model merging methods for enhancing domain-specific knowledge and highlight potential challenges and future directions in leveraging these methods for cross-lingual knowledge transfer in Natural Language Processing.
Related papers
- On The Landscape of Spoken Language Models: A Comprehensive Survey [144.11278973534203]
spoken language models (SLMs) act as universal speech processing systems.
Work in this area is very diverse, with a range of terminology and evaluation settings.
arXiv Detail & Related papers (2025-04-11T13:40:53Z) - Linguistic Interpretability of Transformer-based Language Models: a systematic review [1.3194391758295114]
Language models based on the Transformer architecture achieve excellent results in many language-related tasks.
However, little is known about how their internal computations help them achieve their results.
There is, however, a line of research -- 'interpretability' -- aiming to learn how information is encoded inside these models.
arXiv Detail & Related papers (2025-04-09T08:00:12Z) - Linguistically Grounded Analysis of Language Models using Shapley Head Values [2.914115079173979]
We investigate the processing of morphosyntactic phenomena by leveraging a recently proposed method for probing language models via Shapley Head Values (SHVs)
Using the English language BLiMP dataset, we test our approach on two widely used models, BERT and RoBERTa, and compare how linguistic constructions are handled.
Our results show that SHV-based attributions reveal distinct patterns across both models, providing insights into how language models organize and process linguistic information.
arXiv Detail & Related papers (2024-10-17T09:48:08Z) - DSG-KD: Knowledge Distillation from Domain-Specific to General Language Models [8.328673243329794]
This study investigates emergency/non-emergency classification tasks based on electronic medical record (EMR) data obtained from pediatric emergency departments (PEDs) in Korea.
Existing domain-specific pre-trained language models underperform compared to general language models in handling N-lingual free-text data characteristics.
We propose a domain knowledge transfer methodology that leverages knowledge distillation to infuse general language models with domain-specific knowledge via fine-tuning.
arXiv Detail & Related papers (2024-09-23T10:59:02Z) - MoE-CT: A Novel Approach For Large Language Models Training With Resistance To Catastrophic Forgetting [53.77590764277568]
We introduce a novel MoE-CT architecture that separates the base model's learning from the multilingual expansion process.
Our design freezes the original LLM parameters, thus safeguarding its performance in high-resource languages, while an appended MoE module, trained on diverse language datasets, augments low-resource language proficiency.
arXiv Detail & Related papers (2024-06-25T11:03:45Z) - Learning Phonotactics from Linguistic Informants [54.086544221761486]
Our model iteratively selects or synthesizes a data-point according to one of a range of information-theoretic policies.
We find that the information-theoretic policies that our model uses to select items to query the informant achieve sample efficiency comparable to, or greater than, fully supervised approaches.
arXiv Detail & Related papers (2024-05-08T00:18:56Z) - Scalable Language Model with Generalized Continual Learning [58.700439919096155]
The Joint Adaptive Re-ization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks.
Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
arXiv Detail & Related papers (2024-04-11T04:22:15Z) - A quantitative analysis of knowledge-learning preferences in large language models in molecular science [24.80165173525286]
Large language models (LLMs) introduce a fresh research paradigm to tackle scientific problems from a natural language processing (NLP) perspective.<n>LLMs significantly enhance our understanding and generation of molecules, often surpassing existing methods with their capabilities to decode and synthesize complex molecular patterns.<n>We propose a multi-modal benchmark, named ChEBI-20-MM, and perform 1263 experiments to assess the model's compatibility with data modalities and knowledge acquisition.
arXiv Detail & Related papers (2024-02-06T16:12:36Z) - Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z) - Feature Interactions Reveal Linguistic Structure in Language Models [2.0178765779788495]
We study feature interactions in the context of feature attribution methods for post-hoc interpretability.
We work out a grey box methodology, in which we train models to perfection on a formal language classification task.
We show that under specific configurations, some methods are indeed able to uncover the grammatical rules acquired by a model.
arXiv Detail & Related papers (2023-06-21T11:24:41Z) - Commonsense Knowledge Transfer for Pre-trained Language Models [83.01121484432801]
We introduce commonsense knowledge transfer, a framework to transfer the commonsense knowledge stored in a neural commonsense knowledge model to a general-purpose pre-trained language model.
It first exploits general texts to form queries for extracting commonsense knowledge from the neural commonsense knowledge model.
It then refines the language model with two self-supervised objectives: commonsense mask infilling and commonsense relation prediction.
arXiv Detail & Related papers (2023-06-04T15:44:51Z) - Scaling Vision-Language Models with Sparse Mixture of Experts [128.0882767889029]
We show that mixture-of-experts (MoE) techniques can achieve state-of-the-art performance on a range of benchmarks over dense models of equivalent computational cost.
Our research offers valuable insights into stabilizing the training of MoE models, understanding the impact of MoE on model interpretability, and balancing the trade-offs between compute performance when scaling vision-language models.
arXiv Detail & Related papers (2023-03-13T16:00:31Z) - Learning Semantic Textual Similarity via Topic-informed Discrete Latent
Variables [17.57873577962635]
We develop a topic-informed discrete latent variable model for semantic textual similarity.
Our model learns a shared latent space for sentence-pair representation via vector quantization.
We show that our model is able to surpass several strong neural baselines in semantic textual similarity tasks.
arXiv Detail & Related papers (2022-11-07T15:09:58Z) - Testing Pre-trained Language Models' Understanding of Distributivity via
Causal Mediation Analysis [13.07356367140208]
We introduce DistNLI, a new diagnostic dataset for natural language inference.
We find that the extent of models' understanding is associated with model size and vocabulary size.
arXiv Detail & Related papers (2022-09-11T00:33:28Z) - Curriculum: A Broad-Coverage Benchmark for Linguistic Phenomena in
Natural Language Understanding [1.827510863075184]
Curriculum is a new format of NLI benchmark for evaluation of broad-coverage linguistic phenomena.
We show that this linguistic-phenomena-driven benchmark can serve as an effective tool for diagnosing model behavior and verifying model learning quality.
arXiv Detail & Related papers (2022-04-13T10:32:03Z) - A Comparative Study of Lexical Substitution Approaches based on Neural
Language Models [117.96628873753123]
We present a large-scale comparative study of popular neural language and masked language models.
We show that already competitive results achieved by SOTA LMs/MLMs can be further improved if information about the target word is injected properly.
arXiv Detail & Related papers (2020-05-29T18:43:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.