Understanding Domain Learning in Language Models Through Subpopulation Analysis
- URL: http://arxiv.org/abs/2210.12553v1
- Date: Sat, 22 Oct 2022 21:12:57 GMT
- Title: Understanding Domain Learning in Language Models Through Subpopulation Analysis
- Authors: Zheng Zhao, Yftah Ziser, Shay B. Cohen
- Abstract summary: We investigate how different domains are encoded in modern neural network architectures.
We analyze the relationship between natural language domains, model size, and the amount of training data used.
- Score: 35.16003054930906
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate how different domains are encoded in modern neural network architectures. We analyze the relationship between natural language domains, model size, and the amount of training data used. The primary analysis tool we develop is based on subpopulation analysis with Singular Vector Canonical Correlation Analysis (SVCCA), which we apply to Transformer-based language models (LMs). We compare the latent representations at different layers of a pair of models: a model trained on multiple domains (an experimental model) and a model trained on a single domain (a control model). Through our method, we find that increasing model capacity affects how domain information is stored in upper and lower layers differently. In addition, we show that larger experimental models simultaneously embed domain-specific information as if they were conjoined control models. These findings are confirmed qualitatively, demonstrating the validity of our method.
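The core quantity in this analysis is the SVCCA similarity between layer activations of the experimental and control models. Below is a minimal NumPy sketch of standard SVCCA (SVD to denoise each representation, then CCA on the reduced subspaces); the function name, variance threshold, and random toy activations are illustrative assumptions, not the authors' code.

```python
import numpy as np

def svcca_similarity(X, Y, keep=0.99):
    """Mean SVCCA correlation between two activation matrices
    (rows = the same examples, columns = neurons)."""
    # Center each representation.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)

    def svd_reduce(A):
        # SVD step: keep enough directions to explain `keep` variance.
        U, s, _ = np.linalg.svd(A, full_matrices=False)
        k = int(np.searchsorted(np.cumsum(s**2) / np.sum(s**2), keep)) + 1
        return U[:, :k] * s[:k]

    Xr, Yr = svd_reduce(X), svd_reduce(Y)

    # CCA step: canonical correlations are the singular values of
    # Qx^T Qy, where Qx, Qy are orthonormal bases of the two subspaces.
    Qx, _ = np.linalg.qr(Xr)
    Qy, _ = np.linalg.qr(Yr)
    rho = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(np.clip(rho, 0.0, 1.0).mean())

# Toy usage: layer activations for the same 1,000 tokens from an
# experimental (multi-domain) and a control (single-domain) model.
rng = np.random.default_rng(0)
acts_exp = rng.normal(size=(1000, 768))
acts_ctl = rng.normal(size=(1000, 768))
print(svcca_similarity(acts_exp, acts_ctl))
```

In the paper's setting, the two activation matrices would come from feeding the same sentences through both models at a given layer; the mean canonical correlation then measures how similarly that layer encodes the domain.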
Related papers
- Learning to Generalize Unseen Domains via Multi-Source Meta Learning for Text Classification [71.08024880298613]
We study multi-source domain generalization for text classification.
We propose a framework to use multiple seen domains to train a model that can achieve high accuracy in an unseen domain.
arXiv Detail & Related papers (2024-09-20T07:46:21Z)
- Knowledge Fusion By Evolving Weights of Language Models [5.354527640064584]
This paper examines the approach of integrating multiple models into a unified model.
We propose a knowledge fusion method named Evolver, inspired by evolutionary algorithms.
arXiv Detail & Related papers (2024-06-18T02:12:34Z)
- Improving Domain Generalization with Domain Relations [77.63345406973097]
This paper focuses on domain shifts, which occur when a model is applied to new domains different from those it was trained on.
We propose a new approach called D^3G to learn domain-specific models.
Our results show that D^3G consistently outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-02-06T08:11:16Z)
- QAGAN: Adversarial Approach To Learning Domain Invariant Language Features [0.76146285961466]
We explore an adversarial training approach to learning domain-invariant features.
We achieve a 15.2% improvement in EM score and a 5.6% boost in F1 score on an out-of-domain validation dataset.
arXiv Detail & Related papers (2022-06-24T17:42:18Z)
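The QAGAN entry above does not spell out its architecture; the following PyTorch sketch shows the generic gradient-reversal recipe (DANN-style) for domain-adversarial feature learning, one standard way to realize "adversarial training for domain-invariant features". All class names and sizes are illustrative assumptions, not QAGAN's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips and scales gradients in the
    backward pass, so the encoder learns to fool the domain classifier."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class DomainAdversarialHead(nn.Module):
    """Hypothetical domain classifier attached through gradient reversal;
    minimizing its loss pushes encoder features toward domain invariance."""
    def __init__(self, hidden=768, n_domains=2, lam=0.1):
        super().__init__()
        self.lam = lam
        self.clf = nn.Sequential(nn.Linear(hidden, 256), nn.ReLU(),
                                 nn.Linear(256, n_domains))

    def forward(self, features, domain_labels):
        logits = self.clf(GradReverse.apply(features, self.lam))
        return F.cross_entropy(logits, domain_labels)

# Toy usage: this loss is added to the main task loss (e.g., QA spans).
head = DomainAdversarialHead()
feats = torch.randn(8, 768, requires_grad=True)
loss = head(feats, torch.randint(0, 2, (8,)))
loss.backward()  # feats.grad now carries the reversed domain gradient
```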
- Encoding Domain Knowledge in Multi-view Latent Variable Models: A Bayesian Approach with Structured Sparsity [7.811916700683125]
MuVI is a novel approach for domain-informed multi-view latent variable models.
We demonstrate that our model is able to integrate noisy domain expertise in the form of feature sets.
arXiv Detail & Related papers (2022-04-13T08:22:31Z)
- Model-agnostic multi-objective approach for the evolutionary discovery of mathematical models [55.41644538483948]
In modern data science, it is often more valuable to understand a model's properties and which of its parts could be replaced to obtain better results.
We use multi-objective evolutionary optimization for composite data-driven model learning to obtain the algorithm's desired properties.
arXiv Detail & Related papers (2021-07-07T11:17:09Z)
- Towards Trustworthy Deception Detection: Benchmarking Model Robustness across Domains, Modalities, and Languages [10.131671217810581]
We evaluate model robustness to out-of-domain data, modality-specific features, and languages other than English.
We find that with additional image content as input, ELMo embeddings yield significantly fewer errors compared to BERT or GloVe.
arXiv Detail & Related papers (2021-04-23T18:05:52Z)
- Pruning-then-Expanding Model for Domain Adaptation of Neural Machine Translation [9.403585397617865]
Domain adaptation is widely used in practical applications of neural machine translation.
The existing methods for domain adaptation usually suffer from catastrophic forgetting, domain divergence, and model explosion.
We propose a "divide and conquer" method based on the importance of neurons or parameters in the translation model.
arXiv Detail & Related papers (2021-03-25T08:57:09Z)
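The "divide and conquer" above hinges on ranking parameter importance, but the summary does not give the exact criterion. The sketch below uses simple weight magnitude as a stand-in importance score to split a weight matrix into a frozen "general-domain" part and a free part to retrain on the new domain; the function and variable names are hypothetical.

```python
import torch

def importance_masks(weight, keep_ratio=0.7):
    """Split a weight matrix into an 'important' part (kept to preserve
    general-domain knowledge) and a 'free' part (pruned, then re-trained
    on the new domain). Magnitude is only a stand-in importance score."""
    k = int(weight.numel() * keep_ratio)
    thresh = weight.abs().flatten().kthvalue(weight.numel() - k + 1).values
    important = weight.abs() >= thresh
    return important, ~important

# Toy usage: keep the important half, free the rest for the new domain.
w = torch.randn(4, 4)
keep, free = importance_masks(w, keep_ratio=0.5)
w_pruned = w * keep  # general-domain knowledge survives in the kept weights
```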
- Reprogramming Language Models for Molecular Representation Learning [65.00999660425731]
We propose Representation Reprogramming via Dictionary Learning (R2DL) for adversarially reprogramming pretrained language models for molecular learning tasks.
The adversarial program learns a linear transformation between a dense source model input space (language data) and a sparse target model input space (e.g., chemical and biological molecule data) using a k-SVD solver.
R2DL matches the baseline established by state-of-the-art toxicity prediction models trained on domain-specific data and outperforms it in a limited training-data setting.
arXiv Detail & Related papers (2020-12-07T05:50:27Z)
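R2DL's core object is a sparse linear map between the source LM's token-embedding space and the target (molecular) token space, found with a k-SVD solver. Below is a minimal sketch of just the sparse-coding step, using scikit-learn's orthogonal matching pursuit in place of a full k-SVD solver; matrix sizes and the random data are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

rng = np.random.default_rng(0)

# Dictionary: token embeddings of the pretrained source LM, one column
# per source token (sizes are toy stand-ins, not the paper's).
V_source = rng.normal(size=(64, 2000))     # embed-dim x source-vocab

# Desired embeddings for target tokens (e.g., molecular symbols).
V_target = rng.normal(size=(64, 50))       # embed-dim x target-vocab

# Sparse-coding step: write each target embedding as a sparse
# combination of source-token embeddings (OMP stands in for the
# sparse-coding inner loop of a full k-SVD solver).
theta = orthogonal_mp(V_source, V_target, n_nonzero_coefs=10)
print(theta.shape)  # (2000, 50): sparse source-vocab -> target-token map
```

A full k-SVD solver would alternate this sparse-coding step with dictionary-update steps; here the pretrained embeddings are treated as a fixed dictionary.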
- Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation of the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z)
- Unsupervised Domain Clusters in Pretrained Language Models [61.832234606157286]
We show that massive pre-trained language models implicitly learn sentence representations that cluster by domain without supervision.
We propose domain data selection methods based on such models.
We evaluate our data selection methods for neural machine translation across five diverse domains.
arXiv Detail & Related papers (2020-04-05T06:22:16Z)
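To make the last entry's idea concrete, here is a hedged sketch of cluster-based domain data selection: fit a Gaussian mixture over sentence embeddings of a general corpus, find the cluster favored by a small in-domain sample, and keep the corpus sentences that cluster scores highest. The embeddings are random placeholders (in practice they would come from a pretrained LM), and the component count is an assumption; the original paper's selection method differs in its details.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Placeholder sentence embeddings; in practice these would be pooled
# hidden states from a pretrained LM for each sentence.
corpus_embs = rng.normal(size=(10000, 32))            # general corpus
in_domain_embs = rng.normal(loc=0.5, size=(200, 32))  # in-domain sample

# Fit unsupervised "domain clusters" over the general corpus.
gmm = GaussianMixture(n_components=5, random_state=0).fit(corpus_embs)

# Pick the cluster the in-domain sample falls into most often, then
# select the corpus sentences that this cluster scores highest.
domain_cluster = np.bincount(gmm.predict(in_domain_embs)).argmax()
scores = gmm.predict_proba(corpus_embs)[:, domain_cluster]
selected = np.argsort(scores)[::-1][:1000]  # indices of chosen sentences
```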
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.