Group-Sparse Matrix Factorization for Transfer Learning of Word
Embeddings
- URL: http://arxiv.org/abs/2104.08928v3
- Date: Sat, 17 Feb 2024 08:02:59 GMT
- Title: Group-Sparse Matrix Factorization for Transfer Learning of Word
Embeddings
- Authors: Kan Xu, Xuanyi Zhao, Hamsa Bastani, Osbert Bastani
- Abstract summary: We propose an intuitive two-stage estimator that exploits structure via a group-sparse penalty to efficiently transfer learn domain-specific word embeddings.
We prove that all local minima identified by our nonconvex objective function are statistically indistinguishable from the global minimum under standard regularization conditions.
- Score: 31.849734024331283
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unstructured text provides decision-makers with a rich data source in many
domains, ranging from product reviews in retail to nursing notes in healthcare.
To leverage this information, words are typically translated into word
embeddings -- vectors that encode the semantic relationships between words --
through unsupervised learning algorithms such as matrix factorization. However,
learning word embeddings from new domains with limited training data can be
challenging, because the meaning/usage may be different in the new domain,
e.g., the word "positive" typically has positive sentiment, but often has
negative sentiment in medical notes since it may imply that a patient tested
positive for a disease. In practice, we expect that only a small number of
domain-specific words may have new meanings. We propose an intuitive two-stage
estimator that exploits this structure via a group-sparse penalty to
efficiently transfer learn domain-specific word embeddings by combining
large-scale text corpora (such as Wikipedia) with limited domain-specific text
data. We bound the generalization error of our transfer learning estimator,
proving that it can achieve high accuracy with substantially less
domain-specific data when only a small number of embeddings are altered between
domains. Furthermore, we prove that all local minima identified by our
nonconvex objective function are statistically indistinguishable from the
global minimum under standard regularization conditions, implying that our
estimator can be computed efficiently. Our results provide the first bounds on
group-sparse matrix factorization, which may be of independent interest. We
empirically evaluate our approach compared to state-of-the-art fine-tuning
heuristics from natural language processing.
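The two-stage construction lends itself to a short sketch. The following is a minimal rendering of the idea under my own assumptions (shapes, step size, and the toy data are hypothetical; this is not the authors' released code): source embeddings U_src from the large corpus are held fixed, and a proximal gradient loop fits the deviation D = U - U_src under a row-wise group-lasso penalty, so only a few rows move.

```python
# Minimal sketch of the two-stage estimator (my own rendering, not the
# authors' code): stage 1 gives source embeddings U_src from a large corpus;
# stage 2 refits on a small domain co-occurrence matrix M_dom while a
# row-wise group-lasso penalty keeps most words at their source embeddings.
import numpy as np

def group_prox(D, tau):
    """Proximal operator of tau * sum_w ||D_w||_2 (row-wise soft-thresholding)."""
    norms = np.linalg.norm(D, axis=1, keepdims=True)
    return D * np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))

def transfer_embeddings(M_dom, U_src, lam=2.0, lr=1e-4, n_iter=1000):
    """min_U ||M_dom - U U^T||_F^2 + lam * sum_w ||U_w - (U_src)_w||_2,
    solved by proximal gradient descent on the deviation D = U - U_src."""
    D = np.zeros_like(U_src)
    for _ in range(n_iter):
        U = U_src + D
        grad = 4.0 * (U @ U.T - M_dom) @ U   # gradient of the symmetric loss
        D = group_prox(D - lr * grad, lr * lam)
    return U_src + D

# Hypothetical toy data: 200 words, 50 dims, only 10 words shift meaning.
rng = np.random.default_rng(0)
U_src = rng.normal(size=(200, 50))
U_true = U_src.copy()
U_true[:10] += rng.normal(size=(10, 50))       # the domain-shifted words
U_dom = transfer_embeddings(U_true @ U_true.T, U_src)
moved = np.linalg.norm(U_dom - U_src, axis=1)  # should concentrate on rows 0-9
```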
Related papers
- Unsupervised Domain Adaptation for Sparse Retrieval by Filling
Vocabulary and Word Frequency Gaps [12.573927420408365]
IR models using a pretrained language model significantly outperform lexical approaches like BM25.
This paper proposes an unsupervised domain adaptation method by filling vocabulary and word-frequency gaps.
We show that our method outperforms the present state-of-the-art domain adaptation method.
arXiv Detail & Related papers (2022-11-08T03:58:26Z)
- Domain Adaptive Semantic Segmentation without Source Data [50.18389578589789]
We investigate domain adaptive semantic segmentation without source data, which assumes that the model is pre-trained on the source domain.
We propose an effective framework for this challenging problem with two components: positive learning and negative learning.
Our framework is easy to implement and can be combined with other methods to further enhance performance.
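As a rough illustration, a source-free self-training step with both signals might look like the sketch below; the thresholds, the complementary-label loss, and the segmentation-model interface are my assumptions, not the paper's specification.

```python
# Minimal sketch (thresholds and loss forms are my assumptions): positive
# learning trains on confident pseudo-labels; negative learning pushes down
# classes the model already deems very unlikely.
import torch
import torch.nn.functional as F

def positive_negative_loss(logits, pos_thresh=0.9, neg_thresh=0.05):
    """logits: (B, C, H, W) from a source-pretrained segmentation network."""
    probs = logits.softmax(dim=1)
    conf, pseudo = probs.max(dim=1)          # per-pixel confidence and label

    # Positive learning: cross-entropy on confidently pseudo-labeled pixels.
    pos_mask = conf > pos_thresh
    ce = F.cross_entropy(logits, pseudo, reduction="none")
    pos_loss = ce[pos_mask].mean() if pos_mask.any() else logits.sum() * 0.0

    # Negative learning: complementary labels, minimize -log(1 - p_c) for
    # classes whose probability is already below neg_thresh.
    neg_mask = probs < neg_thresh
    neg = -torch.log1p(-probs.clamp(max=0.999))
    neg_loss = neg[neg_mask].mean() if neg_mask.any() else logits.sum() * 0.0

    return pos_loss + neg_loss
```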
arXiv Detail & Related papers (2021-10-13T04:12:27Z)
- Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlap frequently occurs between paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions on their positions.
Experiments on Semantic Textual Similarity show our neighboring distribution divergence (NDD) measure to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
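A minimal mask-and-predict sketch, assuming a BERT-style MLM from HuggingFace transformers and pre-aligned token positions (the alignment over the longest common sequence is elided here):

```python
# Minimal sketch (an assumption-laden reimplementation, not the paper's code):
# masked predictive distributions from an MLM, compared at aligned positions.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def masked_distribution(text, position):
    """MLM distribution at token index `position`, with that token masked."""
    ids = tok(text, return_tensors="pt")["input_ids"]
    ids[0, position] = tok.mask_token_id
    with torch.no_grad():
        logits = mlm(input_ids=ids).logits[0, position]
    return logits.softmax(dim=-1)

def position_divergence(text_a, text_b, position):
    """Symmetric KL between the two masked distributions at one aligned
    position; averaging over the positions of the longest common sequence
    of a text pair would yield an NDD-style distance."""
    p = masked_distribution(text_a, position)
    q = masked_distribution(text_b, position)
    return 0.5 * ((p * (p / q).log()).sum() + (q * (q / p).log()).sum())
```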
arXiv Detail & Related papers (2021-10-04T03:59:15Z)
- A comprehensive empirical analysis on cross-domain semantic enrichment for detection of depressive language [0.9749560288448115]
We start with a rich word embedding pre-trained from a large general dataset, which is then augmented with embeddings learned from a much smaller and more specific domain dataset through a simple non-linear mapping mechanism.
We show that our augmented word embedding representations achieve a significantly better F1 score than the alternatives, especially when applied to a high-quality dataset.
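A minimal sketch of one plausible reading of that step, assuming a small MLP fit on the shared vocabulary and concatenation as the augmentation (both are my guesses, not the paper's exact recipe):

```python
# Minimal sketch under stated assumptions (the MLP and the concatenation are
# my guesses at the "simple non-linear mapping mechanism"): fit a map from
# the general embedding space to the domain space on the shared vocabulary,
# then augment every word with its mapped vector.
import torch
import torch.nn as nn

def fit_mapping(E_general, E_domain, epochs=200, lr=1e-3):
    """E_general: (n_shared, d_g); E_domain: (n_shared, d_d), shared words only."""
    mlp = nn.Sequential(nn.Linear(E_general.shape[1], 256), nn.Tanh(),
                        nn.Linear(256, E_domain.shape[1]))
    opt = torch.optim.Adam(mlp.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = ((mlp(E_general) - E_domain) ** 2).mean()
        loss.backward()
        opt.step()
    return mlp

def augment(E_general_all, mlp):
    """Concatenate each general embedding with its mapped domain-style vector."""
    with torch.no_grad():
        return torch.cat([E_general_all, mlp(E_general_all)], dim=1)
```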
arXiv Detail & Related papers (2021-06-24T07:15:09Z)
- Quantifying and Improving Transferability in Domain Generalization [53.16289325326505]
Out-of-distribution generalization is one of the key challenges when transferring a model from the lab to the real world.
We formally define a notion of transferability that can be quantified and computed in domain generalization.
We propose a new algorithm for learning transferable features and test it over various benchmark datasets.
arXiv Detail & Related papers (2021-06-07T14:04:32Z)
- Contrastive Learning and Self-Training for Unsupervised Domain Adaptation in Semantic Segmentation [71.77083272602525]
UDA attempts to provide efficient knowledge transfer from a labeled source domain to an unlabeled target domain.
We propose a contrastive learning approach that adapts category-wise centroids across domains.
We extend our method with self-training, where we use a memory-efficient temporal ensemble to generate consistent and reliable pseudo-labels.
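A compressed sketch of the two ingredients, with the contrastive form over class centroids and the EMA update as my own simplifications of the method:

```python
# Minimal sketch (a simplification, not the paper's code): category-wise
# centroids, a contrastive loss pulling same-class centroids together across
# domains, and a temporal ensemble of predictions for pseudo-labels.
import torch
import torch.nn.functional as F

def category_centroids(feats, labels, num_classes):
    """feats: (N, d) features; labels: (N,) class ids. Mean feature per class."""
    cents = torch.zeros(num_classes, feats.shape[1])
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            cents[c] = feats[mask].mean(dim=0)
    return cents

def centroid_contrastive_loss(src_cents, tgt_cents, temp=0.1):
    """Each target centroid's positive is the same-class source centroid;
    the other classes act as negatives (InfoNCE over C classes)."""
    src = F.normalize(src_cents, dim=1)
    tgt = F.normalize(tgt_cents, dim=1)
    logits = tgt @ src.t() / temp
    return F.cross_entropy(logits, torch.arange(src.shape[0]))

def update_temporal_ensemble(ema_probs, new_probs, momentum=0.9):
    """Memory-efficient temporal ensemble: EMA of class probabilities;
    pseudo-labels come from the smoothed estimate."""
    ema_probs = momentum * ema_probs + (1 - momentum) * new_probs
    return ema_probs, ema_probs.argmax(dim=1)
```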
arXiv Detail & Related papers (2021-05-05T11:55:53Z)
- Accelerating Text Mining Using Domain-Specific Stop Word Lists [57.76576681191192]
We present a novel approach, the hyperplane-based approach, for the automatic extraction of domain-specific words.
The hyperplane-based approach can significantly reduce text dimensionality by eliminating irrelevant features.
Results indicate that the hyperplane-based approach can reduce the dimensionality of the corpus by 90% and outperforms mutual-information-based extraction.
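The mechanics are not spelled out in the summary; one plausible hyperplane-based sketch, offered purely as an assumption, fits a linear separator between domain and general documents and treats terms near the hyperplane as stop-word candidates:

```python
# Minimal sketch (my reading of "hyperplane-based", not the authors' code):
# terms whose learned weights keep them close to the separating hyperplane
# discriminate neither corpus and become stop-word candidates, while
# high-weight terms are treated as domain-specific.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

def split_vocabulary(domain_docs, general_docs, stop_quantile=0.5):
    docs = domain_docs + general_docs
    y = [1] * len(domain_docs) + [0] * len(general_docs)
    vec = TfidfVectorizer()
    X = vec.fit_transform(docs)
    clf = LinearSVC().fit(X, y)
    w = np.abs(clf.coef_.ravel())         # distance-to-hyperplane proxy per term
    terms = np.array(vec.get_feature_names_out())
    cut = np.quantile(w, stop_quantile)   # hypothetical threshold
    return terms[w <= cut].tolist(), terms[w > cut].tolist()  # (stop, keep)
```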
arXiv Detail & Related papers (2020-11-18T17:42:32Z)
- Domain Adaptation for Semantic Parsing [68.81787666086554]
We propose a novel semantic parser for domain adaptation, where we have much fewer annotated data in the target domain compared to the source domain.
Our semantic parser benefits from a two-stage coarse-to-fine framework, and thus can provide different and accurate treatments for the two stages.
Experiments on a benchmark dataset show that our method consistently outperforms several popular domain adaptation strategies.
arXiv Detail & Related papers (2020-06-23T14:47:41Z)
- Evaluating Sparse Interpretable Word Embeddings for Biomedical Domain [1.3526604206343171]
Interpretability is a key means of justification and an integral requirement in biomedical applications.
We present an inclusive study on interpretability of word embeddings in the medical domain, focusing on the role of sparse methods.
Our experiments show that sparse word vectors are far more interpretable while preserving the downstream-task performance of the original dense vectors.
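One common route to such sparse vectors is sparse coding of dense embeddings over a learned dictionary (in the spirit of sparse overcomplete word vectors; the paper surveys several such methods). A minimal sketch with hypothetical shapes and hyperparameters:

```python
# Minimal sketch of one common recipe, not the paper's specific pipeline:
# non-negative sparse codes over a learned overcomplete dictionary, whose
# dimensions tend to be more interpretable than dense coordinates.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

def sparsify_embeddings(E, n_atoms=500, alpha=0.5):
    """E: (vocab, d) dense embeddings -> (vocab, n_atoms) sparse codes."""
    dl = MiniBatchDictionaryLearning(
        n_components=n_atoms, alpha=alpha,
        fit_algorithm="cd", transform_algorithm="lasso_cd",
        transform_alpha=alpha, positive_code=True, random_state=0)
    codes = dl.fit_transform(E)
    return codes, dl.components_   # sparse codes and the dictionary atoms

rng = np.random.default_rng(0)
E = rng.normal(size=(2000, 100))   # hypothetical dense embeddings
codes, atoms = sparsify_embeddings(E)
sparsity = (codes == 0).mean()     # fraction of zero entries
```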
arXiv Detail & Related papers (2020-05-11T13:56:58Z)
- Pseudo Labeling and Negative Feedback Learning for Large-scale Multi-label Domain Classification [18.18754040189615]
In large-scale domain classification, an utterance can be handled by multiple domains with overlapped capabilities.
In this paper, given one ground-truth domain for each training utterance, we treat domains that are consistently predicted with the highest confidence as additional pseudo-labels for training.
In order to reduce prediction errors due to incorrect pseudo labels, we leverage utterances with negative system responses to decrease the confidences of the incorrectly predicted domains.
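A minimal sketch of how the two signals could be wired up, with the consistency rule and the negative-feedback loss as my own hypothetical renderings:

```python
# Minimal sketch (hypothetical rendering of the recipe): a domain becomes an
# extra pseudo-label when it was the most confident prediction in every
# recorded epoch; utterances with negative system responses push the wrongly
# predicted domain's confidence down.
import torch

def add_pseudo_labels(conf_history, targets):
    """conf_history: (epochs, N, D) sigmoid confidences; targets: (N, D) multi-hot."""
    top = conf_history.argmax(dim=2)           # (epochs, N) top domain per epoch
    consistent = (top == top[0]).all(dim=0)    # same top domain in every epoch
    pseudo = targets.clone()
    idx = torch.arange(targets.shape[0])[consistent]
    pseudo[idx, top[0][consistent]] = 1.0      # add the consistent domain
    return pseudo

def negative_feedback_loss(logits, wrong_domain):
    """Decrease confidence of the incorrectly predicted domain by minimizing
    -log(1 - p) for that domain's sigmoid probability p."""
    p = torch.sigmoid(logits[torch.arange(logits.shape[0]), wrong_domain])
    return -torch.log1p(-p.clamp(max=0.999)).mean()
```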
arXiv Detail & Related papers (2020-03-08T06:00:15Z)