Group-Sparse Matrix Factorization for Transfer Learning of Word
Embeddings
- URL: http://arxiv.org/abs/2104.08928v3
- Date: Sat, 17 Feb 2024 08:02:59 GMT
- Title: Group-Sparse Matrix Factorization for Transfer Learning of Word
Embeddings
- Authors: Kan Xu, Xuanyi Zhao, Hamsa Bastani, Osbert Bastani
- Abstract summary: We propose an intuitive estimator that exploits structure via a group-sparse penalty to efficiently transfer learn domain-specific word embeddings.
We prove that all local minima identified by our nonconvex objective function are statistically indistinguishable from the global minimum under standard regularization conditions.
- Score: 31.849734024331283
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unstructured text provides decision-makers with a rich data source in many
domains, ranging from product reviews in retail to nursing notes in healthcare.
To leverage this information, words are typically translated into word
embeddings -- vectors that encode the semantic relationships between words --
through unsupervised learning algorithms such as matrix factorization. However,
learning word embeddings from new domains with limited training data can be
challenging, because the meaning/usage may be different in the new domain,
e.g., the word "positive" typically has positive sentiment, but often has
negative sentiment in medical notes since it may imply that a patient tested
positive for a disease. In practice, we expect that only a small number of
domain-specific words may have new meanings. We propose an intuitive two-stage
estimator that exploits this structure via a group-sparse penalty to
efficiently transfer learn domain-specific word embeddings by combining
large-scale text corpora (such as Wikipedia) with limited domain-specific text
data. We bound the generalization error of our transfer learning estimator,
proving that it can achieve high accuracy with substantially less
domain-specific data when only a small number of embeddings are altered between
domains. Furthermore, we prove that all local minima identified by our
nonconvex objective function are statistically indistinguishable from the
global minimum under standard regularization conditions, implying that our
estimator can be computed efficiently. Our results provide the first bounds on
group-sparse matrix factorization, which may be of independent interest. We
empirically evaluate our approach compared to state-of-the-art fine-tuning
heuristics from natural language processing.
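To make the group-sparse idea concrete, here is a minimal sketch in Python with NumPy. It is not the authors' estimator: the abstract describes a nonconvex matrix factorization objective, whereas this sketch assumes, for illustration only, that a noisy estimate of the target-domain embeddings is observed directly. Each word's embedding is one row, and the penalty is the sum of the Euclidean norms of the row-wise differences from the source embeddings, so only a few words are allowed to move. The function names, the toy matrices, and the threshold `tau` are all hypothetical.

```python
import numpy as np


def group_sparse_penalty(delta):
    """Sum of row-wise Euclidean norms (the l2,1 norm) of delta."""
    return float(np.linalg.norm(delta, axis=1).sum())


def row_soft_threshold(delta, tau):
    """Proximal operator of tau * l2,1 norm.

    Rows with norm <= tau are set to zero; longer rows are shrunk by tau.
    """
    norms = np.linalg.norm(delta, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return scale * delta


def transfer_embeddings(u_source, u_target_noisy, tau):
    """Illustrative simplification (an assumption, not the paper's method).

    Solves  min_D 0.5 * ||u_target_noisy - u_source - D||_F^2 + tau * ||D||_{2,1}
    in closed form via row-wise soft thresholding, then returns
    u_source + D together with D.
    """
    delta_hat = row_soft_threshold(u_target_noisy - u_source, tau)
    return u_source + delta_hat, delta_hat


# Toy data (hypothetical): 4 words, 2-dimensional embeddings.
u_source = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]])
# Only word 0 genuinely changes meaning; the other rows differ by small noise.
u_target_noisy = u_source + np.array(
    [[3.0, 4.0], [0.1, 0.0], [0.0, -0.1], [0.0, 0.0]]
)

u_target_hat, delta_hat = transfer_embeddings(u_source, u_target_noisy, tau=1.0)

# Indices of words whose embeddings were allowed to move.
changed_words = np.flatnonzero(np.linalg.norm(delta_hat, axis=1) > 0)
print(changed_words)
```

In this toy setting the small perturbations fall below the threshold and are zeroed out, so those words keep their source embeddings, while the one large shift survives in shrunken form. Choosing `tau` trades off noise suppression against bias in the retained rows.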