Word Embeddings for Banking Industry
- URL: http://arxiv.org/abs/2306.01807v1
- Date: Fri, 2 Jun 2023 01:00:44 GMT
- Title: Word Embeddings for Banking Industry
- Authors: Avnish Patel
- Abstract summary: Bank-specific word embeddings could be a good stand-alone source or a complement to other widely available embeddings.
This paper explores the idea of creating bank-specific word embeddings and evaluating them against other sources of word embeddings such as GloVe and BERT.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Applications of Natural Language Processing (NLP) are plentiful, from
sentiment analysis to text classification. Practitioners rely on static word
embeddings (e.g. Word2Vec or GloVe) or static word representations from
contextual models (e.g. BERT or ELMo) to perform many of these NLP tasks. These
widely available word embeddings are built from large amounts of text, so they
are likely to have captured most of the vocabulary in different contexts.
However, how well would they capture domain-specific semantics and word
relatedness? This paper explores this question by creating bank-specific word
embeddings and evaluating them against other sources of word embeddings such as
GloVe and BERT. Not surprisingly, embeddings built from bank-specific corpora
do a better job of capturing bank-specific semantics and word relatedness.
This finding suggests that bank-specific word embeddings could be
a good stand-alone source or a complement to other widely available embeddings
when performing NLP tasks specific to the banking industry.
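As a rough illustration of the comparison described in the abstract, the sketch below trains skip-gram Word2Vec embeddings on a bank-specific corpus with gensim and contrasts nearest neighbours with general-purpose GloVe vectors. The corpus file, probe terms, and hyperparameters are assumptions for illustration, not the paper's setup.

```python
# Minimal sketch (not the paper's exact pipeline): train static word embeddings
# on a bank-specific corpus and compare nearest neighbours against general GloVe.
from gensim.models import Word2Vec
import gensim.downloader as api

# One tokenized sentence per line; "bank_corpus.txt" is a hypothetical file of
# bank documents (product descriptions, filings, call-centre notes, etc.).
with open("bank_corpus.txt", encoding="utf-8") as f:
    sentences = [line.lower().split() for line in f if line.strip()]

# Skip-gram Word2Vec trained only on the domain corpus.
bank_model = Word2Vec(
    sentences, vector_size=100, window=5, min_count=5, sg=1, epochs=10
)

# Widely available general-purpose embeddings for comparison (downloads once).
glove = api.load("glove-wiki-gigaword-100")

# In a banking corpus, neighbours of "balance" should lean toward accounts and
# statements rather than physical equilibrium.
for term in ["balance", "credit", "statement"]:
    if term in bank_model.wv and term in glove:
        print(term)
        print("  bank-specific:", [w for w, _ in bank_model.wv.most_similar(term, topn=5)])
        print("  GloVe        :", [w for w, _ in glove.most_similar(term, topn=5)])
```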
Related papers
- Using Paraphrases to Study Properties of Contextual Embeddings [46.84861591608146]
We use paraphrases as a unique source of data to analyze contextualized embeddings.
Because paraphrases naturally encode consistent word and phrase semantics, they provide a unique lens for investigating properties of embeddings.
We find that contextual embeddings effectively handle polysemous words, but give synonyms surprisingly different representations in many cases.
arXiv Detail & Related papers (2022-07-12T14:22:05Z) - Pretraining without Wordpieces: Learning Over a Vocabulary of Millions of Words [50.11559460111882]
We explore the possibility of developing a BERT-style pretrained model over a vocabulary of words instead of wordpieces.
Results show that, compared to standard wordpiece-based BERT, WordBERT makes significant improvements on cloze test and machine reading comprehension.
Since the pipeline is language-independent, we train WordBERT for Chinese and obtain significant gains on five natural language understanding datasets.
arXiv Detail & Related papers (2022-02-24T15:15:48Z) - UCPhrase: Unsupervised Context-aware Quality Phrase Tagging [63.86606855524567]
UCPhrase is a novel unsupervised context-aware quality phrase tagger.
We induce high-quality phrase spans as silver labels from consistently co-occurring word sequences.
We show that our design is superior to state-of-the-art pre-trained, unsupervised, and distantly supervised methods.
arXiv Detail & Related papers (2021-05-28T19:44:24Z) - SemGloVe: Semantic Co-occurrences for GloVe from BERT [55.420035541274444]
GloVe learns word embeddings by leveraging statistical information from word co-occurrence matrices (a toy construction of such co-occurrence counts is sketched after this list).
We propose SemGloVe, which distills semantic co-occurrences from BERT into static GloVe word embeddings.
arXiv Detail & Related papers (2020-12-30T15:38:26Z) - Deconstructing word embedding algorithms [17.797952730495453]
We propose a retrospective on some of the most well-known word embedding algorithms.
In this work, we deconstruct Word2vec, GloVe, and others into a common form, unveiling some of the common conditions that seem to be required for making performant word embeddings.
arXiv Detail & Related papers (2020-11-12T14:23:35Z) - Interactive Re-Fitting as a Technique for Improving Word Embeddings [0.0]
We make it possible for humans to adjust portions of a word embedding space by moving sets of words closer to one another.
Our approach allows users to trigger selective post-processing as they interact with and assess potential bias in word embeddings.
arXiv Detail & Related papers (2020-09-30T21:54:22Z) - A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z) - Hybrid Improved Document-level Embedding (HIDE) [5.33024001730262]
We propose HIDE, a Hybrid Improved Document-level Embedding.
It incorporates domain information, part-of-speech information, and sentiment information into existing word embeddings such as GloVe and Word2Vec.
We show considerable improvement over the accuracy of existing pretrained word vectors such as GloVe and Word2Vec.
arXiv Detail & Related papers (2020-06-01T19:09:13Z) - Attention Word Embedding [23.997145283950346]
We introduce the Attention Word Embedding (AWE) model, which integrates the attention mechanism into the CBOW model.
We also propose AWE-S, which incorporates subword information.
We demonstrate that AWE and AWE-S outperform state-of-the-art word embedding models on a variety of word similarity datasets (a simplified sketch of attention-weighted CBOW averaging appears after this list).
arXiv Detail & Related papers (2020-06-01T14:47:48Z) - A Survey on Contextual Embeddings [48.04732268018772]
Contextual embeddings assign each word a representation based on its context, capturing uses of words across varied contexts and encoding knowledge that transfers across languages.
We review existing contextual embedding models, cross-lingual polyglot pre-training, the application of contextual embeddings in downstream tasks, model compression, and model analyses.
arXiv Detail & Related papers (2020-03-16T15:22:22Z)
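The SemGloVe entry above starts from GloVe's window-based co-occurrence statistics. Below is a toy sketch of how such co-occurrence counts are accumulated with GloVe's 1/distance weighting; the two-sentence corpus is made up, and the BERT-based distillation that SemGloVe actually performs is not shown.

```python
# Toy window-based co-occurrence counts of the kind GloVe-style methods start from.
from collections import defaultdict

corpus = [
    "the bank raised its interest rate".split(),
    "the customer opened a savings account at the bank".split(),
]
window = 2
cooc = defaultdict(float)

for sent in corpus:
    for i, center in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                # GloVe weights each pair by 1/distance between the two words.
                cooc[(center, sent[j])] += 1.0 / abs(i - j)

# Most frequent weighted co-occurrence pairs.
for pair, count in sorted(cooc.items(), key=lambda kv: -kv[1])[:5]:
    print(pair, round(count, 2))
```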
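The Attention Word Embedding entry above integrates attention into CBOW. The sketch below only illustrates the general idea of attention-weighting context vectors instead of CBOW's uniform average; the dimensions, random vectors, and scoring function are arbitrary assumptions, not the AWE architecture.

```python
# Simplified illustration: weight each context word by a compatibility score
# with the target slot, instead of CBOW's uniform average.
import numpy as np

rng = np.random.default_rng(0)
dim, n_context = 50, 4

context_vecs = rng.normal(size=(n_context, dim))   # input vectors of context words
target_query = rng.normal(size=dim)                # query vector for the target slot

# Plain CBOW: every context word contributes equally.
cbow_hidden = context_vecs.mean(axis=0)

# Attention-weighted CBOW: softmax over scaled dot-product scores.
scores = context_vecs @ target_query / np.sqrt(dim)
weights = np.exp(scores - scores.max())
weights /= weights.sum()
awe_hidden = weights @ context_vecs

print("attention weights:", np.round(weights, 3))
print("hidden vectors differ:", not np.allclose(cbow_hidden, awe_hidden))
```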
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.