Human-in-the-Loop Refinement of Word Embeddings
- URL: http://arxiv.org/abs/2110.02884v1
- Date: Wed, 6 Oct 2021 16:10:32 GMT
- Title: Human-in-the-Loop Refinement of Word Embeddings
- Authors: James Powell, Kari Sentz, Martin Klein
- Abstract summary: We propose a system that incorporates an adaptation of word embedding post-processing, which we call "interactive refitting".
Our approach allows a human to identify and address potential quality issues with word embeddings interactively.
It also allows for better insight into what effect word embeddings, and refinements to word embeddings, have on machine learning pipelines.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Word embeddings are a fixed, distributional representation of the context of
words in a corpus learned from word co-occurrences. Despite their proven
utility in machine learning tasks, word embedding models may capture uneven
semantic and syntactic representations, and can inadvertently reflect various
kinds of bias present within corpora upon which they were trained. It has been
demonstrated that post-processing of word embeddings to apply information found
in lexical dictionaries can improve the semantic associations, thus improving
their quality. Building on this idea, we propose a system that incorporates an
adaptation of word embedding post-processing, which we call "interactive
refitting", to address some of the most daunting qualitative problems found in
word embeddings. Our approach allows a human to identify and address potential
quality issues with word embeddings interactively. This has the advantage of
negating the question of who decides what constitutes bias or what other
quality issues may affect downstream tasks. It allows each organization or
entity to address concerns they may have at a fine-grained level and to do so
in an iterative and interactive fashion. It also allows for better insight into
what effect word embeddings, and refinements to word embeddings, have on
machine learning pipelines.
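The abstract does not include the authors' implementation, but "interactive refitting" adapts retrofitting-style post-processing. Below is a minimal sketch, assuming a Faruqui-style weighted-average update rule and a toy vocabulary (both are illustrative assumptions, not the paper's code):

```python
import numpy as np

def refit(embeddings, grouped, alpha=1.0, beta=1.0, iters=10):
    """Retrofitting-style update: pull each flagged word toward the words
    a human grouped it with, while an alpha-weighted anchor keeps it near
    its original vector."""
    original = {w: v.copy() for w, v in embeddings.items()}
    refitted = {w: v.copy() for w, v in embeddings.items()}
    for _ in range(iters):
        for word, neighbors in grouped.items():
            nbrs = [refitted[n] for n in neighbors if n in refitted]
            if not nbrs:
                continue
            refitted[word] = (alpha * original[word] + beta * sum(nbrs)) \
                             / (alpha + beta * len(nbrs))
    return refitted

# Toy interaction: a user flags that "nurse" and "doctor" should sit closer.
rng = np.random.default_rng(0)
emb = {w: rng.normal(size=8) for w in ["nurse", "doctor", "engineer"]}
emb = refit(emb, {"nurse": ["doctor"], "doctor": ["nurse"]})
```

Note that each pass of this sketch only touches the words the user grouped, which is what would keep such a loop cheap enough to run interactively.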
Related papers
- Comparing Performance of Different Linguistically-Backed Word Embeddings for Cyberbullying Detection [3.029434408969759]
Word embeddings are usually learned only from raw tokens or, in some cases, lemmas.
We propose to preserve morphological, syntactic, and other linguistic information by combining it with the raw tokens or lemmas.
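One common way to realize such a combination, shown purely as an assumed illustration (the tags and fusion scheme are not necessarily the paper's), is to fuse each token with its annotation so the embedding vocabulary preserves the distinction:

```python
# Fuse each token with its POS tag so "duck" the noun and "duck" the verb
# receive separate vectors; any word2vec-style trainer then learns one
# embedding per fused unit.
tagged = [("duck", "NOUN"), ("ducks", "NOUN"), ("duck", "VERB")]
units = [f"{tok.lower()}_{pos}" for tok, pos in tagged]
assert units == ["duck_NOUN", "ducks_NOUN", "duck_VERB"]
```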
arXiv Detail & Related papers (2022-06-04T09:11:41Z)
- Keywords and Instances: A Hierarchical Contrastive Learning Framework Unifying Hybrid Granularities for Text Generation [59.01297461453444]
We propose a hierarchical contrastive learning mechanism, which can unify hybrid granularities semantic meaning in the input text.
Experiments demonstrate that our model outperforms competitive baselines on paraphrasing, dialogue generation, and storytelling tasks.
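The summary does not spell out the hierarchical mechanism; the sketch below shows only the generic InfoNCE-style contrastive objective such frameworks build on, with toy vectors standing in for text representations:

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.1):
    """Generic contrastive loss: the anchor's similarity to its positive
    should dominate its similarity to every negative. (Base objective
    only, not the paper's hierarchical keyword/instance variant.)"""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    sims = np.array([cos(anchor, positive)] +
                    [cos(anchor, n) for n in negatives]) / tau
    sims -= sims.max()                        # numerical stability
    probs = np.exp(sims) / np.exp(sims).sum()
    return -np.log(probs[0])                  # index 0 is the positive

rng = np.random.default_rng(1)
a = rng.normal(size=4)
loss = info_nce(a, a + 0.01 * rng.normal(size=4),
                [rng.normal(size=4) for _ in range(3)])
```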
arXiv Detail & Related papers (2022-05-26T13:26:03Z)
- A Survey On Neural Word Embeddings [0.4822598110892847]
The study of meaning in natural language processing relies on the distributional hypothesis.
The revolutionary idea of distributed representations for concepts loosely mirrors how the human mind works.
Neural word embeddings transformed the whole field of NLP by introducing substantial improvements in all NLP tasks.
arXiv Detail & Related papers (2021-10-05T03:37:57Z)
- LexSubCon: Integrating Knowledge from Lexical Resources into Contextual Embeddings for Lexical Substitution [76.615287796753]
We introduce LexSubCon, an end-to-end lexical substitution framework based on contextual embedding models.
This is achieved by combining contextual information with knowledge from structured lexical resources.
Our experiments show that LexSubCon outperforms previous state-of-the-art methods on LS07 and CoInCo benchmark datasets.
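A hypothetical final step of such a pipeline, assuming precomputed per-candidate scores (the weight and both score sources are illustrative, not LexSubCon's actual components):

```python
def rank_substitutes(candidates, contextual_score, lexical_score, w=0.7):
    """Rank substitute candidates by a weighted mix of a contextual-
    embedding score and a lexical-resource (e.g. thesaurus-derived)
    score; w balances the two signals."""
    return sorted(candidates,
                  key=lambda c: w * contextual_score[c]
                              + (1 - w) * lexical_score[c],
                  reverse=True)

rank_substitutes(["large", "huge", "tall"],
                 {"large": 0.9, "huge": 0.8, "tall": 0.4},
                 {"large": 0.7, "huge": 0.9, "tall": 0.2})
```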
arXiv Detail & Related papers (2021-07-11T21:25:56Z)
- On the Impact of Knowledge-based Linguistic Annotations in the Quality of Scientific Embeddings [0.0]
We conduct a study on the use of explicit linguistic annotations to generate embeddings from a scientific corpus.
Our results show how the effect of such annotations in the embeddings varies depending on the evaluation task.
In general, we observe that learning embeddings with linguistic annotations contributes to achieving better evaluation results.
arXiv Detail & Related papers (2021-04-13T13:51:22Z)
- Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks [58.87961226278285]
We propose a self-supervised approach to model lexical semantic change.
We show that our method can be used for the detection of semantic change with any alignment method.
We illustrate the utility of our techniques using experimental results on three different datasets.
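Since the method is stated to work with any alignment, one standard choice is orthogonal Procrustes between the two periods' embedding matrices. This sketch covers only that alignment plus a cosine shift score, not the paper's self-supervised component:

```python
import numpy as np

def align(A, B):
    """Orthogonal Procrustes: find the rotation that maps embedding
    matrix A (words x dim, period 1) onto B (same rows, period 2) so
    the two spaces become comparable."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return A @ (U @ Vt)

def shift(u, v):
    # cosine distance between a word's aligned vectors across periods
    return 1 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

rng = np.random.default_rng(2)
A, B = rng.normal(size=(5, 4)), rng.normal(size=(5, 4))
A_aligned = align(A, B)
change = [shift(A_aligned[i], B[i]) for i in range(5)]  # one score per word
```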
arXiv Detail & Related papers (2021-01-30T18:59:43Z)
- Blind signal decomposition of various word embeddings based on join and individual variance explained [11.542392473831672]
We propose to use a novel joint signal separation method, JIVE (Joint and Individual Variation Explained), to decompose various trained word embeddings into joint and individual components.
We conducted an empirical study on word2vec, FastText, and GloVe embeddings trained on different corpora and with different dimensions.
We found that mapping different word embeddings into the joint component can greatly improve sentiment performance for word embeddings that originally perform worse.
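Full JIVE alternates between estimating joint and individual structure; the sketch below keeps only a heavily simplified "joint" step (a shared SVD over concatenated matrices with matched vocabularies) to show the shape of the idea:

```python
import numpy as np

def joint_component(embedding_mats, rank=2):
    """Simplified stand-in for JIVE's joint part: concatenate embedding
    matrices with identical row order (same vocabulary) and keep the top
    shared singular directions. Real JIVE also estimates and removes
    per-embedding individual structure."""
    X = np.concatenate(embedding_mats, axis=1)   # words x (d1+d2+...)
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :rank] * S[:rank]                # joint scores per word

rng = np.random.default_rng(3)
w2v, ft, glove = (rng.normal(size=(6, 4)) for _ in range(3))
J = joint_component([w2v, ft, glove])            # 6 words x rank 2
```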
arXiv Detail & Related papers (2020-11-30T01:36:29Z)
- Interactive Re-Fitting as a Technique for Improving Word Embeddings [0.0]
We make it possible for humans to adjust portions of a word embedding space by moving sets of words closer to one another.
Our approach allows users to trigger selective post-processing as they interact with and assess potential bias in word embeddings.
arXiv Detail & Related papers (2020-09-30T21:54:22Z)
- Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems [54.49880724137688]
The problem of out-of-vocabulary (OOV) words is typical for any speech recognition system.
One popular approach to covering OOVs is to use subword units rather than words.
In this paper we explore existing methods of this kind at both the graph-construction and search levels.
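As a toy illustration of the subword idea only (production ASR systems use BPE/WordPiece-style inventories inside the decoding graph, not this greedy matcher), an OOV word can be covered by segmenting it into known units:

```python
def segment(word, subword_vocab):
    """Greedy longest-match segmentation of a word into known subword
    units, falling back to single characters; hypothetical vocabulary."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in subword_vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])   # no unit matched: emit one character
            i += 1
    return pieces

assert segment("covidiot", {"cov", "id", "iot", "co"}) == ["cov", "id", "iot"]
```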
arXiv Detail & Related papers (2020-03-19T21:24:45Z)
- Word Sense Disambiguation for 158 Languages using Word Embeddings Only [80.79437083582643]
Disambiguation of word senses in context is easy for humans, but a major challenge for automatic approaches.
We present a method that takes as input a standard pre-trained word embedding model and induces a fully-fledged word sense inventory.
We use this method to induce a collection of sense inventories for 158 languages on the basis of the original pre-trained fastText word embeddings.
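A toy version of the graph-based family this induction belongs to (not the paper's exact procedure): cluster the target word's nearest neighbors by their mutual similarity and read each cluster as one sense:

```python
import numpy as np

def induce_senses(neighbor_vecs, neighbor_words, thresh=0.5):
    """Connect neighbors whose cosine similarity exceeds a threshold;
    each connected component of this ego-network is one induced sense."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    parent = list(range(len(neighbor_words)))
    def find(x):                       # union-find root lookup
        while parent[x] != x:
            x = parent[x]
        return x
    for i in range(len(neighbor_words)):
        for j in range(i + 1, len(neighbor_words)):
            if cos(neighbor_vecs[i], neighbor_vecs[j]) > thresh:
                parent[find(j)] = find(i)
    clusters = {}
    for i, w in enumerate(neighbor_words):
        clusters.setdefault(find(i), []).append(w)
    return list(clusters.values())

rng = np.random.default_rng(4)
vecs = [rng.normal(size=8) for _ in range(4)]
senses = induce_senses(vecs, ["river", "shore", "money", "loan"])
```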
arXiv Detail & Related papers (2020-03-14T14:50:04Z)
- Multiplex Word Embeddings for Selectional Preference Acquisition [70.33531759861111]
We propose a multiplex word embedding model, which can be easily extended according to various relations among words.
Our model can effectively distinguish words with respect to different relations without introducing unnecessary sparseness.
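One plausible layout for such a model, with the sizes and additive composition rule as illustrative assumptions: a shared center vector per word plus a small offset per relation, so adding a relation only adds offsets:

```python
import numpy as np

class MultiplexEmbedding:
    """Shared center vector per word plus a small relation-specific
    offset; a relation-conditioned view is center + offset. New
    relations extend the model without retraining the centers."""
    def __init__(self, vocab, relations, dim=16, seed=0):
        rng = np.random.default_rng(seed)
        self.center = {w: rng.normal(size=dim) for w in vocab}
        self.offset = {(w, r): 0.1 * rng.normal(size=dim)
                       for w in vocab for r in relations}

    def vector(self, word, relation):
        # relation-conditioned embedding for selectional preference scoring
        return self.center[word] + self.offset[(word, relation)]

emb = MultiplexEmbedding(["eat", "apple"], ["subj", "obj"])
v = emb.vector("apple", "obj")
```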
arXiv Detail & Related papers (2020-01-09T04:47:14Z)