Interactive Re-Fitting as a Technique for Improving Word Embeddings
- URL: http://arxiv.org/abs/2010.00121v1
- Date: Wed, 30 Sep 2020 21:54:22 GMT
- Title: Interactive Re-Fitting as a Technique for Improving Word Embeddings
- Authors: James Powell, Kari Sentz
- Abstract summary: We make it possible for humans to adjust portions of a word embedding space by moving sets of words closer to one another.
Our approach allows users to trigger selective post-processing as they interact with and assess potential bias in word embeddings.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Word embeddings are a fixed, distributional representation of the context of
words in a corpus learned from word co-occurrences. While word embeddings have
proven to have many practical uses in natural language processing tasks, they
reflect the attributes of the corpus upon which they are trained. Recent work
has demonstrated that post-processing of word embeddings to apply information
found in lexical dictionaries can improve their quality. We build on this
post-processing technique by making it interactive. Our approach makes it
possible for humans to adjust portions of a word embedding space by moving sets
of words closer to one another. One motivating use case for this capability is
to enable users to identify and reduce the presence of bias in word embeddings.
Our approach allows users to trigger selective post-processing as they interact
with and assess potential bias in word embeddings.
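The selective post-processing described above follows the retrofitting family of methods: words the user groups together are pulled toward one another while the rest of the space is left untouched. Below is a minimal sketch of that update, assuming plain numpy vectors; the `refit` helper, its parameters, and the toy vocabulary are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def refit(embeddings, word_set, alpha=0.5, iterations=10):
    """Pull a user-selected set of words closer together by repeatedly
    blending each word's original vector with the set's centroid.
    `alpha` preserves fidelity to the original vector; (1 - alpha)
    controls attraction toward the centroid."""
    refitted = {w: v.copy() for w, v in embeddings.items()}
    for _ in range(iterations):
        centroid = np.mean([refitted[w] for w in word_set], axis=0)
        for w in word_set:
            refitted[w] = alpha * embeddings[w] + (1 - alpha) * centroid
    return refitted

# Toy usage: a user flags two words they judge should sit closer.
emb = {w: np.random.randn(50) for w in ["nurse", "doctor", "table"]}
emb = refit(emb, {"nurse", "doctor"})
```

In an interactive setting, each user selection would trigger one such selective refit over only the flagged words, which is what makes the post-processing incremental rather than corpus-wide.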
Related papers
- Pixel Sentence Representation Learning [67.4775296225521]
In this work, we conceptualize the learning of sentence-level textual semantics as a visual representation learning process.
We employ visually-grounded text perturbation methods such as typos and word order shuffling, which resonate with human cognitive patterns and allow perturbations to be perceived as continuous.
Our approach is further bolstered by large-scale unsupervised topical alignment training and natural language inference supervision.
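As a rough illustration of such perturbations, the sketch below applies character swaps and word order shuffling; the function names and rates are assumptions for illustration, not the paper's code.

```python
import random

def typo(word, rate=0.1):
    """With probability `rate`, swap two adjacent characters (a crude typo)."""
    chars = list(word)
    if len(chars) > 1 and random.random() < rate:
        i = random.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def shuffle_words(sentence, rate=0.2):
    """Randomly swap a fraction of word positions in the sentence."""
    words = sentence.split()
    for _ in range(int(len(words) * rate)):
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return " ".join(words)

sentence = "the quick brown fox jumps over the lazy dog"
noisy = " ".join(typo(w) for w in sentence.split())
print(shuffle_words(noisy))
```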
arXiv Detail & Related papers (2024-02-13T02:46:45Z)
- SensePOLAR: Word sense aware interpretability for pre-trained contextual word embeddings [4.479834103607384]
Adding interpretability to word embeddings represents an area of active research in text representation.
We present SensePOLAR, an extension of the original POLAR framework that enables word-sense aware interpretability for pre-trained contextual word embeddings.
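The underlying POLAR idea re-expresses an embedding along interpretable axes spanned by opposite poles. A minimal sketch of that projection, assuming static numpy vectors and hand-picked antonym pairs (the actual framework handles contextual embeddings and a curated pole inventory):

```python
import numpy as np

def polar_scores(vec, emb, pole_pairs):
    """Score `vec` along interpretable axes, each defined as the
    normalized difference between two antonym vectors."""
    scores = {}
    for pos, neg in pole_pairs:
        axis = emb[pos] - emb[neg]
        axis /= np.linalg.norm(axis)
        scores[f"{neg}<->{pos}"] = float(vec @ axis)
    return scores

emb = {w: np.random.randn(50) for w in ["good", "bad", "hot", "cold", "soup"]}
print(polar_scores(emb["soup"], emb, [("good", "bad"), ("hot", "cold")]))
```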
arXiv Detail & Related papers (2023-01-11T20:25:53Z)
- Human-in-the-Loop Refinement of Word Embeddings [0.0]
We propose a system that incorporates an adaptation of word embedding post-processing, which we call "interactive refitting"
Our approach allows a human to identify and address potential quality issues with word embeddings interactively.
It also allows for better insight into what effect word embeddings, and refinements to word embeddings, have on machine learning pipelines.
arXiv Detail & Related papers (2021-10-06T16:10:32Z)
- LexSubCon: Integrating Knowledge from Lexical Resources into Contextual Embeddings for Lexical Substitution [76.615287796753]
We introduce LexSubCon, an end-to-end lexical substitution framework based on contextual embedding models.
This is achieved by combining contextual information with knowledge from structured lexical resources.
Our experiments show that LexSubCon outperforms previous state-of-the-art methods on LS07 and CoInCo benchmark datasets.
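One way to picture combining contextual similarity with a structured lexical resource is a weighted score over substitute candidates. The sketch below, with made-up weights and a toy lexicon, is a simplification for intuition, not LexSubCon itself:

```python
import numpy as np

def rank_substitutes(target, candidates, emb, synonyms, w_sim=0.7, w_lex=0.3):
    """Rank candidates by cosine similarity to the target's vector,
    plus a bonus when the lexical resource lists them as synonyms."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    scored = [(w_sim * cos(emb[target], emb[c])
               + (w_lex if c in synonyms.get(target, set()) else 0.0), c)
              for c in candidates]
    return sorted(scored, reverse=True)

emb = {w: np.random.randn(50) for w in ["bright", "smart", "shiny", "dull"]}
lexicon = {"bright": {"smart", "shiny"}}
print(rank_substitutes("bright", ["smart", "shiny", "dull"], emb, lexicon))
```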
arXiv Detail & Related papers (2021-07-11T21:25:56Z)
- Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks [58.87961226278285]
We propose a self-supervised approach to model lexical semantic change.
We show that our method can be used for the detection of semantic change with any alignment method.
We illustrate the utility of our techniques using experimental results on three different datasets.
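A common way to detect semantic change is to align embeddings trained on two time periods and measure how far each word moved after alignment; orthogonal Procrustes is one standard alignment choice, sketched here for illustration:

```python
import numpy as np

def procrustes_align(X, Y):
    """Orthogonal matrix R minimizing ||XR - Y||_F, making the
    two embedding spaces directly comparable."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def shift_scores(emb_old, emb_new, vocab):
    """Cosine distance, per word, between the aligned old vector and
    the new vector; larger values suggest semantic change."""
    X = np.stack([emb_old[w] for w in vocab])
    Y = np.stack([emb_new[w] for w in vocab])
    Xa = X @ procrustes_align(X, Y)
    sims = (Xa * Y).sum(1) / (np.linalg.norm(Xa, axis=1) * np.linalg.norm(Y, axis=1))
    return dict(zip(vocab, 1.0 - sims))

vocab = ["gay", "mouse", "table"]
old = {w: np.random.randn(50) for w in vocab}
new = {w: np.random.randn(50) for w in vocab}
print(shift_scores(old, new, vocab))
```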
arXiv Detail & Related papers (2021-01-30T18:59:43Z)
- Learning Efficient Task-Specific Meta-Embeddings with Word Prisms [17.288765083303243]
We introduce word prisms: a simple and efficient meta-embedding method that learns to combine source embeddings according to the task at hand.
We evaluate word prisms in comparison to other meta-embedding methods on six extrinsic evaluations and observe that word prisms offer improvements on all tasks.
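One simplified reading of "combining source embeddings according to the task" is a task-specific weighted average, where the weights would be learned end to end. The forward pass below is an illustrative stand-in, not the word prisms architecture itself (which learns orthogonal projections per source):

```python
import numpy as np

def combine_sources(word, sources, logits):
    """Softmax-weighted average of several source embeddings for one
    word; `logits` would be trained per downstream task."""
    weights = np.exp(logits) / np.exp(logits).sum()
    return sum(w * src[word] for w, src in zip(weights, sources))

glove_like = {"bank": np.random.randn(50)}
w2v_like = {"bank": np.random.randn(50)}
meta = combine_sources("bank", [glove_like, w2v_like], np.array([0.2, 1.3]))
print(meta.shape)  # (50,)
```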
arXiv Detail & Related papers (2020-11-05T16:08:50Z)
- Comparative Analysis of Word Embeddings for Capturing Word Similarities [0.0]
Distributed language representation has become the most widely used technique for language representation in various natural language processing tasks.
Most of the natural language processing models that are based on deep learning techniques use already pre-trained distributed word representations, commonly called word embeddings.
However, selecting the appropriate word embeddings is a perplexing task, since the projected embedding space is not intuitive to humans.
arXiv Detail & Related papers (2020-05-08T01:16:03Z)
- Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems [54.49880724137688]
The problem of out-of-vocabulary (OOV) words is typical for any speech recognition system.
One popular approach to covering OOVs is to use subword units rather than words.
In this paper we explore different existing methods of this kind at both the graph-construction and search-method levels.
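The subword idea can be seen in miniature with a greedy longest-match segmenter: any OOV word decomposes into known units, with single characters as the final fallback. This toy sketch assumes a small unit inventory and ignores the paper's graph-level machinery:

```python
def segment(word, units):
    """Greedy longest-match segmentation into subword units, falling
    back to single characters so no word stays out of vocabulary."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in units or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    return pieces

units = {"speech", "recog", "ni", "tion", "er"}
print(segment("recognitioner", units))  # ['recog', 'ni', 'tion', 'er']
```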
arXiv Detail & Related papers (2020-03-19T21:24:45Z)
- Word Sense Disambiguation for 158 Languages using Word Embeddings Only [80.79437083582643]
Disambiguation of word senses in context is easy for humans, but a major challenge for automatic approaches.
We present a method that takes as input a standard pre-trained word embedding model and induces a fully-fledged word sense inventory.
We use this method to induce a collection of sense inventories for 158 languages on the basis of the original pre-trained fastText word embeddings.
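The gist of inducing senses from embeddings alone can be sketched as clustering a word's nearest neighbors, with each cluster standing in for one sense. The threshold-based grouping below is a crude stand-in for the paper's ego-network clustering:

```python
import numpy as np

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def induce_senses(word, emb, k=10, threshold=0.4):
    """Group the k nearest neighbors of `word`: a neighbor joins an
    existing group if similar enough to its first member, else it
    starts a new group (a new candidate sense)."""
    neighbors = sorted((w for w in emb if w != word),
                       key=lambda w: cos(emb[word], emb[w]), reverse=True)[:k]
    senses = []
    for n in neighbors:
        for group in senses:
            if cos(emb[n], emb[group[0]]) > threshold:
                group.append(n)
                break
        else:
            senses.append([n])
    return senses

emb = {w: np.random.randn(50) for w in
       ["bank", "river", "shore", "money", "loan", "credit"]}
print(induce_senses("bank", emb, k=5))
```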
arXiv Detail & Related papers (2020-03-14T14:50:04Z)
- On the Importance of Word Order Information in Cross-lingual Sequence Labeling [80.65425412067464]
Cross-lingual models that fit the word order of the source language may fail to handle target languages whose word order differs.
We investigate whether making models insensitive to the word order of the source language can improve adaptation performance in target languages.
arXiv Detail & Related papers (2020-01-30T03:35:44Z)
- A Common Semantic Space for Monolingual and Cross-Lingual Meta-Embeddings [10.871587311621974]
This paper presents a new technique for creating monolingual and cross-lingual meta-embeddings.
Existing word vectors are projected to a common semantic space using linear transformations and averaging.
The resulting cross-lingual meta-embeddings also exhibit excellent cross-lingual transfer learning capabilities.
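The "linear transformations and averaging" step can be sketched directly: fit a linear map from each source space onto a shared space over a seed vocabulary, project, and average. The least-squares fit below is an illustrative stand-in for the paper's learned projections:

```python
import numpy as np

def fit_projection(src, tgt, shared):
    """Least-squares linear map taking `src` vectors of shared words
    onto the corresponding `tgt` vectors."""
    X = np.stack([src[w] for w in shared])
    Y = np.stack([tgt[w] for w in shared])
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def meta_embedding(word, spaces, projections):
    """Project each source space into the common one and average."""
    return np.mean([spaces[i][word] @ projections[i]
                    for i in range(len(spaces))], axis=0)

shared = ["dog", "cat", "house"]
common = {w: np.random.randn(50) for w in shared}
spaces = [{w: np.random.randn(50) for w in shared} for _ in range(2)]
projections = [fit_projection(s, common, shared) for s in spaces]
print(meta_embedding("dog", spaces, projections).shape)  # (50,)
```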
arXiv Detail & Related papers (2020-01-17T15:42:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.