Can a Fruit Fly Learn Word Embeddings?
- URL: http://arxiv.org/abs/2101.06887v2
- Date: Sun, 14 Mar 2021 19:50:25 GMT
- Title: Can a Fruit Fly Learn Word Embeddings?
- Authors: Yuchen Liang, Chaitanya K. Ryali, Benjamin Hoover, Leopold Grinberg,
Saket Navlakha, Mohammed J. Zaki, Dmitry Krotov
- Abstract summary: The fruit fly brain is one of the best studied systems in neuroscience.
We show that a network motif can learn semantic representations of words and can generate both static and context-dependent word embeddings.
It is shown that not only can the fruit fly network motif achieve performance comparable to existing methods in NLP, but, additionally, it uses only a fraction of the computational resources.
- Score: 16.280120177501733
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The mushroom body of the fruit fly brain is one of the best studied systems
in neuroscience. At its core it consists of a population of Kenyon cells, which
receive inputs from multiple sensory modalities. These cells are inhibited by
the anterior paired lateral neuron, thus creating a sparse high dimensional
representation of the inputs. In this work we study a mathematical
formalization of this network motif and apply it to learning the correlational
structure between words and their context in a corpus of unstructured text, a
common natural language processing (NLP) task. We show that this network can
learn semantic representations of words and can generate both static and
context-dependent word embeddings. Unlike conventional methods (e.g., BERT,
GloVe) that use dense representations for word embedding, our algorithm encodes
semantic meaning of words and their context in the form of sparse binary hash
codes. The quality of the learned representations is evaluated on word
similarity analysis, word-sense disambiguation, and document classification. It
is shown that not only can the fruit fly network motif achieve performance
comparable to existing methods in NLP, but, additionally, it uses only a
fraction of the computational resources (shorter training time and smaller
memory footprint).
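The abstract describes Kenyon cells that receive inputs and are globally inhibited, which leaves a sparse binary code. A minimal sketch of that hashing step follows, assuming a k-winners-take-all rule as a stand-in for the inhibition. The function names, the toy sizes, and the use of random (rather than learned) weights are illustrative assumptions, not the authors' implementation; the abstract does not specify how the weights are trained.

```python
import random

def make_weights(num_inputs, num_kenyon, seed=0):
    """Hypothetical random weights: one row per simulated Kenyon cell."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(num_inputs)] for _ in range(num_kenyon)]

def sparse_binary_hash(x, weights, k):
    """Return a binary code with exactly k ones (k-winners-take-all)."""
    # Each cell's activation is the dot product of its weights with the input.
    activations = [sum(w * v for w, v in zip(row, x)) for row in weights]
    # Stand-in for global inhibition: only the k most active cells stay on.
    order = sorted(range(len(activations)), key=lambda i: activations[i], reverse=True)
    code = [0] * len(weights)
    for i in order[:k]:
        code[i] = 1
    return code

# Toy input: a bag-of-words style binary vector over a 6-word vocabulary (assumed).
weights = make_weights(num_inputs=6, num_kenyon=20, seed=0)
x = [1, 0, 1, 0, 0, 1]
code = sparse_binary_hash(x, weights, k=4)
print(code)
```

Here the code length (20) exceeds the input dimension (6) while only 4 bits are active, which mirrors the "sparse high dimensional representation" the abstract mentions. In the paper's setting the weights would be learned from a text corpus rather than drawn at random.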
Related papers
- Pixel Sentence Representation Learning [67.4775296225521]
In this work, we conceptualize the learning of sentence-level textual semantics as a visual representation learning process.
We employ visually-grounded text perturbation methods like typos and word order shuffling, resonating with human cognitive patterns, and enabling perturbation to be perceived as continuous.
Our approach is further bolstered by large-scale unsupervised topical alignment training and natural language inference supervision.
arXiv Detail & Related papers (2024-02-13T02:46:45Z)
- Tsetlin Machine Embedding: Representing Words Using Logical Expressions [10.825099126920028]
We introduce a Tsetlin Machine-based autoencoder that learns logical clauses self-supervised.
The clauses consist of contextual words like "black," "cup," and "hot" to define other words like "coffee."
We evaluate our embedding approach on several intrinsic and extrinsic benchmarks, outperforming GloVe on six classification tasks.
arXiv Detail & Related papers (2023-01-02T15:02:45Z)
- Incorporating Constituent Syntax for Coreference Resolution [50.71868417008133]
We propose a graph-based method to incorporate constituent syntactic structures.
We also explore utilising higher-order neighbourhood information to encode rich structures in constituent trees.
Experiments on the English and Chinese portions of OntoNotes 5.0 benchmark show that our proposed model either beats a strong baseline or achieves new state-of-the-art performance.
arXiv Detail & Related papers (2022-02-22T07:40:42Z)
- Grammar-Based Grounded Lexicon Learning [68.59500589319023]
G2L2 is a lexicalist approach toward learning a compositional and grounded meaning representation of language.
At the core of G2L2 is a collection of lexicon entries, which map each word to a syntactic type and a neuro-symbolic semantic program.
G2L2 can generalize from small amounts of data to novel compositions of words.
arXiv Detail & Related papers (2022-02-17T18:19:53Z)
- A Survey On Neural Word Embeddings [0.4822598110892847]
The study of meaning in natural language processing relies on the distributional hypothesis.
The revolutionary idea of distributed representation for a concept is close to the working of a human mind.
Neural word embeddings transformed the whole field of NLP by introducing substantial improvements in all NLP tasks.
arXiv Detail & Related papers (2021-10-05T03:37:57Z)
- Low-Dimensional Structure in the Space of Language Representations is Reflected in Brain Responses [62.197912623223964]
We show a low-dimensional structure where language models and translation models smoothly interpolate between word embeddings, syntactic and semantic tasks, and future word embeddings.
We find that this representation embedding can predict how well each individual feature space maps to human brain responses to natural language stimuli recorded using fMRI.
This suggests that the embedding captures some part of the brain's natural language representation structure.
arXiv Detail & Related papers (2021-06-09T22:59:12Z)
- Intrinsic Probing through Dimension Selection [69.52439198455438]
Most modern NLP systems make use of pre-trained contextual representations that attain astonishingly high performance on a variety of tasks.
Such high performance should not be possible unless some form of linguistic structure inheres in these representations, and a wealth of research has sprung up on probing for it.
In this paper, we draw a distinction between intrinsic probing, which examines how linguistic information is structured within a representation, and the extrinsic probing popular in prior work, which only argues for the presence of such information by showing that it can be successfully extracted.
arXiv Detail & Related papers (2020-10-06T15:21:08Z)
- A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)
- Using Holographically Compressed Embeddings in Question Answering [0.0]
This research employs holographic compression of pre-trained embeddings to represent a token, its part-of-speech, and named entity type.
The implementation, in a modified question answering recurrent deep learning network, shows that semantic relationships are preserved, and yields strong performance.
arXiv Detail & Related papers (2020-07-14T18:29:49Z)
- On the Learnability of Concepts: With Applications to Comparing Word Embedding Algorithms [0.0]
We introduce the notion of "concept" as a list of words that have shared semantic content.
We first use this notion to measure the learnability of concepts on pretrained word embeddings.
We then develop a statistical analysis of concept learnability, based on hypothesis testing and ROC curves, in order to compare the relative merits of various embedding algorithms.
arXiv Detail & Related papers (2020-06-17T14:25:36Z)
- Comparative Analysis of Word Embeddings for Capturing Word Similarities [0.0]
Distributed language representation has become the most widely used technique for language representation in various natural language processing tasks.
Most of the natural language processing models that are based on deep learning techniques use already pre-trained distributed word representations, commonly called word embeddings.
Selecting the appropriate word embeddings is a perplexing task since the projected embedding space is not intuitive to humans.
arXiv Detail & Related papers (2020-05-08T01:16:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.