Augmenting semantic lexicons using word embeddings and transfer learning
- URL: http://arxiv.org/abs/2109.09010v1
- Date: Sat, 18 Sep 2021 20:59:52 GMT
- Title: Augmenting semantic lexicons using word embeddings and transfer learning
- Authors: Thayer Alshaabi, Colin Van Oort, Mikaela Fudolig, Michael V. Arnold,
Christopher M. Danforth, Peter Sheridan Dodds
- Abstract summary: We propose two models for predicting sentiment scores to augment semantic lexicons at a relatively low cost using word embeddings and transfer learning.
Our evaluation shows both models are able to score new words with accuracy comparable to that of reviewers from Amazon Mechanical Turk, but at a fraction of the cost.
- Score: 1.101002667958165
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sentiment-aware intelligent systems are essential to a wide array of
applications including marketing, political campaigns, recommender systems,
behavioral economics, social psychology, and national security. These
sentiment-aware intelligent systems are driven by language models that broadly
fall into two paradigms: 1. Lexicon-based and 2. Contextual. Although recent
contextual models are increasingly dominant, we still see demand for
lexicon-based models because of their interpretability and ease of use. For
example, lexicon-based models allow researchers to readily determine which
words and phrases contribute most to a change in measured sentiment. A
challenge for any lexicon-based approach is that the lexicon needs to be
routinely expanded with new words and expressions. Crowdsourcing annotations
for semantic dictionaries may be an expensive and time-consuming task. Here, we
propose two models for predicting sentiment scores to augment semantic lexicons
at a relatively low cost using word embeddings and transfer learning. Our first
model establishes a baseline employing a simple and shallow neural network
initialized with pre-trained word embeddings using a non-contextual approach.
Our second model improves upon our baseline, featuring a deep Transformer-based
network that draws on word definitions to estimate their lexical
polarity. Our evaluation shows that both models are able to score new words
with accuracy comparable to that of reviewers from Amazon Mechanical Turk, but
at a fraction of the cost.
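As a concrete reading of the first model, here is a minimal sketch: a shallow feed-forward regressor over frozen, pre-trained (non-contextual) word embeddings, written in PyTorch. The class name, dimensions, and hyperparameters are illustrative assumptions; the abstract specifies only a simple, shallow network initialized with pre-trained word embeddings.

```python
# Hypothetical sketch of the baseline model: a shallow regressor over
# frozen, pre-trained word embeddings. All names and hyperparameters
# here are assumptions for illustration, not the authors' configuration.
import torch
import torch.nn as nn

class LexiconScorer(nn.Module):
    def __init__(self, embedding_dim: int = 300, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embedding_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_dim, 1),  # one scalar sentiment score per word
        )

    def forward(self, word_vectors: torch.Tensor) -> torch.Tensor:
        # word_vectors: (batch, embedding_dim), e.g. fastText/word2vec lookups
        return self.net(word_vectors).squeeze(-1)

# Regress against human-annotated scores (e.g. ratings collected via
# Amazon Mechanical Turk), then apply the model to unscored words.
model = LexiconScorer()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(vectors: torch.Tensor, scores: torch.Tensor) -> float:
    optimizer.zero_grad()
    loss = loss_fn(model(vectors), scores)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The second, definition-based model can be sketched in the same hedged spirit: the abstract states only that a deep Transformer-based network uses word definitions to estimate lexical polarity, so the checkpoint (distilbert-base-uncased) and the scalar regression head below are assumptions.

```python
# Hypothetical sketch of the definition-based model: encode a word's
# dictionary definition with a pre-trained Transformer and map the
# encoder's first-token ([CLS]) state to a scalar sentiment score.
# In practice the head (and optionally the encoder) would be fine-tuned
# on annotated lexicon entries; the checkpoint choice is an assumption.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
encoder = AutoModel.from_pretrained("distilbert-base-uncased")
head = torch.nn.Linear(encoder.config.hidden_size, 1)

def score_definition(definition: str) -> float:
    """Encode a definition and map the first-token state to a score."""
    inputs = tokenizer(definition, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state[:, 0]
    return head(hidden).item()
```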
Related papers
- DesCo: Learning Object Recognition with Rich Language Descriptions [93.8177229428617]
Recent developments in vision-language approaches have instigated a paradigm shift in learning visual recognition models from language supervision.
We propose a new description-conditioned (DesCo) paradigm of learning object recognition models with rich language descriptions.
arXiv Detail & Related papers (2023-06-24T21:05:02Z)
- Tokenization Impacts Multilingual Language Modeling: Assessing Vocabulary Allocation and Overlap Across Languages [3.716965622352967]
We propose new criteria to evaluate the quality of lexical representation and vocabulary overlap observed in sub-word tokenizers.
Our findings show that vocabulary overlap across languages can actually be detrimental to certain downstream tasks.
arXiv Detail & Related papers (2023-05-26T18:06:49Z)
- From Characters to Words: Hierarchical Pre-trained Language Model for Open-vocabulary Language Understanding [22.390804161191635]
Current state-of-the-art models for natural language understanding require a preprocessing step to convert raw text into discrete tokens.
This process, known as tokenization, relies on a pre-built vocabulary of words or sub-word morphemes.
We introduce a novel open-vocabulary language model that adopts a hierarchical two-level approach.
arXiv Detail & Related papers (2023-05-23T23:22:20Z)
- ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting [121.11880210592497]
We argue that the limited capacity of language models comes from 1) implicit language modeling; 2) unidirectional feature representation; and 3) a language model with noisy input.
We propose an autonomous, bidirectional and iterative ABINet++ for scene text spotting.
arXiv Detail & Related papers (2022-11-19T03:50:33Z)
- Text analysis and deep learning: A network approach [0.0]
We propose a novel method that combines transformer models with network analysis to form a self-referential representation of language use within a corpus of interest.
Our approach produces linguistic relations strongly consistent with the underlying model as well as mathematically well-defined operations on them.
It represents, to the best of our knowledge, the first unsupervised method to extract semantic networks directly from deep language models.
arXiv Detail & Related papers (2021-10-08T14:18:36Z)
- Neural Abstructions: Abstractions that Support Construction for Grounded Language Learning [69.1137074774244]
Leveraging language interactions effectively requires addressing limitations in the two most common approaches to language grounding.
We introduce the idea of neural abstructions: a set of constraints on the inference procedure of a label-conditioned generative model.
We show that with this method a user population is able to build a semantic modification for an open-ended house task in Minecraft.
arXiv Detail & Related papers (2021-07-20T07:01:15Z)
- Embodying Pre-Trained Word Embeddings Through Robot Actions [9.048164930020404]
Properly responding to various linguistic expressions, including polysemous words, is an important ability for robots.
Previous studies have shown that robots can use words that are not included in the action-description paired datasets by using pre-trained word embeddings.
We transform the pre-trained word embeddings to embodied ones by using the robot's sensory-motor experiences.
arXiv Detail & Related papers (2021-04-17T12:04:49Z)
- Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition [80.446770909975]
Linguistic knowledge is of great benefit to scene text recognition.
How to effectively model linguistic rules in end-to-end deep networks remains a research challenge.
We propose an autonomous, bidirectional and iterative ABINet for scene text recognition.
arXiv Detail & Related papers (2021-03-11T06:47:45Z)
- Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed later, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
- Word Sense Disambiguation for 158 Languages using Word Embeddings Only [80.79437083582643]
Disambiguation of word senses in context is easy for humans, but a major challenge for automatic approaches.
We present a method that takes as input a standard pre-trained word embedding model and induces a fully-fledged word sense inventory.
We use this method to induce a collection of sense inventories for 158 languages on the basis of the original pre-trained fastText word embeddings.
arXiv Detail & Related papers (2020-03-14T14:50:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.