Visualizing the Obvious: A Concreteness-based Ensemble Model for Noun
Property Prediction
- URL: http://arxiv.org/abs/2210.12905v1
- Date: Mon, 24 Oct 2022 01:25:21 GMT
- Title: Visualizing the Obvious: A Concreteness-based Ensemble Model for Noun
Property Prediction
- Authors: Yue Yang, Artemis Panagopoulou, Marianna Apidianaki, Mark Yatskar and
Chris Callison-Burch
- Abstract summary: Common properties of nouns are more challenging to extract than other types of knowledge because they are rarely explicitly stated in texts.
We propose to extract these properties from images and use them in an ensemble model, in order to complement the information that is extracted from language models.
Our results show that the proposed combination of text and images greatly improves noun property prediction compared to powerful text-based language models.
- Score: 34.37730333491428
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural language models encode rich knowledge about entities and their
relationships which can be extracted from their representations using probing.
Common properties of nouns (e.g., red strawberries, small ant) are, however,
more challenging to extract compared to other types of knowledge because they
are rarely explicitly stated in texts. We hypothesize this to mainly be the
case for perceptual properties which are obvious to the participants in the
communication. We propose to extract these properties from images and use them
in an ensemble model, in order to complement the information that is extracted
from language models. We consider perceptual properties to be more concrete
than abstract properties (e.g., interesting, flawless). We propose to use the
adjectives' concreteness score as a lever to calibrate the contribution of each
source (text vs. images). We evaluate our ensemble model in a ranking task
where the actual properties of a noun need to be ranked higher than other
non-relevant properties. Our results show that the proposed combination of text
and images greatly improves noun property prediction compared to powerful
text-based language models.
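The abstract describes the ensemble only at a high level. The sketch below illustrates one plausible reading of the concreteness-based calibration, where a normalized concreteness score interpolates between text-based and image-based property scores before ranking; the linear interpolation, the function name, and the example numbers are illustrative assumptions, not the authors' exact formulation.
```python
# Minimal sketch of a concreteness-weighted ensemble for noun property ranking.
# The linear weighting and the example scores are assumptions for illustration.

def ensemble_score(text_score: float, image_score: float, concreteness: float) -> float:
    """Combine text- and image-based scores for a (noun, adjective) pair.

    `concreteness` is assumed to be normalized to [0, 1]; the more concrete
    the adjective, the more weight is given to the image-based evidence.
    """
    return concreteness * image_score + (1.0 - concreteness) * text_score

# Hypothetical candidate properties for the noun "strawberry":
# (adjective, text LM score, image model score, normalized concreteness)
candidates = [
    ("red",         0.40, 0.90, 0.95),
    ("small",       0.35, 0.70, 0.80),
    ("interesting", 0.60, 0.20, 0.15),
    ("flawless",    0.55, 0.25, 0.20),
]

# Rank candidates by the combined score; in the paper's evaluation setting,
# actual properties of the noun should rank above non-relevant ones.
ranked = sorted(candidates, key=lambda c: ensemble_score(c[1], c[2], c[3]), reverse=True)
print([adj for adj, *_ in ranked])
```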
Related papers
- Are we describing the same sound? An analysis of word embedding spaces of expressive piano performance [4.867952721052875]
We investigate uncertainty in the domain of characterizations of expressive piano performance.
We test five embedding models and their similarity structure for correspondence with the ground truth.
The quality of embedding models shows great variability with respect to this task.
arXiv Detail & Related papers (2023-12-31T12:20:03Z)
- Modelling Commonsense Properties using Pre-Trained Bi-Encoders [40.327695801431375]
We study the possibility of fine-tuning language models to explicitly model concepts and their properties.
Our experimental results show that the resulting encoders allow us to predict commonsense properties with much higher accuracy than is otherwise possible.
arXiv Detail & Related papers (2022-10-06T09:17:34Z)
- Towards Explainability in NLP: Analyzing and Calculating Word Saliency through Word Properties [4.330880304715002]
We explore the relationships between word saliency and word properties.
We establish a mapping model, Seq2Saliency, from the words in a text sample and their properties to saliency values.
The experimental evaluations are conducted to analyze the saliency of words with different properties.
arXiv Detail & Related papers (2022-07-17T06:02:48Z)
- Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning [78.07495777674747]
We argue that by using visual clues to bridge large pretrained vision foundation models and language models, we can perform image paragraph captioning without any extra cross-modal training.
Thanks to the strong zero-shot capability of foundation models, we start by constructing a rich semantic representation of the image.
We use a large language model to produce a series of comprehensive descriptions of the visual content, which are then verified by the vision model to select the candidate that aligns best with the image.
arXiv Detail & Related papers (2022-06-03T22:33:09Z)
- An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z)
- ALL Dolphins Are Intelligent and SOME Are Friendly: Probing BERT for Nouns' Semantic Properties and their Prototypicality [4.915907527975786]
We probe BERT (Devlin et al.) for the properties of English nouns as expressed by adjectives that do not restrict the reference scope.
We base our study on psycholinguistics datasets that capture the association strength between nouns and their semantic features.
We show that when tested in a fine-tuning setting addressing entailment, BERT successfully leverages the information needed for reasoning about the meaning of adjective-noun constructions.
arXiv Detail & Related papers (2021-10-12T21:43:37Z)
- Dependency Induction Through the Lens of Visual Perception [81.91502968815746]
We propose an unsupervised grammar induction model that leverages word concreteness and a structural vision-based heuristic to jointly learn constituency-structure and dependency-structure grammars.
Our experiments show that the proposed extension outperforms the current state-of-the-art visually grounded models in constituency parsing even with a smaller grammar size.
arXiv Detail & Related papers (2021-09-20T18:40:37Z)
- Learning from Context or Names? An Empirical Study on Neural Relation Extraction [112.06614505580501]
We study the effect of two main information sources in text: textual context and entity mentions (names).
We propose an entity-masked contrastive pre-training framework for relation extraction (RE).
Our framework can improve the effectiveness and robustness of neural models in different RE scenarios.
arXiv Detail & Related papers (2020-10-05T11:21:59Z)
- Probing Contextual Language Models for Common Ground with Visual Representations [76.05769268286038]
We design a probing model that evaluates how effective text-only representations are in distinguishing between matching and non-matching visual representations.
Our findings show that language representations alone provide a strong signal for retrieving image patches from the correct object categories.
Visually grounded language models slightly outperform text-only language models in instance retrieval, but greatly under-perform humans.
arXiv Detail & Related papers (2020-05-01T21:28:28Z)