Can Language Models Encode Perceptual Structure Without Grounding? A
Case Study in Color
- URL: http://arxiv.org/abs/2109.06129v2
- Date: Tue, 14 Sep 2021 07:10:41 GMT
- Title: Can Language Models Encode Perceptual Structure Without Grounding? A
Case Study in Color
- Authors: Mostafa Abdou, Artur Kulmizev, Daniel Hershcovich, Stella Frank, Ellie
Pavlick, Anders Søgaard
- Abstract summary: We employ a dataset of monolexemic color terms and color chips represented in CIELAB, a color space with a perceptually meaningful distance metric.
Using two methods of evaluating the structural alignment of colors in this space with text-derived color term representations, we find significant correspondence.
We find that warmer colors are, on average, better aligned to the perceptual color space than cooler ones.
- Score: 18.573415435334105
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pretrained language models have been shown to encode relational information,
such as the relations between entities or concepts in knowledge-bases --
(Paris, Capital, France). However, simple relations of this type can often be
recovered heuristically, and the extent to which models implicitly reflect
topological structure that is grounded in the world, such as perceptual structure,
is unknown. To explore this question, we conduct a thorough case study on
color. Namely, we employ a dataset of monolexemic color terms and color chips
represented in CIELAB, a color space with a perceptually meaningful distance
metric.
Using two methods of evaluating the structural alignment of colors in this
space with text-derived color term representations, we find significant
correspondence. Analyzing the differences in alignment across the color
spectrum, we find that warmer colors are, on average, better aligned to the
perceptual color space than cooler ones, suggesting an intriguing connection to
findings from recent work on efficient communication in color naming. Further
analysis suggests that differences in alignment are, in part, mediated by
collocationality and differences in syntactic usage, raising questions about the
relationship between color perception and patterns of usage and context.
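As an illustration of the kind of evaluation the abstract describes, the sketch below implements one common structural-alignment measure, representational similarity analysis (RSA): pairwise distances between color chips in CIELAB (where Euclidean distance, ΔE* 1976, is perceptually meaningful) are rank-correlated with pairwise distances between text-derived color-term vectors. The CIELAB values are approximate and the embedding vectors are toy placeholders, not the paper's actual data, models, or exact methodology.

```python
# Minimal RSA-style alignment sketch: correlate pairwise CIELAB distances
# with pairwise distances between (toy) text-derived color-term vectors.
import math
from itertools import combinations

# A few monolexemic color terms with approximate CIELAB (L*, a*, b*) values.
cielab = {
    "red":    (53.2, 80.1, 67.2),
    "green":  (46.2, -51.7, 49.9),
    "blue":   (32.3, 79.2, -107.9),
    "yellow": (97.1, -21.6, 94.5),
}

# Toy vectors standing in for language-model representations of the terms.
embeddings = {
    "red":    [0.9, 0.1, 0.2],
    "green":  [0.1, 0.8, 0.3],
    "blue":   [0.2, 0.3, 0.9],
    "yellow": [0.7, 0.6, 0.1],
}

def delta_e(p, q):
    """Euclidean distance in CIELAB (Delta E* 1976), the perceptual metric."""
    return math.dist(p, q)

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def ranks(xs):
    """Rank positions of each value (no tie handling; fine for toy data)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = float(rank)
    return r

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Build the two pairwise-distance vectors over the same term pairs,
# then measure how similarly they order those pairs.
terms = sorted(cielab)
pairs = list(combinations(terms, 2))
perceptual = [delta_e(cielab[a], cielab[b]) for a, b in pairs]
textual = [cosine_distance(embeddings[a], embeddings[b]) for a, b in pairs]
alignment = spearman(perceptual, textual)
print(f"RSA alignment (Spearman rho): {alignment:.2f}")
```

A rho near 1 would mean the text-derived space orders color pairs the same way perceptual space does; real analyses would use many more chips and actual model representations.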
Related papers
- Eye-for-an-eye: Appearance Transfer with Semantic Correspondence in Diffusion Models [8.65146533481257]
We introduce a method to produce results with the same structure of a target image but painted with colors from a reference image.
Existing methods rely on query-key similarity within the self-attention layer and often produce defective results.
arXiv Detail & Related papers (2024-06-11T07:08:48Z)
- Are we describing the same sound? An analysis of word embedding spaces of
expressive piano performance [4.867952721052875]
We investigate the uncertainty in the domain of characterizations of expressive piano performance.
We test five embedding models and their similarity structure for correspondence with the ground truth.
The quality of embedding models shows great variability with respect to this task.
arXiv Detail & Related papers (2023-12-31T12:20:03Z)
- Perceptual Structure in the Absence of Grounding for LLMs: The Impact of
Abstractedness and Subjectivity in Color Language [2.6094835036012864]
We show that there is considerable alignment between a defined color space and the feature space defined by a language model.
Our results show that while color space alignment holds for monolexemic, highly pragmatic color descriptions, this alignment drops considerably in the presence of examples that exhibit elements of real linguistic usage.
arXiv Detail & Related papers (2023-11-22T02:12:36Z)
- Compositional Temporal Grounding with Structured Variational Cross-Graph
Correspondence Learning [92.07643510310766]
Temporal grounding in videos aims to localize one target video segment that semantically corresponds to a given query sentence.
We introduce a new Compositional Temporal Grounding task and construct two new dataset splits.
We empirically find that existing models fail to generalize to queries with novel combinations of seen words.
We propose a variational cross-graph reasoning framework that explicitly decomposes video and language into multiple structured hierarchies.
arXiv Detail & Related papers (2022-03-24T12:55:23Z)
- The World of an Octopus: How Reporting Bias Influences a Language Model's
Perception of Color [73.70233477125781]
We show that reporting bias negatively impacts and inherently limits text-only training.
We then demonstrate that multimodal models can leverage their visual training to mitigate these effects.
arXiv Detail & Related papers (2021-10-15T16:28:17Z)
- Prototypical Representation Learning for Relation Extraction [56.501332067073065]
This paper aims to learn predictive, interpretable, and robust relation representations from distantly-labeled data.
We learn prototypes for each relation from contextual information to best explore the intrinsic semantics of relations.
Results on several relation learning tasks show that our model significantly outperforms the previous state-of-the-art relational models.
arXiv Detail & Related papers (2021-03-22T08:11:43Z)
- Assessing The Importance Of Colours For CNNs In Object Recognition [70.70151719764021]
Convolutional neural networks (CNNs) have been shown to exhibit conflicting properties.
We demonstrate that CNNs often rely heavily on colour information while making a prediction.
We evaluate a model trained with congruent images on congruent, greyscale, and incongruent images.
arXiv Detail & Related papers (2020-12-12T22:55:06Z)
- The Geometry of Distributed Representations for Better Alignment, Attenuated
Bias, and Improved Interpretability [9.215513608145994]
High-dimensional representations for words, text, images, knowledge graphs and other structured data are commonly used in machine learning and data mining.
These representations vary in their interpretability, with efficient distributed representations coming at the cost of losing the feature-to-dimension mapping.
The effects of this loss appear across many representations and tasks; a particularly problematic case is language representations, where societal biases learned from the underlying data are captured and occluded in unknown dimensions and subspaces.
This work addresses some of these problems pertaining to the transparency and interpretability of such representations.
arXiv Detail & Related papers (2020-11-25T01:04:11Z)
- Pragmatically Informative Color Generation by Grounding Contextual
Modifiers [14.394987796101349]
Given a reference color "green", and a modifier "bluey," how does one generate a color that could represent "bluey green"?
We propose a computational pragmatics model that formulates this color generation task as a recursive game between speakers and listeners.
In this paper, we show that incorporating pragmatic information yields significant performance improvements over other state-of-the-art deep learning models.
arXiv Detail & Related papers (2020-10-09T04:54:54Z)
- Understanding Spatial Relations through Multiple Modalities [78.07328342973611]
Spatial relations between objects can be either explicit, expressed as spatial prepositions, or implicit, expressed by spatial verbs such as moving, walking, or shifting.
We introduce the task of inferring implicit and explicit spatial relations between two entities in an image.
We design a model that uses both textual and visual information to predict the spatial relations, making use of both positional and size information of objects and image embeddings.
arXiv Detail & Related papers (2020-07-19T01:35:08Z)
- Semantic-driven Colorization [78.88814849391352]
Recent colorization works implicitly predict the semantic information while learning to colorize black-and-white images.
In this study, we simulate that human-like action to let our network first learn to understand the photo, then colorize it.
arXiv Detail & Related papers (2020-06-13T08:13:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.