Three Laws of Statistical Linguistics Emerging in Images
- URL: http://arxiv.org/abs/2501.18620v1
- Date: Sun, 26 Jan 2025 16:26:32 GMT
- Title: Three Laws of Statistical Linguistics Emerging in Images
- Authors: Ping-Rui Tsai, Chi-hsiang Wang, Yu-Cheng Liao, Tzay-Ming Hong
- Abstract summary: We use the VGG-19 to define words via each kernel and calculate the number of pixels with grayscale values greater than 90%.
We are surprised to find that Zipf's, Heaps', and Benford's laws of statistical linguistics also exist in the words that comprise the texts representing different images.
- Abstract: Images, as a product evolving alongside civilization, develop similarly to natural languages as civilization advances. Not only are images abundant in daily life, but their forms are also shaped by technology, embodying various characteristics as they evolve over time. Language is a sequence of symbols that represents thoughts. While a written language is typically associated with the close integration of text and sound, images, as a combination of visual symbols and perception, carry no less communicative power. This is especially notable since 60% of the sensory input received by our central nervous system comes from vision. Given the symbolic system inherent in images, we are curious whether images can also exhibit the laws of statistical linguistics. To explore this, we begin with the relationship between human thought and visual perception to decode how images are formed by the latter mechanism. Building upon previous studies that established the high correlation between pre-trained deep convolutional neural networks and the human visual system, we use VGG-19 to define words via each kernel and calculate the number of pixels with grayscale values greater than 90%. By (a) ranking word frequency, (b) randomizing the order of kernel appearances and performing the same word-count accumulation, and (c) summing the word counts layer by layer, we are surprised to find that Zipf's, Heaps', and Benford's laws of statistical linguistics also exist in the words that comprise the texts representing different images.
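To make the measurement pipeline concrete, below is a minimal sketch of how the word counting described in the abstract could be reproduced, assuming torchvision's pre-trained VGG-19. The min-max normalization of feature maps, the reading of "grayscale values greater than 90%" as a 0.9 threshold on normalized activations, and the input file example.jpg are illustrative assumptions, not the authors' exact protocol. For reference, Zipf's law predicts a rank-frequency relation f(r) ∝ r^(-α), Heaps' law a vocabulary growth V(N) ∝ N^β, and Benford's law a leading-digit distribution P(d) = log10(1 + 1/d).

```python
# Minimal sketch of the word-counting pipeline (assumptions noted above).
import numpy as np
import torch
from PIL import Image
from torchvision import models, transforms

model = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).eval()
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def word_counts(image_path: str) -> np.ndarray:
    """Return, per conv kernel ("word"), the number of feature-map pixels
    whose min-max-normalized activation exceeds 0.9 (an assumption)."""
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    counts = []
    with torch.no_grad():
        for layer in model.features:
            x = layer(x)
            if isinstance(layer, torch.nn.Conv2d):
                flat = x.flatten(2)                    # (1, C, H*W)
                lo = flat.min(-1, keepdim=True).values
                hi = flat.max(-1, keepdim=True).values
                norm = (flat - lo) / (hi - lo + 1e-8)  # min-max to [0, 1]
                counts.extend((norm > 0.9).sum(-1).squeeze(0).tolist())
    return np.asarray(counts)

counts = word_counts("example.jpg")  # hypothetical input image

# (a) Zipf's law: sorted word frequencies should follow f(r) ~ r^(-alpha).
zipf_freqs = np.sort(counts[counts > 0])[::-1]

# (b) Heaps' law: under a random kernel ordering, the cumulative vocabulary
# V (kernels seen so far with nonzero count) should grow as V(N) ~ N^beta,
# where N is the cumulative word count.
order = np.random.default_rng(0).permutation(len(counts))
N = np.cumsum(counts[order])
V = np.cumsum(counts[order] > 0)

# (c) Benford's law: leading digits of the nonzero word counts should
# follow P(d) = log10(1 + 1/d). (The paper reads digits after summing
# counts layer by layer; this per-kernel version is a simplification.)
digits = np.array([int(str(c)[0]) for c in counts if c > 0])
observed = np.bincount(digits, minlength=10)[1:] / len(digits)
benford = np.log10(1 + 1 / np.arange(1, 10))
```

Plotting zipf_freqs against rank and V against N on log-log axes, and comparing observed with benford, would then reproduce the three checks (a)-(c) described in the abstract.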
Related papers
- Probing the contents of semantic representations from text, behavior, and brain data using the psychNorms metabase
We evaluate the similarities and differences between semantic representations derived from text, behavior, and brain data.
We establish behavior as an important complement to text for capturing human semantic representations.
arXiv Detail & Related papers (2024-12-06T10:44:20Z)
- Compositional Entailment Learning for Hyperbolic Vision-Language Models
We show how to fully leverage the innate hierarchical nature of hyperbolic embeddings by looking beyond individual image-text pairs.
We propose Compositional Entailment Learning for hyperbolic vision-language models.
Empirical evaluation on a hyperbolic vision-language model trained with millions of image-text pairs shows that the proposed compositional learning approach outperforms conventional Euclidean CLIP learning.
arXiv Detail & Related papers (2024-10-09T14:12:50Z)
- Computer Vision Datasets and Models Exhibit Cultural and Linguistic Diversity in Perception
We study how people from different cultural backgrounds observe vastly different concepts even when viewing the same visual stimuli.
By comparing textual descriptions generated across 7 languages for the same images, we find significant differences in the semantic content and linguistic expression.
Our work points towards the need to account for and embrace the diversity of human perception in the computer vision community.
arXiv Detail & Related papers (2023-10-22T16:51:42Z)
- Multimodal Neurons in Pretrained Text-Only Transformers
We identify "multimodal neurons" that convert visual representations into corresponding text.
We show that multimodal neurons operate on specific visual concepts across inputs, and have a systematic causal effect on image captioning.
arXiv Detail & Related papers (2023-08-03T05:27:12Z)
- Exploring Affordance and Situated Meaning in Image Captions: A Multimodal Analysis
We annotate images from the Flickr30k dataset with five perceptual properties: Affordance, Perceptual Salience, Object Number, Gaze Cueing, and Ecological Niche Association (ENA).
Our findings reveal that images with Gibsonian affordance show a higher frequency of captions containing 'holding-verbs' and 'container-nouns' compared to images displaying telic affordance.
arXiv Detail & Related papers (2023-05-24T01:30:50Z)
- Universal Multimodal Representation for Language Understanding
This work presents new methods to employ visual information as assistant signals to general NLP tasks.
For each sentence, we first retrieve a flexible number of images from a light topic-image lookup table extracted over existing sentence-image pairs.
Then, the text and images are encoded by a Transformer encoder and convolutional neural network, respectively.
arXiv Detail & Related papers (2023-01-09T13:54:11Z)
- Comprehending and Ordering Semantics for Image Captioning
We propose a new recipe of Transformer-style structure, namely Comprehending and Ordering Semantics Networks (COS-Net).
COS-Net unifies an enriched semantic comprehending process and a learnable semantic ordering process into a single architecture.
arXiv Detail & Related papers (2022-06-14T15:51:14Z)
- Evaluating language-biased image classification based on semantic representations
Humans show language-biased image recognition for a word-embedded image, known as picture-word interference.
Similar to humans, recent artificial models jointly trained on texts and images, e.g., OpenAI CLIP, show language-biased image classification.
arXiv Detail & Related papers (2022-01-26T15:46:36Z)
- Emergence of Machine Language: Towards Symbolic Intelligence with Neural Networks
We propose to combine symbolism and connectionism principles by using neural networks to derive a discrete representation.
By designing an interactive environment and task, we demonstrate that machines can generate a spontaneous, flexible, and semantic language.
arXiv Detail & Related papers (2022-01-14T14:54:58Z)
- Structural-analogy from a Single Image Pair
In this paper, we explore the capabilities of neural networks to understand image structure given only a single pair of images, A and B.
We generate an image that keeps the appearance and style of B, but has a structural arrangement that corresponds to A.
Our method can be used to generate high-quality imagery in other conditional generation tasks utilizing only images A and B.
arXiv Detail & Related papers (2020-04-05T14:51:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.