A Large-Scale Multilingual Study of Visual Constraints on Linguistic
Selection of Descriptions
- URL: http://arxiv.org/abs/2302.04811v1
- Date: Thu, 9 Feb 2023 17:57:58 GMT
- Title: A Large-Scale Multilingual Study of Visual Constraints on Linguistic
Selection of Descriptions
- Authors: Uri Berger, Lea Frermann, Gabriel Stanovsky, Omri Abend
- Abstract summary: We present a large, multilingual study into how vision constrains linguistic choice, covering four languages and five linguistic properties, such as verb transitivity or use of numerals.
We propose a novel method that leverages existing corpora of images with captions written by native speakers, and apply it to nine corpora, comprising 600k images and 3M captions.
- Score: 35.82822305925811
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a large, multilingual study into how vision constrains linguistic
choice, covering four languages and five linguistic properties, such as verb
transitivity or use of numerals. We propose a novel method that leverages
existing corpora of images with captions written by native speakers, and apply
it to nine corpora, comprising 600k images and 3M captions. We study the
relation between visual input and linguistic choices by training classifiers to
predict the probability of expressing a property from raw images, and find
evidence supporting the claim that linguistic properties are constrained by
visual context across languages. We complement this investigation with a corpus
study, taking the test case of numerals. Specifically, we use existing
annotations (number or type of objects) to investigate the effect of different
visual conditions on the use of numeral expressions in captions, and show that
similar patterns emerge across languages. Our methods and findings both confirm
and extend existing research in the cognitive literature. We additionally
discuss possible applications for language generation.
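As a rough illustration of the classifier analysis described above, the sketch below trains a linear probe on frozen image features to predict whether a caption of an image expresses a given property, here the use of a numeral. This is a minimal sketch under assumed details (the backbone choice, the regex-based labeling heuristic, and all names are assumptions), not the authors' implementation:

import re
import torch
import torch.nn as nn
from PIL import Image
from torchvision import models, transforms

# Crude heuristic label (an assumption, not the paper's annotation scheme):
# does a caption contain a numeral expression?
NUMERAL = re.compile(r"\b(one|two|three|four|five|six|seven|eight|nine|ten|\d+)\b", re.I)

def expresses_numeral(caption: str) -> bool:
    return bool(NUMERAL.search(caption))

# Frozen pretrained backbone used purely as a feature extractor.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Identity()  # drop the ImageNet head; outputs 512-d features
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Linear probe predicting whether a caption will express the property.
probe = nn.Linear(512, 1)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

def train_step(image_paths, captions):
    imgs = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in image_paths])
    with torch.no_grad():
        feats = backbone(imgs)  # (batch, 512); gradients stay out of the backbone
    labels = torch.tensor([[float(expresses_numeral(c))] for c in captions])
    loss = loss_fn(probe(feats), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Note that the paper predicts the probability of expressing a property per image; the per-caption binary label above is a simplification, and one could instead regress on the fraction of an image's captions that express the property.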
Related papers
- Multimodal Modeling For Spoken Language Identification [57.94119986116947]
Spoken language identification refers to the task of automatically predicting the spoken language in a given utterance.
We propose MuSeLI, a Multimodal Spoken Language Identification method, which delves into the use of various metadata sources to enhance language identification.
arXiv Detail & Related papers (2023-09-19T12:21:39Z)
- Multilingual Multi-Figurative Language Detection [14.799109368073548]
Figurative language understanding is highly understudied in a multilingual setting.
We introduce multilingual multi-figurative language modelling, and provide a benchmark for sentence-level figurative language detection.
We develop a framework for figurative language detection based on template-based prompt learning.
arXiv Detail & Related papers (2023-05-31T18:52:41Z)
- Contextual Object Detection with Multimodal Large Language Models [66.15566719178327]
We introduce a novel research problem of contextual object detection.
Three representative scenarios are investigated, including the language cloze test, visual captioning, and question answering.
We present ContextDET, a unified multimodal model that is capable of end-to-end differentiable modeling of visual-language contexts.
arXiv Detail & Related papers (2023-05-29T17:50:33Z)
- Exploring Affordance and Situated Meaning in Image Captions: A Multimodal Analysis [1.124958340749622]
We annotate images from the Flickr30k dataset with five perceptual properties: Affordance, Perceptual Salience, Object Number, Gaze Cueing, and Ecological Niche Association (ENA).
Our findings reveal that images with Gibsonian affordance show a higher frequency of captions containing 'holding-verbs' and 'container-nouns' compared to images displaying telic affordance.
arXiv Detail & Related papers (2023-05-24T01:30:50Z)
- Corpus-Guided Contrast Sets for Morphosyntactic Feature Detection in Low-Resource English Varieties [3.3536302616846734]
We present a human-in-the-loop approach to generate and filter effective contrast sets via corpus-guided edits.
We show that our approach improves feature detection for both Indian English and African American English, demonstrate how it can assist linguistic research, and release our fine-tuned models for use by other researchers.
arXiv Detail & Related papers (2022-09-15T21:19:31Z)
- From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network [70.47504933083218]
We propose a Visual Language Modeling Network (VisionLAN), which views the visual and linguistic information as a union.
VisionLAN significantly improves recognition speed by 39% and adaptively considers linguistic information to enhance the visual features for accurate recognition.
arXiv Detail & Related papers (2021-08-22T07:56:24Z)
- Multilingual Byte2Speech Text-To-Speech Models Are Few-shot Spoken Language Learners [11.190877290770047]
We present a multilingual end-to-end Text-To-Speech framework that maps byte inputs to spectrograms, thus allowing arbitrary input scripts.
The framework demonstrates capabilities to adapt to various new languages under extreme low-resource scenarios.
We propose a novel method to extract language-specific sub-networks for a better understanding of the mechanism of multilingual models.
arXiv Detail & Related papers (2021-03-05T08:41:45Z)
- Rediscovering the Slavic Continuum in Representations Emerging from Neural Models of Spoken Language Identification [16.369477141866405]
We present a neural model for Slavic language identification in speech signals.
We analyze its emergent representations to investigate whether they reflect objective measures of language relatedness.
arXiv Detail & Related papers (2020-10-22T18:18:19Z)
- Probing Contextual Language Models for Common Ground with Visual Representations [76.05769268286038]
We design a probing model that evaluates how effective text-only representations are at distinguishing between matching and non-matching visual representations.
Our findings show that language representations alone provide a strong signal for retrieving image patches from the correct object categories.
Visually grounded language models slightly outperform text-only language models in instance retrieval, but greatly underperform humans.
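A minimal sketch of what such a matching/non-matching probe could look like; the dimensions, the projection design, and the shuffled-negative scheme are assumptions for illustration, not the paper's setup:

import torch
import torch.nn as nn
import torch.nn.functional as F

TEXT_DIM, VIS_DIM = 768, 512  # assumed sizes of the two representation spaces

class MatchProbe(nn.Module):
    """Scores whether a text representation matches a visual representation."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(TEXT_DIM, VIS_DIM)  # map text into the visual space

    def forward(self, text_vecs, vis_vecs):
        t = F.normalize(self.proj(text_vecs), dim=-1)
        v = F.normalize(vis_vecs, dim=-1)
        return (t * v).sum(-1)  # cosine similarity as a matching score

probe = MatchProbe()
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

def train_step(text_vecs, vis_vecs):
    # Positives: aligned (text, image-patch) pairs; negatives: visual
    # vectors rolled by one position so each text sees a mismatched patch.
    pos = probe(text_vecs, vis_vecs)
    neg = probe(text_vecs, torch.roll(vis_vecs, 1, dims=0))
    logits = torch.cat([pos, neg])
    labels = torch.cat([torch.ones_like(pos), torch.zeros_like(neg)])
    loss = loss_fn(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()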
arXiv Detail & Related papers (2020-05-01T21:28:28Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
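To make the SVCCA step in the entry above concrete, here is a minimal sketch of singular vector canonical correlation analysis between two views of language representations. The matrix shapes, variance threshold, and toy data are assumptions, not details from the paper:

import numpy as np
from sklearn.cross_decomposition import CCA

def svd_truncate(X, var_kept=0.99):
    """Keep the top singular directions explaining `var_kept` of the variance."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = int(np.searchsorted(np.cumsum(S**2) / np.sum(S**2), var_kept)) + 1
    return Xc @ Vt[:k].T  # (n_languages, k)

def svcca_similarity(X, Y, n_components=10):
    """Mean canonical correlation between the SVD-truncated views X and Y."""
    Xr, Yr = svd_truncate(X), svd_truncate(Y)
    k = min(n_components, Xr.shape[1], Yr.shape[1])
    x_scores, y_scores = CCA(n_components=k, max_iter=2000).fit_transform(Xr, Yr)
    corrs = [np.corrcoef(x_scores[:, i], y_scores[:, i])[0, 1] for i in range(k)]
    return float(np.mean(corrs))

# Toy usage: two hypothetical views of 50 "languages" (64- and 48-dimensional).
rng = np.random.default_rng(0)
view_a = rng.normal(size=(50, 64))
view_b = view_a[:, :48] + 0.1 * rng.normal(size=(50, 48))  # correlated second view
print(svcca_similarity(view_a, view_b))  # high for strongly related views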
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.