GIVL: Improving Geographical Inclusivity of Vision-Language Models with
Pre-Training Methods
- URL: http://arxiv.org/abs/2301.01893v1
- Date: Thu, 5 Jan 2023 03:43:45 GMT
- Authors: Da Yin, Feng Gao, Govind Thattai, Michael Johnston, Kai-Wei Chang
- Abstract summary: We propose GIVL, a Geographically Inclusive Vision-and-Language Pre-trained model.
Two attributes of geo-diverse visual concepts can help models learn geo-diverse knowledge: 1) concepts under similar categories have unique knowledge and visual characteristics, and 2) concepts with similar visual features may fall in completely different categories.
Compared with similar-size models pre-trained on a similar scale of data, GIVL achieves state-of-the-art (SOTA) and more balanced performance on geo-diverse V&L tasks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A key goal for the advancement of AI is to develop technologies that serve
the needs not just of one group but of all communities regardless of their
geographical region. In fact, a significant proportion of knowledge is locally
shared by people from certain regions but may not apply equally in other
regions because of cultural differences. If a model is unaware of regional characteristics, it may exhibit performance disparities across regions, resulting in bias against underrepresented groups. We propose GIVL, a Geographically Inclusive Vision-and-Language Pre-trained model. Two attributes of geo-diverse visual concepts can help models learn geo-diverse knowledge: 1) concepts under similar categories have unique knowledge and visual characteristics, and 2) concepts with similar visual features may fall in completely different categories. Motivated by these attributes, we design new
pre-training objectives Image Knowledge Matching (IKM) and Image Edit Checking
(IEC) to pre-train GIVL. Compared with similar-size models pre-trained on a similar scale of data, GIVL achieves state-of-the-art (SOTA) and more balanced
performance on geo-diverse V&L tasks.
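The abstract names the two objectives but not their concrete form. As a minimal sketch, assuming a fused vision-language encoder that produces a pooled embedding per (image, text) pair, IKM and IEC might be realized as lightweight classification heads; the label sets and equal loss weighting below are illustrative assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class GIVLStylePretrainHeads(nn.Module):
    """Illustrative heads for IKM and IEC (layout is an assumption).

    `pooled` is a [CLS]-style embedding from some fused V&L encoder
    (not specified here); only the objective structure is sketched.
    """

    def __init__(self, hidden_dim: int = 768):
        super().__init__()
        # IKM: does the knowledge text match the concept in the image?
        # Sketched as 3-way: match / same-category mismatch / unrelated.
        self.ikm_head = nn.Linear(hidden_dim, 3)
        # IEC: was the pictured concept swapped for a visually similar
        # concept from a different category (i.e., is the image edited)?
        self.iec_head = nn.Linear(hidden_dim, 2)
        self.ce = nn.CrossEntropyLoss()

    def forward(self, pooled, ikm_labels, iec_labels):
        ikm_loss = self.ce(self.ikm_head(pooled), ikm_labels)
        iec_loss = self.ce(self.iec_head(pooled), iec_labels)
        return ikm_loss + iec_loss  # equal weighting is an assumption

# Usage with dummy tensors:
heads = GIVLStylePretrainHeads()
pooled = torch.randn(4, 768)
loss = heads(pooled, torch.randint(0, 3, (4,)), torch.randint(0, 2, (4,)))
```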
Related papers
- 'Eyes of a Hawk and Ears of a Fox': Part Prototype Network for Generalized Zero-Shot Learning
Current approaches in Generalized Zero-Shot Learning (GZSL) are built upon base models which consider only a single class attribute vector representation over the entire image.
We take a fundamentally different approach: a pre-trained Vision-Language detector (VINVL) sensitive to attribute information is employed to efficiently obtain region features.
A learned function maps the region features to region-specific attribute attention used to construct class part prototypes.
arXiv Detail & Related papers (2024-04-12T18:37:00Z)
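The region-to-prototype mapping described above can be sketched in a few lines; the attention form, tensor shapes, and pooling below are assumptions rather than the paper's exact architecture:

```python
import torch
import torch.nn as nn

class RegionAttributeAttention(nn.Module):
    """Hypothetical sketch: map region features to per-attribute attention
    and pool them into class part prototypes (shapes are assumptions)."""

    def __init__(self, feat_dim: int, num_attributes: int):
        super().__init__()
        # Learned function from region feature to attribute-attention logits.
        self.attn = nn.Linear(feat_dim, num_attributes)

    def forward(self, regions: torch.Tensor) -> torch.Tensor:
        # regions: (num_regions, feat_dim), e.g. from a V&L detector like VinVL.
        weights = torch.softmax(self.attn(regions), dim=0)  # (R, A), over regions
        # Each attribute attends over regions; weighted sums give prototypes.
        prototypes = weights.T @ regions                    # (A, feat_dim)
        return prototypes

proto = RegionAttributeAttention(feat_dim=2048, num_attributes=85)
parts = proto(torch.randn(36, 2048))  # 36 region proposals -> 85 part prototypes
```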
- Measuring Geographic Diversity of Foundation Models with a Natural Language-based Geo-guessing Experiment on GPT-4
We probe GPT-4, a state-of-the-art representative of the family of multimodal large language models, to measure its geographic diversity.
Using DBpedia abstracts as a ground-truth corpus for probing, our natural language-based geo-guessing experiment shows that GPT-4 may currently encode insufficient knowledge about several geographic feature types.
arXiv Detail & Related papers (2024-04-11T09:59:21Z)
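The geo-guessing protocol above is straightforward to illustrate: mask the name of a geographic feature in its DBpedia abstract and ask the model to identify it. A minimal sketch, in which the prompt wording and the commented-out ask_model call are placeholders, not the paper's exact setup:

```python
def build_geoguess_prompt(abstract: str, feature_name: str) -> str:
    """Mask the feature's name in a DBpedia abstract and ask for a guess.

    The masking strategy and prompt wording are illustrative assumptions.
    """
    masked = abstract.replace(feature_name, "[MASK]")
    return (
        "The following description has the name of a geographic feature "
        f"replaced by [MASK]. Which feature is it?\n\n{masked}"
    )

# Example with a made-up abstract; `ask_model` stands in for any LLM API.
prompt = build_geoguess_prompt(
    "The Rhine is a major European river rising in the Swiss Alps.",
    "Rhine",
)
# answer = ask_model(prompt)  # hypothetical call to GPT-4 or similar
```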
- Incorporating Geo-Diverse Knowledge into Prompting for Increased Geographical Robustness in Object Recognition
We investigate the feasibility of probing a large language model for geography-based object knowledge.
We propose geography knowledge regularization to ensure that soft prompts trained on a source set of geographies generalize to an unseen target set.
Accuracy gains over prompting baselines on DollarStreet are up to +2.8/1.2/1.6 on target data from Africa/Asia/Americas, and +4.6 overall on the hardest classes.
arXiv Detail & Related papers (2024-01-03T01:11:16Z)
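One plausible reading of the geography knowledge regularization above is an auxiliary loss that anchors soft prompts trained on source geographies to frozen embeddings of geography-specific knowledge text; the sketch below encodes that reading and is an assumption, not the paper's exact loss:

```python
import torch
import torch.nn.functional as F

def geo_knowledge_regularizer(soft_prompts: torch.Tensor,
                              knowledge_embeds: torch.Tensor,
                              weight: float = 0.1) -> torch.Tensor:
    """Penalize soft prompts for drifting away from embeddings of
    geography-specific knowledge text (a hypothetical formulation).

    soft_prompts:     (num_prompts, dim) learned prompt vectors
    knowledge_embeds: (num_prompts, dim) frozen embeddings of, e.g.,
                      LLM-generated descriptions of the object per region
    """
    # Cosine distance keeps the prompts semantically anchored.
    return weight * (1 - F.cosine_similarity(soft_prompts,
                                             knowledge_embeds, dim=-1)).mean()

reg = geo_knowledge_regularizer(torch.randn(16, 512), torch.randn(16, 512))
# total_loss = task_loss + reg  # added to the usual prompt-tuning objective
```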
- GeoNet: Benchmarking Unsupervised Adaptation across Geographies
We study the problem of geographic robustness and make three main contributions.
First, we introduce a large-scale dataset GeoNet for geographic adaptation.
Second, we hypothesize that the major source of domain shift arises from significant variations in scene context.
Third, we conduct an extensive evaluation of several state-of-the-art unsupervised domain adaptation algorithms and architectures.
arXiv Detail & Related papers (2023-03-27T17:59:34Z)
- Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning
We construct a Geo-Diverse Visual Commonsense Reasoning dataset (GD-VCR) to test vision-and-language models' ability to understand cultural and geo-location-specific commonsense.
We find that the performance of both models for non-Western regions, including East Asia, South Asia, and Africa, is significantly lower than that for Western regions.
arXiv Detail & Related papers (2021-09-14T17:52:55Z)
- PGL: Prior-Guided Local Self-supervised Learning for 3D Medical Image Segmentation
We propose a Prior-Guided Local (PGL) self-supervised model that learns the region-wise local consistency in the latent feature space.
Our PGL model learns the distinctive representations of local regions, and hence is able to retain structural information.
arXiv Detail & Related papers (2020-11-25T11:03:11Z)
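The core idea above, region-wise local consistency, can be sketched as encouraging features of corresponding local regions under two augmentations to agree; the loss below is a hypothetical rendering, assuming spatially pre-aligned feature maps:

```python
import torch
import torch.nn.functional as F

def local_consistency_loss(feat_a: torch.Tensor,
                           feat_b: torch.Tensor) -> torch.Tensor:
    """Hypothetical region-wise consistency: features of corresponding
    local regions from two augmented views should agree.

    feat_a, feat_b: (B, C, D, H, W) latent feature maps of the same 3D
    volume under two augmentations, already re-aligned spatially.
    """
    # Normalize per-voxel features, then penalize local disagreement.
    a = F.normalize(feat_a, dim=1)
    b = F.normalize(feat_b, dim=1)
    return (1 - (a * b).sum(dim=1)).mean()

loss = local_consistency_loss(torch.randn(2, 64, 8, 16, 16),
                              torch.randn(2, 64, 8, 16, 16))
```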
- Adversarial Graph Representation Adaptation for Cross-Domain Facial Expression Recognition
We propose a novel Adversarial Graph Representation Adaptation (AGRA) framework that unifies graph representation propagation with adversarial learning for cross-domain holistic-local feature co-adaptation.
We conduct extensive and fair experiments on several popular benchmarks and show that the proposed AGRA framework achieves superior performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2020-08-03T13:27:24Z)
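Adversarial cross-domain adaptation of the kind described above is commonly implemented with a domain discriminator behind a gradient-reversal layer; the sketch below shows only that generic pattern, not AGRA's specific graph propagation:

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity forward; flips gradient sign backward (standard DANN trick)."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class DomainDiscriminator(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(),
                                 nn.Linear(128, 2))  # source vs. target

    def forward(self, features, lambd: float = 1.0):
        # Reversed gradients push the feature extractor toward
        # domain-invariant (here: expression-relevant) representations.
        return self.net(GradReverse.apply(features, lambd))

disc = DomainDiscriminator()
logits = disc(torch.randn(8, 256))
```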
- Meta-Learning for Few-Shot Land Cover Classification
We evaluate the model-agnostic meta-learning (MAML) algorithm on classification and segmentation tasks.
We find that few-shot model adaptation outperforms pre-training with regular gradient descent.
This indicates that model optimization with meta-learning may benefit tasks in the Earth sciences.
arXiv Detail & Related papers (2020-04-28T09:42:41Z)
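MAML's adaptation loop is concrete enough to sketch: one gradient step on a task's support set, then a meta-update from the query loss. The simplified first-order version below omits second-order terms and is not the paper's code; the task tuple names are assumptions:

```python
import torch
import torch.nn as nn

def maml_first_order_step(model: nn.Module, loss_fn, tasks,
                          meta_opt: torch.optim.Optimizer,
                          inner_lr: float = 0.01):
    """One simplified first-order MAML meta-update (illustrative only).

    `tasks` yields (support_x, support_y, query_x, query_y) tuples,
    e.g. one tuple per land-cover region.
    """
    meta_opt.zero_grad()
    for sx, sy, qx, qy in tasks:
        # Inner loop: adapt cloned weights on the task's support set.
        fast = {n: p.clone() for n, p in model.named_parameters()}
        out = torch.func.functional_call(model, fast, (sx,))
        grads = torch.autograd.grad(loss_fn(out, sy), list(fast.values()))
        fast = {n: p - inner_lr * g
                for (n, p), g in zip(fast.items(), grads)}
        # Outer loop: the query loss through the adapted weights
        # backpropagates (first-order) into the original parameters.
        qout = torch.func.functional_call(model, fast, (qx,))
        loss_fn(qout, qy).backward()
    meta_opt.step()
```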
This list is automatically generated from the titles and abstracts of the papers in this site.