Bridging the Digital Divide: Performance Variation across Socio-Economic
Factors in Vision-Language Models
- URL: http://arxiv.org/abs/2311.05746v1
- Date: Thu, 9 Nov 2023 21:10:52 GMT
- Title: Bridging the Digital Divide: Performance Variation across Socio-Economic
Factors in Vision-Language Models
- Authors: Joan Nwatu, Oana Ignat, Rada Mihalcea
- Abstract summary: We evaluate the performance of a vision-language model (CLIP) on a geo-diverse dataset containing household images associated with different income values.
Our results indicate that performance for the poorer groups is consistently lower than for the wealthier groups across various topics and countries.
- Score: 31.868468221653025
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the impressive performance of current AI models reported across
various tasks, performance reports often do not include evaluations of how
these models perform on the specific groups that will be impacted by these
technologies. Among the minority groups under-represented in AI, data from
low-income households are often overlooked in data collection and model
evaluation. We evaluate the performance of a state-of-the-art vision-language
model (CLIP) on a geo-diverse dataset containing household images associated
with different income values (Dollar Street) and show that performance
inequality exists among households of different income levels. Our results
indicate that performance for the poorer groups is consistently lower than for
the wealthier groups across various topics and countries. We highlight insights
that can help mitigate these issues and propose actionable steps for
economic-level inclusive AI development. Code is available at
https://github.com/MichiganNLP/Bridging_the_Digital_Divide.
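The evaluation described in the abstract can be sketched in miniature: group per-image zero-shot predictions by household income and compare group accuracies. The records, income threshold, and accuracy values below are illustrative stand-ins, not the paper's actual data or code.

```python
# Minimal sketch (hypothetical data): measure the kind of accuracy gap the
# paper reports by splitting per-image zero-shot hits by household income.
from statistics import mean

# Each record: (monthly household income in USD, whether the model's top-1
# zero-shot label matched the ground-truth topic). Values are illustrative.
records = [
    (26, False), (54, False), (112, True), (180, False),
    (310, True), (540, True), (1200, True), (4900, True),
]

def accuracy_by_income(records, threshold):
    """Split records at an income threshold; return (poor_acc, rich_acc)."""
    poor = [hit for income, hit in records if income < threshold]
    rich = [hit for income, hit in records if income >= threshold]
    return mean(poor), mean(rich)

poor_acc, rich_acc = accuracy_by_income(records, threshold=300)
gap = rich_acc - poor_acc  # positive gap = model favors wealthier households
```

The paper's actual protocol scores CLIP image-text similarity against topic labels on Dollar Street images; this toy only shows how a gap statistic over income groups would be aggregated.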
Related papers
- Culture Affordance Atlas: Reconciling Object Diversity Through Functional Mapping [38.345727498425]
Vision-Language (VL) datasets exhibit cultural biases, disproportionately favoring higher-income, Western contexts.
We propose a novel function-centric framework that categorizes objects by the functions they fulfill across diverse cultural and economic contexts.
arXiv Detail & Related papers (2025-12-02T19:16:39Z)
- AI-EDI-SPACE: A Co-designed Dataset for Evaluating the Quality of Public Spaces [2.691611484444756]
Crowdsourcing often employs low-wage workers with poor working conditions and lacks consideration for the representativeness of annotators.
We propose a methodology involving a co-design model that actively engages stakeholders at key stages, integrating principles of Equity, Diversity, and Inclusion (EDI) to ensure diverse viewpoints.
We apply this methodology to develop a dataset and AI model for evaluating public space quality using street view images.
arXiv Detail & Related papers (2024-11-01T18:11:29Z)
- LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content [62.816876067499415]
We propose LiveXiv: a scalable evolving live benchmark based on scientific ArXiv papers.
LiveXiv accesses domain-specific manuscripts at any given timestamp and proposes to automatically generate visual question-answer pairs.
We benchmark multiple open and proprietary Large Multi-modal Models (LMMs) on the first version of our benchmark, showing its challenging nature and exposing the models' true abilities.
arXiv Detail & Related papers (2024-10-14T17:51:23Z)
- Uplifting Lower-Income Data: Strategies for Socioeconomic Perspective Shifts in Large Multi-modal Models [28.3552578648979]
We propose and evaluate several prompting strategies using non-English, geographic, and socioeconomic attributes.
We show that prompts integrating these geographic and socioeconomic attributes favor retrieving topic appearances commonly found in data from low-income households across different countries.
arXiv Detail & Related papers (2024-07-02T19:27:00Z)
- Scaling Laws Do Not Scale [54.72120385955072]
Recent work has argued that as the size of a dataset increases, the performance of a model trained on that dataset will increase.
We argue that this scaling law relationship depends on metrics used to measure performance that may not correspond with how different groups of people perceive the quality of models' output.
Different communities may also have values in tension with each other, leading to difficult, potentially irreconcilable choices about metrics used for model evaluations.
arXiv Detail & Related papers (2023-07-05T15:32:21Z)
- Pinpointing Why Object Recognition Performance Degrades Across Income Levels and Geographies [8.408398153073096]
Deep learning systems' performance degrades significantly across geographies and lower income levels.
We take a step in this direction by annotating images from Dollar Street, a popular benchmark of geographically and economically diverse images.
These annotations unlock a new granular view into how objects differ across incomes and regions.
We then use these object differences to pinpoint model vulnerabilities across incomes and regions.
arXiv Detail & Related papers (2023-04-11T17:59:52Z)
- Towards Reliable Assessments of Demographic Disparities in Multi-Label Image Classifiers [11.973749734226852]
We consider multi-label image classification and, specifically, object categorization tasks.
Design choices and trade-offs for measurement involve more nuance than discussed in prior computer vision literature.
We identify several design choices that look merely like implementation details but significantly impact the conclusions of assessments.
arXiv Detail & Related papers (2023-02-16T20:34:54Z)
- On the Compositional Generalization Gap of In-Context Learning [73.09193595292233]
We look at the gap between the in-distribution (ID) and out-of-distribution (OOD) performance of such models in semantic parsing tasks with in-context learning.
We evaluate four model families (OPT, BLOOM, CodeGen, and Codex) on three semantic parsing datasets.
arXiv Detail & Related papers (2022-11-15T19:56:37Z)
- Estimating Structural Disparities for Face Models [54.062512989859265]
In machine learning, disparity metrics are often defined by measuring the difference in the performance or outcome of a model, across different sub-populations.
We explore performing such analysis on computer vision models trained on human faces, and on tasks such as face attribute prediction and affect estimation.
arXiv Detail & Related papers (2022-04-13T05:30:53Z)
- CHEER: Rich Model Helps Poor Model via Knowledge Infusion [69.23072792708263]
We develop a knowledge infusion framework named CHEER that can succinctly summarize such a rich model into transferable representations.
Our empirical results showed that CHEER outperformed baselines by 5.60% to 46.80% in terms of the macro-F1 score on multiple physiological datasets.
arXiv Detail & Related papers (2020-05-21T21:44:21Z)
- Inclusive GAN: Improving Data and Minority Coverage in Generative Models [101.67587566218928]
We formalize the problem of minority inclusion as one of data coverage.
We then propose to improve data coverage by harmonizing adversarial training with reconstructive generation.
We develop an extension that allows explicit control over the minority subgroups that the model should ensure to include.
arXiv Detail & Related papers (2020-04-07T13:31:33Z)
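The data-coverage framing in the Inclusive GAN entry above can be illustrated with a toy metric: a minority reference point counts as covered if any generated sample falls within a distance threshold of it. This is a 1-D sketch under assumed names and values, not the paper's actual definition.

```python
# Hypothetical sketch of "data coverage" for a minority subgroup: a reference
# example is covered if some generated sample lies within `threshold` of it.
def coverage(minority_refs, generated, threshold):
    """Fraction of minority reference points covered by generated samples."""
    covered = sum(
        1 for ref in minority_refs
        if any(abs(ref - g) <= threshold for g in generated)
    )
    return covered / len(minority_refs)

# 1-D toy: the generator concentrates mass near 0, the minority mode near 5.
refs = [4.8, 5.1, 5.4]
samples = [0.1, -0.2, 0.05, 5.0]
cov = coverage(refs, samples, threshold=0.3)
```

In practice such a metric would operate on learned feature embeddings rather than raw scalars; the scalar version only shows the counting logic.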
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.