A Picture is Worth 500 Labels: A Case Study of Demographic Disparities in Local Machine Learning Models for Instagram and TikTok
- URL: http://arxiv.org/abs/2403.19717v1
- Date: Wed, 27 Mar 2024 17:46:14 GMT
- Title: A Picture is Worth 500 Labels: A Case Study of Demographic Disparities in Local Machine Learning Models for Instagram and TikTok
- Authors: Jack West, Lea Thiemt, Shimaa Ahmed, Maggie Bartig, Kassem Fawaz, Suman Banerjee
- Abstract summary: We analyze two popular social media apps, TikTok and Instagram, to reveal what insights vision models in both apps infer about users from their image and video data.
We develop a novel method for capturing and evaluating machine learning tasks in mobile apps.
- Score: 9.917627395559467
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Mobile apps have embraced user privacy by moving their data processing to the user's smartphone. Advanced machine learning (ML) models, such as vision models, can now locally analyze user images to extract insights that drive several functionalities. Capitalizing on this new processing model of locally analyzing user images, we analyze two popular social media apps, TikTok and Instagram, to reveal (1) what insights vision models in both apps infer about users from their image and video data and (2) whether these models exhibit performance disparities with respect to demographics. As vision models provide signals for sensitive technologies like age verification and facial recognition, understanding potential biases in these models is crucial for ensuring that users receive equitable and accurate services. We develop a novel method for capturing and evaluating ML tasks in mobile apps, overcoming challenges like code obfuscation, native code execution, and scalability. Our method comprises ML task detection, ML pipeline reconstruction, and ML performance assessment, specifically focusing on demographic disparities. We apply our methodology to TikTok and Instagram, revealing significant insights. For TikTok, we find issues in age and gender prediction accuracy, particularly for minors and Black individuals. For Instagram, our analysis uncovers demographic disparities in the extraction of over 500 visual concepts from images, with evidence of spurious correlations between demographic features and certain concepts.
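The performance-assessment step reduces to a concrete measurement: run the model recovered from the app on a demographically annotated test set and compare per-group accuracy. Below is a minimal Python sketch of that comparison; `samples` and `run_model` are hypothetical stand-ins for the annotated data and the re-hosted app model, not interfaces from the paper.

```python
# Minimal sketch of a demographic-disparity check, assuming a demographically
# annotated test set. `samples` and `run_model` are hypothetical stand-ins.
from collections import defaultdict

def per_group_accuracy(samples, run_model):
    """Return accuracy per demographic group and the largest inter-group gap.

    samples: iterable of (image, true_label, group) tuples, where `group` is
    a demographic annotation such as an age bracket or skin-tone category.
    run_model: callable mapping an image to a predicted label, e.g. a vision
    model extracted from an app and re-hosted for offline evaluation.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for image, true_label, group in samples:
        total[group] += 1
        if run_model(image) == true_label:
            correct[group] += 1
    accuracy = {g: correct[g] / total[g] for g in total}
    gap = max(accuracy.values()) - min(accuracy.values())
    return accuracy, gap
```

On data like the paper's, `gap` would quantify, for example, how far age-prediction accuracy for minors or Black individuals falls below that of the best-served group.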
Related papers
- MIMRS: A Survey on Masked Image Modeling in Remote Sensing [12.28883063656968]
Masked Image Modeling (MIM) is a self-supervised learning technique that involves masking portions of an image.
MIM addresses challenges such as incomplete data caused by cloud cover, occlusions, and sensor limitations.
This survey (MIMRS) is a pioneering effort to chart the landscape of masked image modeling in remote sensing.
arXiv Detail & Related papers (2025-04-04T05:16:51Z) - Towards Understanding Graphical Perception in Large Multimodal Models [80.44471730672801]
We leverage the theory of graphical perception to develop an evaluation framework for analyzing gaps in LMMs' perception abilities in charts.
We apply our framework to evaluate and diagnose the perception capabilities of state-of-the-art LMMs at three levels (chart, visual element, and pixel).
arXiv Detail & Related papers (2025-03-13T20:13:39Z) - "Impressively Scary:" Exploring User Perceptions and Reactions to Unraveling Machine Learning Models in Social Media Applications [17.961040981236092]
We investigate how social media users' perceptions and behaviors change once they are exposed to machine learning models.
We conducted user studies (N=21) and found that participants were unaware of both what the models output and when the models were used in Instagram and TikTok.
In response to being exposed to the models' functionality, we observed long-term behavior changes in 8 participants.
arXiv Detail & Related papers (2025-03-05T21:51:52Z) - Face-MLLM: A Large Face Perception Model [53.9441375205716]
Multimodal large language models (MLLMs) have achieved promising results on a wide range of vision-language tasks, but their ability to perceive and understand human faces is rarely explored.
In this work, we comprehensively evaluate existing MLLMs on face perception tasks.
Our model, Face-MLLM, surpasses previous MLLMs on five well-known face perception tasks.
arXiv Detail & Related papers (2024-10-28T04:19:32Z) - Vision-Language Models under Cultural and Inclusive Considerations [53.614528867159706]
Large vision-language models (VLMs) can assist visually impaired people by describing images from their daily lives.
Current evaluation datasets may not reflect diverse cultural user backgrounds or the situational context of this use case.
We create a survey to determine caption preferences and propose a culture-centric evaluation benchmark by filtering VizWiz, an existing dataset with images taken by people who are blind.
We then evaluate several VLMs, investigating their reliability as visual assistants in a culturally diverse setting.
arXiv Detail & Related papers (2024-07-08T17:50:00Z) - Voila-A: Aligning Vision-Language Models with User's Gaze Attention [56.755993500556734]
We introduce gaze information as a proxy for human attention to guide Vision-Language Models (VLMs).
We propose a novel approach, Voila-A, for gaze alignment to enhance the interpretability and effectiveness of these models in real-world applications.
arXiv Detail & Related papers (2023-12-22T17:34:01Z) - Visual Data-Type Understanding does not emerge from Scaling
Vision-Language Models [31.69213233651326]
We introduce the novel task of Visual Data-Type Identification.
An extensive zero-shot evaluation of 39 vision-language models (VLMs) shows a nuanced performance landscape.
arXiv Detail & Related papers (2023-10-12T17:59:30Z) - FACET: Fairness in Computer Vision Evaluation Benchmark [21.862644380063756]
Computer vision models have known performance disparities across attributes such as gender and skin tone.
We present a new benchmark named FACET (FAirness in Computer Vision EvaluaTion).
FACET is a large, publicly available evaluation set of 32k images for some of the most common vision tasks.
arXiv Detail & Related papers (2023-08-31T17:59:48Z) - MiVOLO: Multi-input Transformer for Age and Gender Estimation [0.0]
We present MiVOLO, a straightforward approach for age and gender estimation using the latest vision transformer.
Our method integrates both tasks into a unified dual input/output model.
We compare our model's age recognition performance with human-level accuracy and demonstrate that it significantly outperforms humans across a majority of age ranges.
arXiv Detail & Related papers (2023-07-10T14:58:10Z) - Evaluating how interactive visualizations can assist in finding samples where and how computer vision models make mistakes [1.76602679361245]
We present two interactive visualizations in the context of Sprite, a system for creating Computer Vision (CV) models.
We study how these visualizations help Sprite's users identify (evaluate) and select (plan) images where a model is struggling and can lead to improved performance.
arXiv Detail & Related papers (2023-05-19T14:43:00Z) - Robustar: Interactive Toolbox Supporting Precise Data Annotation for
Robust Vision Learning [53.900911121695536]
We introduce the initial release of our software Robustar.
It aims to improve the robustness of vision classification machine learning models through a data-driven perspective.
arXiv Detail & Related papers (2022-07-18T21:12:28Z) - DALL-Eval: Probing the Reasoning Skills and Social Biases of
Text-to-Image Generation Models [73.12069620086311]
We investigate the visual reasoning capabilities and social biases of text-to-image models.
First, we measure three visual reasoning skills: object recognition, object counting, and spatial relation understanding.
Second, we assess the gender and skin tone biases by measuring the gender/skin tone distribution of generated images; a minimal sketch of this kind of distribution measurement appears after this list.
arXiv Detail & Related papers (2022-02-08T18:36:52Z) - Visual Distant Supervision for Scene Graph Generation [66.10579690929623]
Scene graph models usually require supervised learning on large quantities of labeled data with intensive human annotation.
We propose visual distant supervision, a novel paradigm of visual relation learning, which can train scene graph models without any human-labeled data.
Comprehensive experimental results show that our distantly supervised model outperforms strong weakly supervised and semi-supervised baselines.
arXiv Detail & Related papers (2021-03-29T06:35:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.