Analyzing Character Representation in Media Content using Multimodal Foundation Model: Effectiveness and Trust
- URL: http://arxiv.org/abs/2506.14799v1
- Date: Mon, 02 Jun 2025 13:46:28 GMT
- Title: Analyzing Character Representation in Media Content using Multimodal Foundation Model: Effectiveness and Trust
- Authors: Evdoxia Taka, Debadyuti Bhattacharya, Joanne Garde-Hansen, Sanjay Sharma, Tanaya Guha
- Abstract summary: We ask, even if character distributions along demographic dimensions are available, how useful are they to the general public? Our work addresses these questions through a user study, while proposing a new AI-based character representation and visualization tool. Our tool uses the Contrastive Language-Image Pretraining (CLIP) foundation model to analyze visual screen data and quantify character representation across the dimensions of age and gender.
- Score: 7.985473318714565
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in AI have enabled automated analysis of complex media content at scale, generating actionable insights about character representation along dimensions such as gender and age. Past work focused on quantifying representation from audio/video/text using various ML models, but without the audience in the loop. We ask: even if character distributions along demographic dimensions are available, how useful are they to the general public? Do people actually trust the numbers generated by AI models? Our work addresses these questions through a user study, while proposing a new AI-based character representation and visualization tool. Our tool uses the Contrastive Language-Image Pretraining (CLIP) foundation model to analyze visual screen data and quantify character representation across the dimensions of age and gender. We also designed effective visualizations suitable for presenting such analytics to a lay audience. We then conducted a user study to gather empirical evidence on the usefulness and trustworthiness of the AI-generated results for carefully chosen movies, presented in the form of our visualizations. Participants were able to understand the analytics from our visualizations and deemed the tool 'overall useful'. They also indicated a need for more detailed visualizations that include more demographic categories and contextual information about the characters. Participants' trust in AI-based gender and age models was moderate to low, although they were not against the use of AI in this context. Our tool, including code, benchmarks, and data from the user study, can be found here: https://anonymous.4open.science/r/Character-Representation-Media-FF7B
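To make the CLIP-based analysis concrete, here is a minimal zero-shot sketch. It is an illustration under assumptions, not the authors' exact pipeline: the checkpoint, prompt wording, joint age/gender label set, and frame list are all placeholders, and a production system would detect and track faces or characters rather than tag whole frames.

```python
# Minimal sketch: zero-shot age/gender tagging of sampled movie frames
# with CLIP via Hugging Face transformers. Labels, prompts, and frame
# paths are illustrative assumptions.
from collections import Counter

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical joint age/gender categories.
LABELS = [
    "a young woman", "a young man",
    "a middle-aged woman", "a middle-aged man",
    "an elderly woman", "an elderly man",
]
PROMPTS = [f"a film still showing {label}" for label in LABELS]

def tag_frame(path: str) -> str:
    """Return the best-matching age/gender label for one frame."""
    image = Image.open(path).convert("RGB")
    inputs = processor(text=PROMPTS, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape (1, num_labels)
    return LABELS[logits.softmax(dim=-1).argmax().item()]

# Aggregate per-frame tags into a representation distribution.
frames = ["frame_0001.jpg", "frame_0002.jpg"]  # sampled screen data
counts = Counter(tag_frame(f) for f in frames)
total = sum(counts.values())
for label, n in counts.most_common():
    print(f"{label}: {n / total:.1%}")
```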
Related papers
- Exploiting Contextual Uncertainty of Visual Data for Efficient Training of Deep Models [0.65268245109828]
We introduce the notion of contextual diversity for active learning (CDAL).
We propose a data repair algorithm to curate contextually fair data to reduce model bias.
We are working on developing an image retrieval system for wildlife camera-trap images and a reliable warning system for poor-quality rural roads.
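As a rough illustration of diversity-driven sample selection in active learning (a generic greedy sketch, not CDAL's exact contextual-diversity formulation; the KL-based distance and batch size are assumptions):

```python
# Sketch: greedily pick unlabeled samples whose predicted class
# distributions are farthest (in KL divergence) from the already-selected
# set. A generic stand-in for a contextual-diversity criterion.
import numpy as np

def kl(p: np.ndarray, q: np.ndarray, eps: float = 1e-8) -> float:
    """KL divergence between two class-probability vectors."""
    p, q = p + eps, q + eps
    return float((p * np.log(p / q)).sum())

def select_batch(probs: np.ndarray, k: int) -> list:
    """probs: (num_unlabeled, num_classes) softmax outputs."""
    chosen = [int(probs.max(axis=1).argmin())]  # seed: least confident sample
    while len(chosen) < k:
        # For each candidate, distance to its nearest already-chosen sample.
        dists = [min(kl(probs[i], probs[j]) for j in chosen)
                 for i in range(len(probs))]
        for j in chosen:
            dists[j] = -1.0                     # never re-pick a sample
        chosen.append(int(np.argmax(dists)))
    return chosen

probs = np.random.default_rng(0).dirichlet(np.ones(5), size=100)
print(select_batch(probs, k=10))
```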
arXiv Detail & Related papers (2024-11-04T09:43:33Z)
- Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective [68.20531518525273]
We take a closer look at existing self-supervised speech representation methods from an information-theoretic perspective.
We use linear probes to estimate the mutual information between the target information and learned representations.
We explore the potential of evaluating representations in a self-supervised fashion, where we estimate the mutual information between different parts of the data without using any labels.
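To make the probing step concrete, a linear probe yields the standard lower bound I(Y;Z) >= H(Y) - CE. Here is a minimal sketch with synthetic data standing in for frozen speech representations (an illustration, not the paper's exact estimator):

```python
# Sketch: lower-bound I(Y; Z) with a linear probe via
# I(Y; Z) >= H(Y) - CE(probe), everything in nats.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
Z = rng.normal(size=(2000, 64))                      # frozen representations
w = rng.normal(size=(64,))
Y = (Z @ w + rng.normal(size=2000) > 0).astype(int)  # target information

Z_tr, Z_te, y_tr, y_te = train_test_split(Z, Y, test_size=0.5, random_state=0)

probe = LogisticRegression(max_iter=1000).fit(Z_tr, y_tr)
ce = log_loss(y_te, probe.predict_proba(Z_te))       # natural log by default

p = np.bincount(y_te) / len(y_te)
h_y = -(p * np.log(p)).sum()                         # empirical label entropy

print(f"H(Y) = {h_y:.3f} nats, probe CE = {ce:.3f} nats")
print(f"MI lower bound: I(Y;Z) >= {h_y - ce:.3f} nats")
```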
arXiv Detail & Related papers (2024-01-16T21:13:22Z)
- SeeBel: Seeing is Believing [0.9790236766474201]
We propose three visualizations that enable users to compare dataset statistics and AI performance for segmenting all images.
Our project tries to further increase the interpretability of the trained AI model for segmentation by visualizing its image attention weights.
We propose to conduct surveys on real users to study the efficacy of our visualization tool in computer vision and AI domain.
arXiv Detail & Related papers (2023-12-18T05:11:00Z)
- Helping Hands: An Object-Aware Ego-Centric Video Recognition Model [60.350851196619296]
We introduce an object-aware decoder for improving the performance of ego-centric representations on ego-centric videos.
We show that the model can act as a drop-in replacement for an ego-aware video model to improve performance through visual-text grounding.
arXiv Detail & Related papers (2023-08-15T17:58:11Z)
- VisAlign: Dataset for Measuring the Degree of Alignment between AI and Humans in Visual Perception [32.376529738717736]
We propose a new dataset for measuring AI-human visual alignment in terms of image classification.
Our dataset consists of three groups of samples, namely Must-Act (i.e., Must-Classify), Must-Abstain, and Uncertain.
We analyze the visual alignment and reliability of five popular visual perception models and seven abstention methods.
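For illustration, maximum-softmax-probability thresholding is one standard abstention baseline of the kind such a benchmark evaluates; the threshold below is an arbitrary assumption:

```python
# Sketch: maximum-softmax-probability abstention, a common baseline
# among abstention methods.
import numpy as np

def predict_or_abstain(logits: np.ndarray, threshold: float = 0.9):
    """Classify when confident enough, otherwise abstain."""
    z = logits - logits.max()            # stabilize the softmax
    probs = np.exp(z) / np.exp(z).sum()
    if probs.max() >= threshold:
        return int(probs.argmax())       # Must-Act style decision
    return None                          # abstain (Must-Abstain / Uncertain)

print(predict_or_abstain(np.array([4.0, 0.5, 0.1])))  # confident -> class 0
print(predict_or_abstain(np.array([1.0, 0.9, 0.8])))  # ambiguous -> None
```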
arXiv Detail & Related papers (2023-08-03T04:04:03Z)
- Towards Fair and Explainable AI using a Human-Centered AI Approach [5.888646114353372]
We present 5 research projects that aim to enhance explainability and fairness in classification systems and word embeddings.
The first project explores the utility/downsides of introducing local model explanations as interfaces for machine teachers.
The second project presents D-BIAS, a causality-based human-in-the-loop visual tool for identifying and mitigating social biases in datasets.
The third project presents WordBias, a visual interactive tool that helps audit pre-trained static word embeddings for biases against groups.
The fourth project presents DramatVis Personae, a visual analytics tool that helps identify social biases.
arXiv Detail & Related papers (2023-06-12T21:08:55Z)
- Learning Transferable Pedestrian Representation from Multimodal Information Supervision [174.5150760804929]
VAL-PAT is a novel framework that learns transferable representations to enhance various pedestrian analysis tasks with multimodal information.
We first perform pre-training on the LUPerson-TA dataset, where each image contains text and attribute annotations.
We then transfer the learned representations to various downstream tasks, including person reID, person attribute recognition and text-based person search.
arXiv Detail & Related papers (2023-04-12T01:20:58Z)
- Auditing Gender Presentation Differences in Text-to-Image Models [54.16959473093973]
We study how gender is presented differently in text-to-image models.
By probing gender indicators in the input text, we quantify the frequency differences of presentation-centric attributes.
We propose an automatic method to estimate such differences.
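A minimal sketch of the auditing idea under assumptions (the prompt pair, the attribute, and the detector are hypothetical; the paper's automatic estimator may differ): generate images from prompts that differ only in their gender indicator, detect an attribute in each image, and compare its frequencies across the two groups.

```python
# Sketch: frequency difference of one attribute between images generated
# from gender-swapped prompts. `detect_attribute` is a hypothetical
# detector (e.g., a zero-shot CLIP check for "wearing a suit").
from typing import Callable, List

def attribute_frequency(images: List[str],
                        detect_attribute: Callable[[str], bool]) -> float:
    """Fraction of images in which the detector fires."""
    return sum(detect_attribute(path) for path in images) / len(images)

def presentation_gap(images_female: List[str], images_male: List[str],
                     detect_attribute: Callable[[str], bool]) -> float:
    """Attribute frequency difference between the two prompt groups."""
    return (attribute_frequency(images_female, detect_attribute)
            - attribute_frequency(images_male, detect_attribute))

# Usage idea: images generated from "a photo of a woman, a doctor" vs.
# "a photo of a man, a doctor", audited for "wearing a suit".
```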
arXiv Detail & Related papers (2023-02-07T18:52:22Z)
- Understanding ME? Multimodal Evaluation for Fine-grained Visual Commonsense [98.70218717851665]
It is unclear whether the models really understand the visual scene and underlying commonsense knowledge due to limited evaluation data resources.
We present a Multimodal Evaluation (ME) pipeline to automatically generate question-answer pairs to test models' understanding of the visual scene, text, and related knowledge.
We then take a step further to show that training with the ME data boosts the model's performance in standard VCR evaluation.
arXiv Detail & Related papers (2022-11-10T21:44:33Z)
- Exploring CLIP for Assessing the Look and Feel of Images [87.97623543523858]
We introduce Contrastive Language-Image Pre-training (CLIP) models for assessing both the quality perception (look) and abstract perception (feel) of images in a zero-shot manner.
Our results show that CLIP captures meaningful priors that generalize well to different perceptual assessments.
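A minimal sketch of scoring the 'look' of an image with an antonym prompt pair, in the spirit of this approach; the checkpoint and exact prompts are assumptions:

```python
# Sketch: zero-shot image quality score from an antonym prompt pair.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def quality_score(path: str) -> float:
    """Probability mass on the positive prompt; higher means better 'look'."""
    image = Image.open(path).convert("RGB")
    inputs = processor(text=["Good photo.", "Bad photo."], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape (1, 2)
    return logits.softmax(dim=-1)[0, 0].item()

print(f"quality: {quality_score('frame_0001.jpg'):.3f}")
```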
arXiv Detail & Related papers (2022-07-25T17:58:16Z)
- Data Representativity for Machine Learning and AI Systems [2.588973722689844]
Data representativity is crucial when drawing inference from data through machine learning models.
This paper analyzes data representativity in scientific literature related to AI and sampling.
arXiv Detail & Related papers (2022-03-09T13:34:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.