Examining The CoVCues Dataset: Supporting COVID Infodemic Research Through A Novel User Assessment Study
- URL: http://arxiv.org/abs/2602.00055v1
- Date: Mon, 19 Jan 2026 20:16:37 GMT
- Title: Examining The CoVCues Dataset: Supporting COVID Infodemic Research Through A Novel User Assessment Study
- Authors: Shreetika Poudel, Ankur Chatterjee
- Abstract summary: We have created a novel dataset called CoVCues that represents a varied set of image artifacts. We have conducted a preliminary user assessment study to determine how effectively these dataset images contribute to the user perceived information reliability. The findings from this study offer valuable feedback for refining our CoVCues dataset and for supporting our claim that visual cues are underutilized but useful in combating the COVID infodemic.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The public confidence and trust in online healthcare information have been greatly dented following the COVID-19 pandemic, which triggered a significant rise in online health misinformation. Existing literature shows that different datasets have been created to aid with detecting false information associated with this COVID infodemic. However, most of these datasets contain mostly unimodal data, which comprise primarily textual cues, and not visual cues, like images, infographics, and other graphic data components. Prior works point to the fact that there are only a handful of multimodal datasets that support COVID misinformation identification, and they lack an organized, processed and analyzed repository of visual cues. The novel CoVCues dataset, which represents a varied set of image artifacts, addresses this gap and advocates for the use of visual cues towards detecting online health misinformation. As part of validating the contents and utility of our CoVCues dataset, we have conducted a preliminary user assessment study, where different participants have been surveyed through a set of questionnaires to determine how effectively these dataset images contribute to the user perceived information reliability. These survey responses helped provide early insights into how different stakeholder groups interpret visual cues in the context of online health information and communication. The findings from this novel user assessment study offer valuable feedback for refining our CoVCues dataset and for supporting our claim that visual cues are underutilized but useful in combating the COVID infodemic. To our knowledge, this user assessment research study, as described in this paper, is the first of its kind work, involving COVID visual cues, that demonstrates the important role that our CoVCues dataset can potentially play in aiding COVID infodemic related future research work.
Related papers
- Privacy-Aware, Public-Aligned: Embedding Risk Detection and Public Values into Scalable Clinical Text De-Identification for Trusted Research Environments [0.0]
We show how direct and indirect identifiers vary by record type, clinical setting, and data flow, and show how changes in documentation practice can degrade model performance over time. Our findings highlight that privacy risk is context-dependent and cumulative, underscoring the need for adaptable, hybrid de-identification approaches.
arXiv Detail & Related papers (2025-06-01T17:45:57Z)
- A Survey on Side Information-driven Session-based Recommendation: From a Data-centric Perspective [49.68029601454934]
Session-based recommendation is gaining increasing attention due to its practical value in predicting intents of anonymous users. The core of side information-driven session-based recommendation is the discovery and utilization of diverse data. In this survey, we provide a comprehensive review of this task from a data-centric perspective.
arXiv Detail & Related papers (2025-05-18T07:36:43Z)
- In the Picture: Medical Imaging Datasets, Artifacts, and their Living Review [18.178774133733686]
We propose a living review that continuously tracks public datasets and their associated research artifacts across multiple medical imaging applications. We discuss key considerations for creating medical imaging datasets, review best practices for data annotation, discuss the significance of shortcuts and demographic diversity, and emphasize the importance of managing datasets throughout their entire lifecycle.
arXiv Detail & Related papers (2025-01-18T11:03:59Z)
- Visual Data Diagnosis and Debiasing with Concept Graphs [50.84781894621378]
We present ConBias, a framework for diagnosing and mitigating Concept co-occurrence Biases in visual datasets.
We show that by employing a novel clique-based concept balancing strategy, we can mitigate these imbalances, leading to enhanced performance on downstream tasks.
arXiv Detail & Related papers (2024-09-26T16:59:01Z)
- Fake News Detection: It's All in the Data! [0.06749750044497731]
The survey meticulously outlines the key features of datasets, various labeling systems employed, and prevalent biases that can impact model performance. GitHub repository consolidates publicly accessible datasets into a single, user-friendly portal.
arXiv Detail & Related papers (2024-07-02T10:12:06Z)
- Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning [65.54680361074882]
Eye-gaze Guided Multi-modal Alignment (EGMA) framework harnesses eye-gaze data for better alignment of medical visual and textual features.
We conduct downstream tasks of image classification and image-text retrieval on four medical datasets.
arXiv Detail & Related papers (2024-03-19T03:59:14Z)
- Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions? [50.29862466940209]
We introduce InfoSeek, a visual question answering dataset tailored for information-seeking questions.
We analyze various pre-trained visual question answering models and gain insights into their characteristics.
We show that accurate visual entity recognition can be used to improve performance on InfoSeek by retrieving relevant documents.
arXiv Detail & Related papers (2023-02-23T00:33:54Z)
- Retiring Adult: New Datasets for Fair Machine Learning [47.27417042497261]
UCI Adult has served as the basis for the development and comparison of many algorithmic fairness interventions.
We reconstruct a superset of the UCI Adult data from available US Census sources and reveal idiosyncrasies of the UCI Adult dataset that limit its external validity.
Our primary contribution is a suite of new datasets that extend the existing data ecosystem for research on fair machine learning.
arXiv Detail & Related papers (2021-08-10T19:19:41Z)
- A Survey on Bias in Visual Datasets [17.79365832663837]
Computer Vision (CV) has achieved remarkable results, outperforming humans in several tasks.
CV systems highly depend on the data they are fed with and can learn and amplify biases within such data.
Yet, to date there is no comprehensive survey on bias in visual datasets.
arXiv Detail & Related papers (2021-07-16T14:16:52Z)
- Deep Co-Attention Network for Multi-View Subspace Learning [73.3450258002607]
We propose a deep co-attention network for multi-view subspace learning.
It aims to extract both the common information and the complementary information in an adversarial setting.
In particular, it uses a novel cross reconstruction loss and leverages the label information to guide the construction of the latent representation.
arXiv Detail & Related papers (2021-02-15T18:46:44Z)
- Measuring Data Collection Diligence for Community Healthcare [23.612133021992868]
Non-diligent data collection by community health workers (CHWs) is a significant challenge in developing countries.
In this work, we define and test a data collection diligence score.
Our framework has been validated on the ground using observations by the field monitors of our partner NGO in India.
arXiv Detail & Related papers (2020-11-05T16:45:03Z)
- REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets [64.76453161039973]
REVISE (REvealing VIsual biaSEs) is a tool that assists in the investigation of a visual dataset.
It surfaces potential biases along three dimensions: (1) object-based, (2) person-based, and (3) geography-based.
arXiv Detail & Related papers (2020-04-16T23:54:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.