Families In Wild Multimedia: A Multimodal Database for Recognizing
Kinship
- URL: http://arxiv.org/abs/2007.14509v6
- Date: Fri, 1 Oct 2021 20:16:01 GMT
- Title: Families In Wild Multimedia: A Multimodal Database for Recognizing
Kinship
- Authors: Joseph P. Robinson, Zaid Khan, Yu Yin, Ming Shao, Yun Fu
- Abstract summary: We introduce the first publicly available multi-task MM kinship dataset.
To build FIW MM, we developed machinery to automatically collect, annotate, and prepare the data.
The results highlight edge cases that point to different areas for future research and improvement.
- Score: 63.27052967981546
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Kinship, a soft biometric detectable in media, is fundamental for a myriad of
use-cases. Despite the difficulty of detecting kinship, annual data challenges
using still-images have consistently improved performance and attracted new
researchers. Systems now reach performance levels unforeseeable a decade ago,
closing in on levels acceptable for deployment in practice. As with other
biometric tasks, we expect systems to benefit from additional modalities. We
hypothesize that adding modalities to FIW, which has only still-images, will
improve performance. Thus, to narrow the gap between research and reality and
enhance the power of kinship recognition systems, we extend FIW with multimedia
(MM) data (i.e., video, audio, and text captions). Specifically, we introduce
the first publicly available multi-task MM kinship dataset. To build FIW MM, we
developed machinery to automatically collect, annotate, and prepare the data,
requiring minimal human input and no financial cost. The proposed MM corpus
allows the problem statements to be posed as more realistic, template-based protocols. We
show significant improvements in all benchmarks with the added modalities. The
results highlight edge cases that point to different areas for future research and
improvement. FIW MM supplies the data needed to increase the potential of
automated systems to detect kinship in MM. It also allows experts from diverse
fields to collaborate in novel ways.
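The abstract mentions template-based protocols over the added modalities but does not spell out an implementation. As a minimal sketch only, under assumptions and not the authors' actual FIW MM pipeline, the Python snippet below builds a per-subject multimedia template by averaging L2-normalized embeddings from assumed precomputed face, video, and audio encoders, then scores a candidate kin pair by cosine similarity; the embedding dimension, modality names, and any decision threshold are assumptions.

```python
# Minimal sketch (not the FIW MM reference code): fuse per-modality embeddings
# into one template per subject, then score a candidate kin pair.
# Embeddings are assumed to come from hypothetical face/video/audio encoders;
# only the aggregation and scoring step is shown.
import numpy as np

def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale a vector to unit length so modalities contribute comparably."""
    return x / (np.linalg.norm(x) + eps)

def build_template(modality_embeddings: dict) -> np.ndarray:
    """Average the normalized embeddings of whatever modalities are available."""
    vecs = [l2_normalize(v) for v in modality_embeddings.values()]
    return l2_normalize(np.mean(vecs, axis=0))

def kinship_score(template_a: np.ndarray, template_b: np.ndarray) -> float:
    """Cosine similarity between two subject templates (higher = more kin-like)."""
    return float(np.dot(template_a, template_b))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 512  # assumed embedding size
    subject_a = {"face": rng.normal(size=dim),
                 "video": rng.normal(size=dim),
                 "audio": rng.normal(size=dim)}
    subject_b = {"face": rng.normal(size=dim),
                 "video": rng.normal(size=dim)}  # a modality may be missing
    score = kinship_score(build_template(subject_a), build_template(subject_b))
    print(f"kin similarity: {score:.3f} (any threshold is protocol-specific)")
```

In a template-based protocol each subject is represented by one such aggregate rather than a single still-image, which is what the added video, audio, and text modalities make possible.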
Related papers
- MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models [71.36392373876505]
We introduce MMIE, a large-scale benchmark for evaluating interleaved multimodal comprehension and generation in Large Vision-Language Models (LVLMs).
MMIE comprises 20K meticulously curated multimodal queries, spanning 3 categories, 12 fields, and 102 subfields, including mathematics, coding, physics, literature, health, and arts.
It supports both interleaved inputs and outputs, offering a mix of multiple-choice and open-ended question formats to evaluate diverse competencies.
arXiv Detail & Related papers (2024-10-14T04:15:00Z)
- MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines [91.08394877954322]
Large Multimodal Models (LMMs) have made impressive strides, but whether they can function as AI search engines remains under-explored.
We first design a delicate pipeline, MMSearch-Engine, to empower any LMMs with multimodal search capabilities.
arXiv Detail & Related papers (2024-09-19T17:59:45Z)
- MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding [59.41495657570397]
This dataset includes figures such as schematic diagrams, simulated images, macroscopic/microscopic photos, and experimental visualizations.
We developed benchmarks for scientific figure captioning and multiple-choice questions, evaluating six proprietary and over ten open-source models.
The dataset and benchmarks will be released to support further research.
arXiv Detail & Related papers (2024-07-06T00:40:53Z)
- Exploring Fusion Techniques in Multimodal AI-Based Recruitment: Insights from FairCVdb [4.420073761023326]
We investigate the fairness and bias implications of multimodal fusion techniques in the context of multimodal AI-based recruitment systems.
Our results show that early-fusion closely matches the ground truth for both demographics, achieving the lowest MAEs.
In contrast, late-fusion leads to highly generalized mean scores and higher MAEs (see the early- vs. late-fusion sketch after this related-papers list).
arXiv Detail & Related papers (2024-06-17T12:37:58Z)
- MuJo: Multimodal Joint Feature Space Learning for Human Activity Recognition [2.7532797256542403]
Human Activity Recognition (HAR) is a longstanding problem in AI with applications in a broad range of areas, including healthcare, sports and fitness, security, and more.
We introduce our comprehensive Fitness Multimodal Activity dataset (FiMAD) to enhance HAR performance across various modalities.
We show that classifiers pre-trained on FiMAD can increase the performance on real HAR datasets such as MM-Fit, MyoGym, MotionSense, and MHEALTH.
arXiv Detail & Related papers (2024-06-06T08:42:36Z)
- Exploring the Capabilities of Large Multimodal Models on Dense Text [58.82262549456294]
We propose the DT-VQA dataset, with 170k question-answer pairs.
In this paper, we conduct a comprehensive evaluation of GPT4V, Gemini, and various open-source LMMs.
We find that even with automatically labeled training datasets, significant improvements in model performance can be achieved.
arXiv Detail & Related papers (2024-05-09T07:47:25Z)
- Exploring Missing Modality in Multimodal Egocentric Datasets [89.76463983679058]
We introduce a novel concept, the Missing Modality Token (MMT), to maintain performance even when modalities are absent.
Our method mitigates the performance loss, reducing it from its original ~30% drop to only ~10% when half of the test set is modal-incomplete.
arXiv Detail & Related papers (2024-01-21T11:55:42Z)
- VERITE: A Robust Benchmark for Multimodal Misinformation Detection Accounting for Unimodal Bias [17.107961913114778]
Multimodal misinformation is a growing problem on social media platforms.
In this study, we investigate and identify the presence of unimodal bias in widely-used MMD benchmarks.
We introduce a new method -- termed Crossmodal HArd Synthetic MisAlignment (CHASMA) -- for generating realistic synthetic training data.
arXiv Detail & Related papers (2023-04-27T12:28:29Z)
- MELINDA: A Multimodal Dataset for Biomedical Experiment Method Classification [14.820951153262685]
We introduce a new dataset, MELINDA, for Multimodal biomEdicaL experImeNt methoD clAssification.
The dataset is collected in a fully automated distant supervision manner, where the labels are obtained from an existing curated database.
We benchmark various state-of-the-art NLP and computer vision models, including unimodal models which only take either caption texts or images as inputs.
arXiv Detail & Related papers (2020-12-16T19:11:36Z)
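The FairCVdb entry above contrasts early- and late-fusion strategies. As a generic, hedged illustration only, on synthetic data with stand-in Ridge regressors rather than the paper's recruitment setup, the sketch below shows the two strategies side by side: early fusion concatenates modality features before a single model, while late fusion averages the scores of per-modality models.

```python
# Generic early- vs. late-fusion sketch on synthetic data (illustrative only;
# not the FairCVdb experimental setup). Early fusion concatenates modality
# features before one model; late fusion averages per-modality model outputs.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n = 200
x_text = rng.normal(size=(n, 8))    # stand-in "text" modality features
x_image = rng.normal(size=(n, 8))   # stand-in "image" modality features
y = x_text[:, 0] + 0.5 * x_image[:, 0] + 0.1 * rng.normal(size=n)

# Early fusion: one model over the concatenated feature vector.
early = Ridge().fit(np.hstack([x_text, x_image]), y)
pred_early = early.predict(np.hstack([x_text, x_image]))

# Late fusion: independent models per modality, scores averaged afterwards.
m_text = Ridge().fit(x_text, y)
m_image = Ridge().fit(x_image, y)
pred_late = 0.5 * (m_text.predict(x_text) + m_image.predict(x_image))

mae = lambda p: float(np.mean(np.abs(p - y)))
print(f"early-fusion MAE: {mae(pred_early):.3f}, "
      f"late-fusion MAE: {mae(pred_late):.3f}")
```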