An Empirical Investigation into Benchmarking Model Multiplicity for
Trustworthy Machine Learning: A Case Study on Image Classification
- URL: http://arxiv.org/abs/2311.14859v1
- Date: Fri, 24 Nov 2023 22:30:38 GMT
- Authors: Prakhar Ganesh
- Abstract summary: This paper offers a one-stop empirical benchmark of multiplicity across various dimensions of model design.
We also develop a framework, which we call multiplicity sheets, to benchmark multiplicity in various scenarios.
We show that multiplicity persists in deep learning models even after enforcing additional specifications during model selection.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning models have proven to be highly successful. Yet, their
over-parameterization gives rise to model multiplicity, a phenomenon in which
multiple models achieve similar performance but exhibit distinct underlying
behaviours. This multiplicity presents a significant challenge and necessitates
additional specifications in model selection to prevent unexpected failures
during deployment. While prior studies have examined these concerns, they focus
on individual metrics in isolation, making it difficult to obtain a
comprehensive view of multiplicity in trustworthy machine learning. Our work
stands out by offering a one-stop empirical benchmark of multiplicity across
various dimensions of model design and its impact on a diverse set of
trustworthy metrics. In this work, we establish a consistent language for
studying model multiplicity by translating several trustworthy metrics into
accuracy under appropriate interventions. We also develop a framework, which we
call multiplicity sheets, to benchmark multiplicity in various scenarios. We
demonstrate the advantages of our setup through a case study in image
classification and provide actionable insights into the impact and trends of
different hyperparameters on model multiplicity. Finally, we show that
multiplicity persists in deep learning models even after enforcing additional
specifications during model selection, highlighting the severity of
over-parameterization. The concerns of under-specification thus remain, and we
seek to promote a more comprehensive discussion of multiplicity in trustworthy
machine learning.
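As an illustrative sketch (not the paper's code or the multiplicity-sheets framework), predictive multiplicity can be quantified as the disagreement rate between models that reach similar accuracy but differ only in random initialization; all function names and the synthetic data below are hypothetical:

```python
import numpy as np

def train_logreg(X, y, seed, lr=0.1, steps=500):
    """Plain logistic regression fit by gradient descent; the seed
    controls only the random weight initialization."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.5, size=X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def predict(X, w):
    return (X @ w > 0).astype(int)

def disagreement(y1, y2):
    """Fraction of inputs on which two models predict differently."""
    return float(np.mean(y1 != y2))

# Synthetic, noisily separable binary data.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = (X[:, 0] + 0.3 * rng.normal(size=400) > 0).astype(int)

# Two models from different initializations: similar accuracy,
# possibly different predictions — the multiplicity of interest.
w_a = train_logreg(X, y, seed=1)
w_b = train_logreg(X, y, seed=2)
acc_a = float(np.mean(predict(X, w_a) == y))
acc_b = float(np.mean(predict(X, w_b) == y))
d = disagreement(predict(X, w_a), predict(X, w_b))
print(f"acc_a={acc_a:.3f} acc_b={acc_b:.3f} disagreement={d:.3f}")
```

Because logistic regression is convex, both runs converge to nearly the same solution and disagreement stays small; the paper's point is that over-parameterized deep networks admit many distinct near-optimal solutions, so the same measurement yields much larger gaps there.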
Related papers
- Corpus Considerations for Annotator Modeling and Scaling [9.263562546969695]
We show that the commonly used user token model consistently outperforms more complex models.
Our findings shed light on the relationship between corpus statistics and annotator modeling performance.
arXiv Detail & Related papers (2024-04-02T22:27:24Z)
- Multimodal CLIP Inference for Meta-Few-Shot Image Classification [0.0]
Multimodal foundation models like CLIP learn a joint (image, text) embedding.
This study demonstrates that combining modalities from CLIP's text and image encoders outperforms state-of-the-art meta-few-shot learners on widely adopted benchmarks.
arXiv Detail & Related papers (2024-03-26T17:47:54Z)
- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training [103.72844619581811]
We build performant Multimodal Large Language Models (MLLMs).
In particular, we study the importance of various architecture components and data choices.
We demonstrate that large-scale multimodal pre-training benefits from a careful mix of image-caption, interleaved image-text, and text-only data.
arXiv Detail & Related papers (2024-03-14T17:51:32Z)
- Multi-View Conformal Learning for Heterogeneous Sensor Fusion [0.12086712057375555]
We build and test multi-view and single-view conformal models for heterogeneous sensor fusion.
Our models provide theoretical marginal confidence guarantees since they are based on the conformal prediction framework.
Our results also show that multi-view models generate prediction sets with less uncertainty compared to single-view models.
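The marginal coverage guarantee cited above comes from the conformal prediction framework; a minimal single-view split-conformal sketch on synthetic classifier scores (not the authors' multi-view method — the simulated classifier and all names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3  # number of classes

def simulate(m):
    """Simulate an informative classifier: softmax probabilities
    with the true class boosted."""
    y = rng.integers(0, K, size=m)
    logits = rng.normal(size=(m, K))
    logits[np.arange(m), y] += 2.0
    p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return p, y

p_cal, y_cal = simulate(500)
p_test, y_test = simulate(500)

alpha = 0.1  # target miscoverage: sets should contain the truth >= 90%
# Nonconformity score of the TRUE label on held-out calibration data.
cal_scores = 1.0 - p_cal[np.arange(len(y_cal)), y_cal]
n = len(cal_scores)
# Finite-sample-corrected quantile: the ceil((n+1)(1-alpha))-th order statistic.
k = int(np.ceil((n + 1) * (1 - alpha)))
q = np.sort(cal_scores)[min(k, n) - 1]

# Prediction set = every label whose score falls below the threshold.
sets = (1.0 - p_test) <= q                     # boolean (m, K) mask
coverage = float(np.mean(sets[np.arange(len(y_test)), y_test]))
avg_size = float(sets.sum(axis=1).mean())
print(f"empirical coverage={coverage:.3f} avg set size={avg_size:.2f}")
```

The guarantee is marginal (on average over draws of calibration and test data), which is exactly the sense in which the entry's "theoretical marginal confidence guarantees" hold regardless of how good the underlying model is.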
arXiv Detail & Related papers (2024-02-19T17:30:09Z)
- Revealing Multimodal Contrastive Representation Learning through Latent Partial Causal Models [85.67870425656368]
We introduce a unified causal model specifically designed for multimodal data.
We show that multimodal contrastive representation learning excels at identifying latent coupled variables.
Experiments demonstrate the robustness of our findings, even when the assumptions are violated.
arXiv Detail & Related papers (2024-02-09T07:18:06Z)
- Leveraging Diffusion Disentangled Representations to Mitigate Shortcuts in Underspecified Visual Tasks [92.32670915472099]
We propose an ensemble diversification framework exploiting the generation of synthetic counterfactuals using Diffusion Probabilistic Models (DPMs).
We show that diffusion-guided diversification can lead models to avert attention from shortcut cues, achieving ensemble diversity performance comparable to previous methods requiring additional data collection.
arXiv Detail & Related papers (2023-10-03T17:37:52Z)
- MultiViz: An Analysis Benchmark for Visualizing and Understanding Multimodal Models [103.9987158554515]
MultiViz is a method for analyzing the behavior of multimodal models by scaffolding the problem of interpretability into 4 stages.
We show that the complementary stages in MultiViz together enable users to simulate model predictions, assign interpretable concepts to features, perform error analysis on model misclassifications, and use insights from error analysis to debug models.
arXiv Detail & Related papers (2022-06-30T18:42:06Z)
- Deep Multistage Multi-Task Learning for Quality Prediction of Multistage Manufacturing Systems [7.619217846525994]
We propose a deep multistage multi-task learning framework to jointly predict all output sensing variables in a unified end-to-end learning framework.
Our numerical studies and real case study show that the new model achieves superior performance compared to many benchmark methods.
arXiv Detail & Related papers (2021-05-17T22:09:36Z)
- Trusted Multi-View Classification [76.73585034192894]
We propose a novel multi-view classification method, termed trusted multi-view classification.
It provides a new paradigm for multi-view learning by dynamically integrating different views at an evidence level.
The proposed algorithm jointly utilizes multiple views to promote both classification reliability and robustness.
arXiv Detail & Related papers (2021-02-03T13:30:26Z)
- Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.