Related papers: What You See is (Usually) What You Get: Multimodal Prototype Networks that Abstain from Expensive Modalities

What You See is (Usually) What You Get: Multimodal Prototype Networks that Abstain from Expensive Modalities

URL: http://arxiv.org/abs/2511.19752v1
Date: Mon, 24 Nov 2025 22:17:24 GMT
Title: What You See is (Usually) What You Get: Multimodal Prototype Networks that Abstain from Expensive Modalities
Authors: Muchang Bahng, Charlie Berens, Jon Donnelly, Eric Chen, Chaofan Chen, Cynthia Rudin,
Abstract summary: Multimodal neural networks have seen increasing use for identifying species to help automate this task.<n>They have two major drawbacks. First, their black-box nature prevents the interpretability of their decision making process.<n>Second, collecting genetic data is often expensive and requires invasive procedures, often necessitating researchers to capture or kill the target specimen.<n>We address both of these problems by extending prototype networks (ProtoPNets), which are a popular and interpretable alternative to traditional neural networks, to the multimodal, cost-aware setting.
Score: 30.3982695067087
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Species detection is important for monitoring the health of ecosystems and identifying invasive species, serving a crucial role in guiding conservation efforts. Multimodal neural networks have seen increasing use for identifying species to help automate this task, but they have two major drawbacks. First, their black-box nature prevents the interpretability of their decision making process. Second, collecting genetic data is often expensive and requires invasive procedures, often necessitating researchers to capture or kill the target specimen. We address both of these problems by extending prototype networks (ProtoPNets), which are a popular and interpretable alternative to traditional neural networks, to the multimodal, cost-aware setting. We ensemble prototypes from each modality, using an associated weight to determine how much a given prediction relies on each modality. We further introduce methods to identify cases for which we do not need the expensive genetic information to make confident predictions. We demonstrate that our approach can intelligently allocate expensive genetic data for fine-grained distinctions while using abundant image data for clearer visual classifications and achieving comparable accuracy to models that consistently use both modalities.

Related papers

Open-Set Deepfake Detection: A Parameter-Efficient Adaptation Method with Forgery Style Mixture [81.93945602120453]
We introduce an approach that is both general and parameter-efficient for face forgery detection.<n>We design a forgery-style mixture formulation that augments the diversity of forgery source domains.<n>We show that the designed model achieves state-of-the-art generalizability with significantly reduced trainable parameters.
arXiv Detail & Related papers (2024-08-23T01:53:36Z)
Deception Detection from Linguistic and Physiological Data Streams Using Bimodal Convolutional Neural Networks [19.639533220155965]
This paper explores the application of convolutional neural networks for the purpose of multimodal deception detection. We use a dataset built by interviewing 104 subjects about two topics, with one truthful and one falsified response from each subject about each topic.
arXiv Detail & Related papers (2023-11-18T02:44:33Z)
Predicting Biomedical Interactions with Probabilistic Model Selection for Graph Neural Networks [5.156812030122437]
Current biological networks are noisy, sparse, and incomplete. Experimental identification of such interactions is both time-consuming and expensive. Deep graph neural networks have shown their effectiveness in modeling graph-structured data and achieved good performance in biomedical interaction prediction. Our proposed method enables the graph convolutional networks to dynamically adapt their depths to accommodate an increasing number of interactions.
arXiv Detail & Related papers (2022-11-22T20:44:28Z)
Two-stage Modeling for Prediction with Confidence [0.0]
It is difficult to generalize the performance of neural networks under the condition of distributional shift. We propose a novel two-stage model for the potential distribution shift problem. We show that our model offers reliable predictions for the vast majority of datasets.
arXiv Detail & Related papers (2022-09-19T08:48:07Z)
Image-based Automated Species Identification: Can Virtual Data Augmentation Overcome Problems of Insufficient Sampling? [0.0]
We present a two-level data augmentation approach to automated visual species identification. The first level of data augmentation applies classic approaches of data augmentation and generation of faked images. The second level of data augmentation employs synthetic additional sampling in feature space by an oversampling algorithm in vector space.
arXiv Detail & Related papers (2020-10-18T15:44:45Z)
Domain Generalization for Medical Imaging Classification with Linear-Dependency Regularization [59.5104563755095]
We introduce a simple but effective approach to improve the generalization capability of deep neural networks in the field of medical imaging classification. Motivated by the observation that the domain variability of the medical images is to some extent compact, we propose to learn a representative feature space through variational encoding.
arXiv Detail & Related papers (2020-09-27T12:30:30Z)
Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on few-shot disease subtype prediction problem, identifying subgroups of similar patients. We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks. Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
Learning from Failure: Training Debiased Classifier from Biased Classifier [76.52804102765931]
We show that neural networks learn to rely on spurious correlation only when it is "easier" to learn than the desired knowledge. We propose a failure-based debiasing scheme by training a pair of neural networks simultaneously. Our method significantly improves the training of the network against various types of biases in both synthetic and real-world datasets.
arXiv Detail & Related papers (2020-07-06T07:20:29Z)
M2Net: Multi-modal Multi-channel Network for Overall Survival Time Prediction of Brain Tumor Patients [151.4352001822956]
Early and accurate prediction of overall survival (OS) time can help to obtain better treatment planning for brain tumor patients. Existing prediction methods rely on radiomic features at the local lesion area of a magnetic resonance (MR) volume. We propose an end-to-end OS time prediction model; namely, Multi-modal Multi-channel Network (M2Net)
arXiv Detail & Related papers (2020-06-01T05:21:37Z)
Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction. We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data. Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.