DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of
mixture-of-datasets
- URL: http://arxiv.org/abs/2311.04894v1
- Date: Wed, 8 Nov 2023 18:55:24 GMT
- Title: DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of
mixture-of-datasets
- Authors: Yash Jain, Harkirat Behl, Zsolt Kira, Vibhav Vineet
- Abstract summary: We propose dataset-Aware Mixture-of-Experts, DAMEX.
We train the experts to become an expert' of a dataset by learning to route each dataset tokens to its mapped expert.
Experiments on Universal Object-Detection Benchmark show that we outperform the existing state-of-the-art.
- Score: 34.780870585656395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Construction of a universal detector poses a crucial question: How can we
most effectively train a model on a large mixture of datasets? The answer lies
in learning dataset-specific features and ensembling their knowledge but do all
this in a single model. Previous methods achieve this by having separate
detection heads on a common backbone but that results in a significant increase
in parameters. In this work, we present Mixture-of-Experts as a solution,
highlighting that MoEs are much more than a scalability tool. We propose
Dataset-Aware Mixture-of-Experts, DAMEX where we train the experts to become an
`expert' of a dataset by learning to route each dataset tokens to its mapped
expert. Experiments on Universal Object-Detection Benchmark show that we
outperform the existing state-of-the-art by average +10.2 AP score and improve
over our non-MoE baseline by average +2.0 AP score. We also observe consistent
gains while mixing datasets with (1) limited availability, (2) disparate
domains and (3) divergent label sets. Further, we qualitatively show that DAMEX
is robust against expert representation collapse.
Related papers
- Anno-incomplete Multi-dataset Detection [67.69438032767613]
We propose a novel problem as "-incomplete Multi-dataset Detection"
We develop an end-to-end multi-task learning architecture which can accurately detect all the object categories with multiple partially annotated datasets.
arXiv Detail & Related papers (2024-08-29T03:58:21Z) - BlendX: Complex Multi-Intent Detection with Blended Patterns [4.852816974803059]
We present BlendX, a suite of refined datasets featuring more diverse patterns than their predecessors.
For dataset construction, we utilize both rule-baseds and a generative tool -- OpenAI's ChatGPT -- which is augmented with a similarity-driven strategy for utterance selection.
Experiments on BlendX reveal that state-of-the-art MID models struggle with the challenges posed by the new datasets.
arXiv Detail & Related papers (2024-03-27T06:13:04Z) - ATEAM: Knowledge Integration from Federated Datasets for Vehicle Feature
Extraction using Annotation Team of Experts [1.947162363730401]
This paper proposes ATEAM, an annotation team-of-experts, to perform cross-dataset labeling and integration of disjoint annotation schemas.
We show that our integrated dataset can help off-the-shelf models achieve excellent accuracy on VMMR and vehicle re-id with no changes to model architectures.
arXiv Detail & Related papers (2022-11-16T18:33:45Z) - Learning with MISELBO: The Mixture Cookbook [62.75516608080322]
We present the first ever mixture of variational approximations for a normalizing flow-based hierarchical variational autoencoder (VAE) with VampPrior and a PixelCNN decoder network.
We explain this cooperative behavior by drawing a novel connection between VI and adaptive importance sampling.
We obtain state-of-the-art results among VAE architectures in terms of negative log-likelihood on the MNIST and FashionMNIST datasets.
arXiv Detail & Related papers (2022-09-30T15:01:35Z) - DoubleMix: Simple Interpolation-Based Data Augmentation for Text
Classification [56.817386699291305]
This paper proposes a simple yet effective data augmentation approach termed DoubleMix.
DoubleMix first generates several perturbed samples for each training data.
It then uses the perturbed data and original data to carry out a two-step in the hidden space of neural models.
arXiv Detail & Related papers (2022-09-12T15:01:04Z) - Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based
Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z) - MSeg: A Composite Dataset for Multi-domain Semantic Segmentation [100.17755160696939]
We present MSeg, a composite dataset that unifies semantic segmentation datasets from different domains.
We reconcile the generalization and bring the pixel-level annotations into alignment by relabeling more than 220,000 object masks in more than 80,000 images.
A model trained on MSeg ranks first on the WildDash-v1 leaderboard for robust semantic segmentation, with no exposure to WildDash data during training.
arXiv Detail & Related papers (2021-12-27T16:16:35Z) - Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To tackle this, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z) - ROAM: Random Layer Mixup for Semi-Supervised Learning in Medical Imaging [43.26668942258135]
Medical image segmentation is one of the major challenges addressed by machine learning methods.
We propose ROAM, a RandOm lAyer Mixup, which generates more data points that have never seen before.
ROAM achieves state-of-the-art (SOTA) results in fully supervised (89.5%) and semi-supervised (87.0%) settings with a relative improvement of up to 2.40% and 16.50%, respectively for the whole-brain segmentation.
arXiv Detail & Related papers (2020-03-20T18:07:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.