TAGLETS: A System for Automatic Semi-Supervised Learning with Auxiliary
Data
- URL: http://arxiv.org/abs/2111.04798v2
- Date: Wed, 10 Nov 2021 15:33:24 GMT
- Title: TAGLETS: A System for Automatic Semi-Supervised Learning with Auxiliary
Data
- Authors: Wasu Piriyakulkij and Cristina Menghini and Ross Briden and Nihal V.
Nayak and Jeffrey Zhu and Elaheh Raisi and Stephen H. Bach
- Abstract summary: Machine learning practitioners often have access to a spectrum of data.
We describe TAGLETS, a system built to study techniques for automatically exploiting all three types of data.
- Score: 8.653321928148545
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning practitioners often have access to a spectrum of data:
labeled data for the target task (which is often limited), unlabeled data, and
auxiliary data, the many available labeled datasets for other tasks. We
describe TAGLETS, a system built to study techniques for automatically
exploiting all three types of data and creating high-quality, servable
classifiers. The key components of TAGLETS are: (1) auxiliary data organized
according to a knowledge graph, (2) modules encapsulating different methods for
exploiting auxiliary and unlabeled data, and (3) a distillation stage in which
the ensembled modules are combined into a servable model. We compare TAGLETS
with state-of-the-art transfer learning and semi-supervised learning methods on
four image classification tasks. Our study covers a range of settings, varying
the amount of labeled data and the semantic relatedness of the auxiliary data
to the target task. We find that the intelligent incorporation of auxiliary and
unlabeled data into multiple learning techniques enables TAGLETS to match, and
most often significantly surpass, these alternatives. TAGLETS is available as an
open-source system at github.com/BatsResearch/taglets.
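The abstract's three components suggest a simple pipeline shape: several modules ("taglets") each vote on unlabeled examples, the votes are ensembled into soft pseudo-labels, and a single servable model is distilled from them. The sketch below is only an illustration of that shape; the function names, voting scheme, and argmax "distillation" stand-in are assumptions, not the actual TAGLETS implementation.

```python
# Minimal sketch of an ensemble-then-distill pipeline in the spirit of
# TAGLETS: modules vote on unlabeled data, votes become soft pseudo-labels,
# and a single model is "distilled" from them (here just an argmax stand-in).
from collections import Counter

def ensemble_votes(votes_per_module):
    """Combine hard votes from each module into soft pseudo-labels."""
    n = len(votes_per_module)
    soft_labels = []
    for votes in zip(*votes_per_module):  # one tuple of votes per example
        counts = Counter(votes)
        soft_labels.append({label: c / n for label, c in counts.items()})
    return soft_labels

def distill(soft_labels):
    """Stand-in for training a servable student on the pseudo-labels:
    pick the highest-weight label per example."""
    return [max(dist, key=dist.get) for dist in soft_labels]

# Three hypothetical modules voting on four unlabeled images.
module_votes = [
    ["cat", "dog", "cat", "bird"],   # e.g. a transfer-learning module
    ["cat", "dog", "dog", "bird"],   # e.g. a semi-supervised module
    ["cat", "cat", "cat", "bird"],   # e.g. a zero-shot module
]
soft = ensemble_votes(module_votes)
print(distill(soft))  # -> ['cat', 'dog', 'cat', 'bird']
```

In the real system, `distill` would train a student network on the soft labels so that only one compact model needs to be served.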
Related papers
- An Automatic Prompt Generation System for Tabular Data Tasks [3.117741687220381]
Large language models (LLMs) have demonstrated their ability on several tasks through carefully crafted prompts.
This paper presents an innovative auto-prompt generation system suitable for multiple LLMs, with minimal training.
arXiv Detail & Related papers (2024-05-09T08:32:55Z)
- Distribution Matching for Multi-Task Learning of Classification Tasks: a Large-Scale Study on Faces & Beyond [62.406687088097605]
Multi-Task Learning (MTL) is a framework, where multiple related tasks are learned jointly and benefit from a shared representation space.
We show that MTL can succeed on classification tasks with little or even non-overlapping annotation.
We propose a novel approach, where knowledge exchange is enabled between the tasks via distribution matching.
arXiv Detail & Related papers (2024-01-02T14:18:11Z)
- Towards Heterogeneous Long-tailed Learning: Benchmarking, Metrics, and Toolbox [9.202606514025653]
Long-tailed data distributions pose challenges for a variety of domains like e-commerce, finance, biomedical science, and cyber security.
We develop HeroLT, a comprehensive long-tailed learning benchmark integrating 18 state-of-the-art algorithms, 10 evaluation metrics, and 17 real-world datasets across 6 tasks and 4 data modalities.
arXiv Detail & Related papers (2023-07-17T04:32:45Z)
- Automated Few-shot Classification with Instruction-Finetuned Language Models [76.69064714392165]
We show that AuT-Few outperforms state-of-the-art few-shot learning methods.
We also show that AuT-Few is the best ranking method across datasets on the RAFT few-shot benchmark.
arXiv Detail & Related papers (2023-05-21T21:50:27Z)
- A Benchmark Generative Probabilistic Model for Weak Supervised Learning [2.0257616108612373]
Weak Supervised Learning approaches have been developed to alleviate the annotation burden.
We show that probabilistic latent variable models (PLVMs) achieve state-of-the-art performance across four datasets.
arXiv Detail & Related papers (2023-03-31T07:06:24Z)
- AutoGeoLabel: Automated Label Generation for Geospatial Machine Learning [69.47585818994959]
We evaluate a big data processing pipeline to auto-generate labels for remote sensing data.
We utilize the big geo-data platform IBM PAIRS to dynamically generate such labels in dense urban areas.
arXiv Detail & Related papers (2022-01-31T20:02:22Z)
- AstronomicAL: An interactive dashboard for visualisation, integration and classification of data using Active Learning [0.0]
AstronomicAL is a human-in-the-loop interactive labelling and training dashboard.
It allows users to create reliable datasets and robust classifiers using active learning.
The system allows users to visualise and integrate data from different sources.
arXiv Detail & Related papers (2021-09-11T07:32:26Z)
- Generate, Annotate, and Learn: Generative Models Advance Self-Training and Knowledge Distillation [58.64720318755764]
Semi-Supervised Learning (SSL) has seen success in many application domains, but this success often hinges on the availability of task-specific unlabeled data.
Knowledge distillation (KD) has enabled compressing deep networks and ensembles, achieving the best results when distilling knowledge on fresh task-specific unlabeled examples.
We present a general framework called "generate, annotate, and learn (GAL)" that uses unconditional generative models to synthesize in-domain unlabeled data.
arXiv Detail & Related papers (2021-06-11T05:01:24Z)
- Adversarial Knowledge Transfer from Unlabeled Data [62.97253639100014]
We present a novel Adversarial Knowledge Transfer framework for transferring knowledge from internet-scale unlabeled data to improve the performance of a classifier.
An important novel aspect of our method is that the unlabeled source data can be of different classes from those of the labeled target data, and there is no need to define a separate pretext task.
arXiv Detail & Related papers (2020-08-13T08:04:27Z)
- Dual-Teacher: Integrating Intra-domain and Inter-domain Teachers for Annotation-efficient Cardiac Segmentation [65.81546955181781]
We propose a novel semi-supervised domain adaptation approach, namely Dual-Teacher.
The student model learns from both unlabeled target data and labeled source data via the two teacher models.
We demonstrate that our approach is able to concurrently utilize unlabeled data and cross-modality data with superior performance.
arXiv Detail & Related papers (2020-07-13T10:00:44Z)
- Beyond without Forgetting: Multi-Task Learning for Classification with Disjoint Datasets [27.570773346794613]
Multi-task Learning (MTL) for classification with disjoint datasets aims to explore MTL when one task only has one labeled dataset.
Inspired by semi-supervised learning, we use unlabeled datasets with pseudo labels to facilitate each task.
We propose our MTL with Selective Augmentation (MTL-SA) method to select the training samples in unlabeled datasets with confident pseudo labels and close data distribution to the labeled dataset.
arXiv Detail & Related papers (2020-03-15T03:19:18Z)
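Several entries above (MTL-SA in particular, and the pseudo-labeling steps in GAL and the Dual-Teacher work) hinge on keeping only unlabeled examples whose pseudo labels are confident. A minimal, generic illustration of that selection step follows; the threshold value and the `(example, probabilities)` input format are assumptions of this sketch, not taken from any of the papers.

```python
# Illustrative confidence-based selection of pseudo-labeled examples, in the
# spirit of MTL-SA's "confident pseudo labels" criterion. Examples whose top
# predicted probability clears the threshold are kept with that label.

def select_confident(pseudo_labeled, threshold=0.9):
    """Keep only examples whose top predicted probability >= threshold."""
    selected = []
    for example, probs in pseudo_labeled:
        top_label = max(probs, key=probs.get)
        if probs[top_label] >= threshold:
            selected.append((example, top_label))
    return selected

batch = [
    ("img_0", {"cat": 0.97, "dog": 0.03}),   # confident -> kept
    ("img_1", {"cat": 0.55, "dog": 0.45}),   # ambiguous -> discarded
    ("img_2", {"cat": 0.08, "dog": 0.92}),   # confident -> kept
]
print(select_confident(batch))  # -> [('img_0', 'cat'), ('img_2', 'dog')]
```

MTL-SA additionally filters by closeness of the unlabeled data's distribution to the labeled set, a step omitted here for brevity.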
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.