Neural Network Architecture for Database Augmentation Using Shared
Features
- URL: http://arxiv.org/abs/2302.01374v1
- Date: Thu, 2 Feb 2023 19:17:06 GMT
- Title: Neural Network Architecture for Database Augmentation Using Shared
Features
- Authors: William C. Sleeman IV, Rishabh Kapoor, Preetam Ghosh
- Abstract summary: Inherent challenges in some domains such as medicine make it difficult to create large single source datasets or multi-source datasets with identical features.
We propose a neural network architecture that can provide data augmentation using features common between these datasets.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The popularity of learning from data with machine learning and neural
networks has led to the creation of many new datasets for almost every problem
domain. However, even within a single domain, these datasets are often
collected with disparate features, sampled from different sub-populations, and
recorded at different time points. Even with the plethora of individual
datasets, large data science projects can be difficult as it is often not
trivial to merge these smaller datasets. Inherent challenges in some domains
such as medicine also make it very difficult to create large single source
datasets or multi-source datasets with identical features. Instead of trying to
merge these non-matching datasets directly, we propose a neural network
architecture that can provide data augmentation using features common between
these datasets. Our results show that this style of data augmentation can work
for both image and tabular data.
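A rough sketch of the idea in the abstract (the sizes, data, and the linear model below are made up for illustration; the paper proposes a full neural network architecture): fit a model on the dataset that carries the full feature set to predict its extra features from the shared ones, then use that model to augment a second dataset that has only the shared features.

```python
import numpy as np

# Toy sketch of shared-feature augmentation (illustrative only; the
# paper's architecture is a neural network, approximated here by a
# single linear layer for brevity).
rng = np.random.default_rng(0)

# Dataset A has [shared | extra] features; dataset B has only the shared ones.
n_a, n_b, n_shared, n_extra = 200, 50, 4, 3
W_true = rng.normal(size=(n_shared, n_extra))
X_a_shared = rng.normal(size=(n_a, n_shared))
X_a_extra = X_a_shared @ W_true + 0.05 * rng.normal(size=(n_a, n_extra))
X_b_shared = rng.normal(size=(n_b, n_shared))

# Fit a one-layer "augmentation network" on dataset A: predict the
# extra features from the shared features by gradient descent.
W = np.zeros((n_shared, n_extra))
for _ in range(500):
    grad = X_a_shared.T @ (X_a_shared @ W - X_a_extra) / n_a
    W -= 0.05 * grad

# Augment dataset B: append predictions for its missing features.
X_b_augmented = np.hstack([X_b_shared, X_b_shared @ W])
print(X_b_augmented.shape)  # (50, 7)
```

In a realistic setting the linear layer would be replaced by a deeper network, and the augmented features would feed a downstream model trained on the combined data.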
Related papers
- The Brain's Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning [3.649801602551928]
We develop a set of neuroscience-inspired self-supervised objectives, together with a neural architecture, for representation learning from heterogeneous recordings.
Results show that representations learned with these objectives scale with data, generalise across subjects, datasets, and tasks, and surpass comparable self-supervised approaches.
arXiv Detail & Related papers (2024-06-06T17:59:09Z)
- UniTraj: A Unified Framework for Scalable Vehicle Trajectory Prediction [93.77809355002591]
We introduce UniTraj, a comprehensive framework that unifies various datasets, models, and evaluation criteria.
We conduct extensive experiments and find that model performance significantly drops when transferred to other datasets.
We provide insights into dataset characteristics to explain these findings.
arXiv Detail & Related papers (2024-03-22T10:36:50Z)
- Combining datasets to increase the number of samples and improve model fitting [7.4771091238795595]
We propose a novel framework called Combine datasets based on Imputation (ComImp).
In addition, we propose PCA-ComImp, a variant of ComImp that uses Principal Component Analysis (PCA) to reduce dimensionality before combining datasets.
Our results indicate that the proposed methods resemble transfer learning in that merging can significantly improve the accuracy of a prediction model on smaller datasets.
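The ComImp entry above can be sketched with a toy example (the data and the mean-imputation choice are illustrative, not taken from the paper): stack the rows of datasets whose columns only partially overlap, then impute the columns each source lacks.

```python
import pandas as pd

# Toy ComImp-style merge: two tabular datasets share "age" but each
# carries a column the other lacks.
df1 = pd.DataFrame({"age": [30, 40, 50], "bmi": [22.0, 27.5, 31.0]})
df2 = pd.DataFrame({"age": [25, 35], "glucose": [90.0, 110.0]})

# Stack the rows, then fill each source's missing columns with
# per-column means; PCA-ComImp would reduce dimensionality first.
merged = pd.concat([df1, df2], ignore_index=True)
merged = merged.fillna(merged.mean())
print(merged.shape)  # (5, 3)
```

The imputer here is a simple column mean for brevity; the paper's framework admits other imputation methods.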
arXiv Detail & Related papers (2022-10-11T06:06:37Z)
- Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding [137.3719377780593]
A new design (named Detection Hub) is dataset-aware and category-aligned.
It mitigates the dataset inconsistency and provides coherent guidance for the detector to learn across multiple datasets.
The categories across datasets are semantically aligned into a unified space by replacing one-hot category representations with word embedding.
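A toy illustration of the category-alignment idea above (the embedding vectors below are made up): categories named differently across detection datasets land close together in a shared embedding space, whereas one-hot representations would treat them as unrelated classes.

```python
import numpy as np

# Made-up word embeddings for three detection-category names.
emb = {
    "person": np.array([0.9, 0.1, 0.0]),        # dataset A's label
    "pedestrian": np.array([0.85, 0.15, 0.05]),  # dataset B's label
    "car": np.array([0.0, 0.9, 0.2]),
}

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(emb["person"], emb["pedestrian"]))  # close to 1
print(cosine(emb["person"], emb["car"]))         # much smaller
```

With one-hot category vectors, "person" and "pedestrian" would be orthogonal; embedding them lets a single detector learn them as the same concept.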
arXiv Detail & Related papers (2022-06-07T17:59:44Z)
- Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Datasets [122.85598648289789]
We study how multi-domain and multi-task datasets can improve the learning of new tasks in new environments.
We also find that data for only a few tasks in a new domain can bridge the domain gap and make it possible for a robot to perform a variety of prior tasks that were only seen in other domains.
arXiv Detail & Related papers (2021-09-27T23:42:12Z)
- Neural Network Training with Highly Incomplete Datasets [1.5658704610960568]
GapNet is an alternative deep-learning training approach that can use highly incomplete datasets.
We show that GapNet improves the identification of patients with underlying Alzheimer's disease pathology and of patients at risk of hospitalization due to Covid-19.
arXiv Detail & Related papers (2021-07-01T13:21:45Z)
- DAIL: Dataset-Aware and Invariant Learning for Face Recognition [67.4903809903022]
To achieve good performance in face recognition, a large scale training dataset is usually required.
Naively combining different datasets is problematic due to two major issues.
First, treating the same person as different classes in different datasets during training will harm back-propagation.
Second, manually cleaning labels may take formidable human effort, especially when there are millions of images and thousands of identities.
arXiv Detail & Related papers (2021-01-14T01:59:52Z)
- Neural Data Server: A Large-Scale Search Engine for Transfer Learning Data [78.74367441804183]
We introduce Neural Data Server (NDS), a large-scale search engine for finding the most useful transfer learning data to the target domain.
NDS consists of a dataserver which indexes several large popular image datasets, and aims to recommend data to a client.
We show the effectiveness of NDS in various transfer learning scenarios, demonstrating state-of-the-art performance on several target datasets.
arXiv Detail & Related papers (2020-01-09T01:21:30Z)
- DeGAN: Data-Enriching GAN for Retrieving Representative Samples from a Trained Classifier [58.979104709647295]
We bridge the gap between the abundance of available data and the lack of relevant data for the future learning tasks of a trained network.
We use the available data, that may be an imbalanced subset of the original training dataset, or a related domain dataset, to retrieve representative samples.
We demonstrate that data from a related domain can be leveraged to achieve state-of-the-art performance.
arXiv Detail & Related papers (2019-12-27T02:05:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.