TransPOS: Transformers for Consolidating Different POS Tagset Datasets
- URL: http://arxiv.org/abs/2209.11959v1
- Date: Sat, 24 Sep 2022 08:43:53 GMT
- Title: TransPOS: Transformers for Consolidating Different POS Tagset Datasets
- Authors: Alex Li, Ilyas Bankole-Hameed, Ranadeep Singh, Gabriel Shen Han Ng, Akshat Gupta
- Abstract summary: This paper considers two datasets that label part-of-speech (POS) tags under different tagging schemes.
It proposes a novel supervised architecture employing Transformers to tackle the problem of consolidating two completely disjoint datasets.
- Score: 0.8399688944263843
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In hope of expanding training data, researchers often want to merge two or
more datasets that are created using different labeling schemes. This paper
considers two datasets that label part-of-speech (POS) tags under different
tagging schemes and leverages the supervised labels of one dataset to help
generate labels for the other dataset. This paper further discusses the
theoretical difficulties of this approach and proposes a novel supervised
architecture employing Transformers to tackle the problem of consolidating two
completely disjoint datasets. The results diverge from initial expectations and
discourage exploration into the use of disjoint labels to consolidate datasets
with different labels.
Related papers
- Exploiting Conjugate Label Information for Multi-Instance Partial-Label Learning [61.00359941983515]
Multi-instance partial-label learning (MIPL) addresses scenarios where each training sample is represented as a multi-instance bag associated with a candidate label set containing one true label and several false positives.
ELIMIPL exploits the conjugate label information to improve the disambiguation performance.
arXiv Detail & Related papers (2024-08-26T15:49:31Z)
- Label Dependencies-aware Set Prediction Networks for Multi-label Text Classification [0.0]
We leverage Graph Convolutional Networks and construct an adjacency matrix based on the statistical relations between labels.
We enhance recall ability by applying the Bhattacharyya distance to the output distributions of the set prediction networks.
arXiv Detail & Related papers (2023-04-14T09:31:17Z)
- Learning Semantic Segmentation from Multiple Datasets with Label Shifts [101.24334184653355]
This paper proposes UniSeg, an effective approach to automatically train models across multiple datasets with differing label spaces.
Specifically, we propose two losses that account for conflicting and co-occurring labels to achieve better generalization performance in unseen domains.
arXiv Detail & Related papers (2022-02-28T18:55:19Z)
- Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision [56.950950382415925]
We propose a novel consistency regularization approach, called cross pseudo supervision (CPS).
The CPS consistency has two roles: encourage high similarity between the predictions of two perturbed networks for the same input image, and expand training data by using the unlabeled data with pseudo labels.
Experiment results show that our approach achieves the state-of-the-art semi-supervised segmentation performance on Cityscapes and PASCAL VOC 2012.
arXiv Detail & Related papers (2021-06-02T15:21:56Z)
- Group-aware Label Transfer for Domain Adaptive Person Re-identification [179.816105255584]
Unsupervised domain adaptive (UDA) person re-identification (ReID) aims at adapting the model trained on a labeled source-domain dataset to a target-domain dataset without any further annotations.
Most successful UDA-ReID approaches combine clustering-based pseudo-label prediction with representation learning and perform the two steps in an alternating fashion.
We propose a Group-aware Label Transfer (GLT) algorithm, which enables the online interaction and mutual promotion of pseudo-label prediction and representation learning.
arXiv Detail & Related papers (2021-03-23T07:57:39Z)
- MATCH: Metadata-Aware Text Classification in A Large Hierarchy [60.59183151617578]
MATCH is an end-to-end framework that leverages both metadata and hierarchy information.
We propose different ways to regularize the parameters and output probability of each child label by its parents.
Experiments on two massive text datasets with large-scale label hierarchies demonstrate the effectiveness of MATCH.
arXiv Detail & Related papers (2021-02-15T05:23:08Z)
- Object Detection with a Unified Label Space from Multiple Datasets [94.33205773893151]
Given multiple datasets with different label spaces, the goal of this work is to train a single object detector predicting over the union of all the label spaces.
Consider an object category like faces that is annotated in one dataset, but is not annotated in another dataset.
Some categories, like face here, would thus be considered foreground in one dataset, but background in another.
We propose loss functions that carefully integrate partial but correct annotations with complementary but noisy pseudo labels.
arXiv Detail & Related papers (2020-08-15T00:51:27Z)
- openXDATA: A Tool for Multi-Target Data Generation and Missing Label Completion [23.14045574165086]
A common problem in machine learning is to deal with datasets with disjoint label spaces and missing labels.
In this work, we introduce the openXDATA tool, which completes the missing labels in partially labelled or unlabelled datasets.
We show the ability to estimate both categories and continuous labels for all of the datasets, at rates that approached the ground truth values.
arXiv Detail & Related papers (2020-07-27T22:05:53Z)
- Unsupervised Multi-label Dataset Generation from Web Data [2.267916014951237]
This paper presents a system towards the generation of multi-label datasets from web data in an unsupervised manner.
The generation of a single-label dataset uses an unsupervised noise reduction phase (clustering and selection of clusters using anchors), yielding 85% correctly labeled images.
An unsupervised label augmentation process is then performed to assign new labels to the images in the dataset using the class activation maps and the uncertainty associated with each class.
arXiv Detail & Related papers (2020-05-12T08:57:59Z)