Temporal-aware Language Representation Learning From Crowdsourced Labels
- URL: http://arxiv.org/abs/2107.07958v1
- Date: Thu, 15 Jul 2021 05:25:56 GMT
- Title: Temporal-aware Language Representation Learning From Crowdsourced Labels
- Authors: Yang Hao, Xiao Zhai, Wenbiao Ding, Zitao Liu
- Abstract summary: We propose TACMA, a temporal-aware language representation learning algorithm for crowdsourced labels with multiple annotators.
The proposed algorithm is extremely easy to implement in around 5 lines of code.
The results show that our approach outperforms a wide range of state-of-the-art baselines in terms of prediction accuracy and AUC.
- Score: 12.40460861125743
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning effective language representations from crowdsourced labels is
crucial for many real-world machine learning tasks. A challenging aspect of
this problem is that the quality of crowdsourced labels suffers from high intra- and
inter-observer variability. Since high-capacity deep neural networks can
easily memorize all disagreements among crowdsourced labels, directly applying
existing supervised language representation learning algorithms may yield
suboptimal solutions. In this paper, we propose \emph{TACMA}, a
\underline{t}emporal-\underline{a}ware language representation learning
heuristic for \underline{c}rowdsourced labels with \underline{m}ultiple
\underline{a}nnotators. The proposed approach (1) explicitly models the
intra-observer variability with an attention mechanism; (2) computes and
aggregates per-sample confidence scores from multiple workers to address the
inter-observer disagreements. The proposed heuristic is extremely easy to
implement in around 5 lines of code. We evaluate it on
four synthetic and four real-world data sets. The results show that our
approach outperforms a wide range of state-of-the-art baselines in terms of
prediction accuracy and AUC. To encourage reproducible results, we make our
code publicly available at \url{https://github.com/CrowdsourcingMining/TACMA}.
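The per-sample confidence aggregation described in point (2) can be sketched as follows. This is an illustrative reconstruction, not the authors' released code; the function name, the normalized-weighting scheme, and the binary-label assumption are all assumptions made for the sketch:

```python
import numpy as np

def aggregate_labels(worker_labels, worker_confidences):
    """Combine per-sample labels from multiple annotators into soft targets.

    worker_labels:      (n_samples, n_workers) array of {0, 1} labels
    worker_confidences: (n_samples, n_workers) array of per-sample scores
    """
    # normalize confidences per sample so the weights sum to 1
    w = worker_confidences / worker_confidences.sum(axis=1, keepdims=True)
    # confidence-weighted soft label in [0, 1] for each sample
    return (w * worker_labels).sum(axis=1)

# toy example: 2 samples, 3 annotators
labels = np.array([[1, 0, 1], [0, 0, 1]])
conf = np.array([[0.9, 0.2, 0.7], [0.5, 0.5, 0.5]])
soft = aggregate_labels(labels, conf)
```

Weighting votes by per-sample confidence lets a disagreeing but low-confidence annotator pull the target only slightly, which is one plausible way to address the inter-observer disagreements the abstract mentions.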
Related papers
- Text-Guided Mixup Towards Long-Tailed Image Categorization [7.207351201912651]
In many real-world applications, the class-label frequencies of the training data can exhibit a long-tailed distribution.
We propose a novel text-guided mixup technique that takes advantage of the semantic relations between classes recognized by the pre-trained text encoder.
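A minimal sketch of what "text-guided mixup" could look like, under the assumption that mixing partners are sampled in proportion to the similarity between class-name embeddings from a text encoder; the function name, the Beta-distributed mixing coefficient, and the partner-selection rule are illustrative assumptions, not the paper's exact method:

```python
import numpy as np

def text_guided_mixup(x, y, class_sim, alpha=0.4, rng=None):
    """x: (n, d) features; y: (n,) integer class labels;
    class_sim: (C, C) similarities between class-name text embeddings."""
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)  # mixing coefficient, as in standard mixup
    mixed_x = np.empty_like(x, dtype=float)
    mixed_y = []
    for i in range(len(x)):
        # pick a partner class with probability proportional to its
        # text-embedding similarity to sample i's class
        p = class_sim[y[i]] / class_sim[y[i]].sum()
        c = rng.choice(len(p), p=p)
        candidates = np.flatnonzero(y == c)
        j = rng.choice(candidates) if len(candidates) else i
        mixed_x[i] = lam * x[i] + (1 - lam) * x[j]
        mixed_y.append((y[i], y[j], lam))  # pair of labels plus weight
    return mixed_x, mixed_y

# toy example: 4 samples, 2 classes
x = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
y = np.array([0, 0, 1, 1])
sim = np.array([[1.0, 0.5], [0.5, 1.0]])
mx, my = text_guided_mixup(x, y, sim)
```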
arXiv Detail & Related papers (2024-09-05T14:37:43Z)
- UniPSDA: Unsupervised Pseudo Semantic Data Augmentation for Zero-Shot Cross-Lingual Natural Language Understanding [31.272603877215733]
Cross-lingual representation learning transfers knowledge from resource-rich languages to resource-scarce ones to improve the semantic understanding abilities of different languages.
We propose an Unsupervised Pseudo Semantic Data Augmentation (UniPSDA) mechanism for cross-lingual natural language understanding to enrich the training data without human interventions.
arXiv Detail & Related papers (2024-06-24T07:27:01Z)
- CLIP-VG: Self-paced Curriculum Adapting of CLIP for Visual Grounding [91.97362831507434]
Unsupervised visual grounding has been developed to locate regions using pseudo-labels.
We propose CLIP-VG, a novel method that can conduct self-paced curriculum adapting of CLIP with pseudo-language labels.
Our method outperforms the current state-of-the-art unsupervised method by a significant margin on RefCOCO/+/g datasets.
arXiv Detail & Related papers (2023-05-15T14:42:02Z)
- MaPLe: Multi-modal Prompt Learning [54.96069171726668]
We propose Multi-modal Prompt Learning (MaPLe) for both vision and language branches to improve alignment between the vision and language representations.
Compared with the state-of-the-art method Co-CoOp, MaPLe exhibits favorable performance and achieves an absolute gain of 3.45% on novel classes.
arXiv Detail & Related papers (2022-10-06T17:59:56Z)
- A New Generation of Perspective API: Efficient Multilingual Character-level Transformers [66.9176610388952]
We present the fundamentals behind the next version of the Perspective API from Google Jigsaw.
At the heart of the approach is a single multilingual token-free Charformer model.
We demonstrate that by forgoing static vocabularies, we gain flexibility across a variety of settings.
arXiv Detail & Related papers (2022-02-22T20:55:31Z)
- Investigating Power laws in Deep Representation Learning [4.996066540156903]
We propose a framework to evaluate the quality of representations in unlabelled datasets.
We estimate the coefficient of the power law, $\alpha$, across three key attributes which influence representation learning.
Notably, $\alpha$ is computable from the representations without knowledge of any labels, thereby offering a framework to evaluate the quality of representations in unlabelled datasets.
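One common, label-free way to obtain such a power-law exponent is to fit the decay of the representation covariance eigenspectrum on a log-log scale. The sketch below makes that assumption; the function name, the choice of top-$k$ eigenvalues, and the least-squares fit are illustrative, not necessarily the paper's exact estimator:

```python
import numpy as np

def powerlaw_alpha(reps, k=50):
    """Estimate the decay exponent alpha of the covariance eigenspectrum
    of a representation matrix via a log-log least-squares fit.

    reps: (n_samples, dim) matrix of representations (no labels needed).
    """
    reps = reps - reps.mean(axis=0)
    # eigenvalues of the sample covariance, in descending order
    eig = np.linalg.svd(reps, compute_uv=False) ** 2 / len(reps)
    eig = eig[:k]
    ranks = np.arange(1, len(eig) + 1)
    # fit log(eig) ~ -alpha * log(rank) + c
    slope, _ = np.polyfit(np.log(ranks), np.log(eig), 1)
    return -slope

# synthetic check: per-dimension variance decays as rank^{-1},
# so the fitted exponent should come out close to 1
rng = np.random.default_rng(0)
scales = np.arange(1, 101) ** -0.5  # std ~ rank^{-1/2} => variance ~ rank^{-1}
reps = rng.standard_normal((5000, 100)) * scales
alpha = powerlaw_alpha(reps, k=50)
```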
arXiv Detail & Related papers (2022-02-11T18:11:32Z)
- Reinforced Iterative Knowledge Distillation for Cross-Lingual Named Entity Recognition [54.92161571089808]
Cross-lingual NER transfers knowledge from resource-rich languages to low-resource ones.
Existing cross-lingual NER methods do not make good use of rich unlabeled data in target languages.
We develop a novel approach based on the ideas of semi-supervised learning and reinforcement learning.
arXiv Detail & Related papers (2021-06-01T05:46:22Z)
- R$^2$-Net: Relation of Relation Learning Network for Sentence Semantic Matching [58.72111690643359]
We propose a Relation of Relation Learning Network (R2-Net) for sentence semantic matching.
We first employ BERT to encode the input sentences from a global perspective.
Then a CNN-based encoder is designed to capture keywords and phrase information from a local perspective.
To fully leverage labels for better relation information extraction, we introduce a self-supervised relation of relation classification task.
arXiv Detail & Related papers (2020-12-16T13:11:30Z)
- Leveraging Adversarial Training in Self-Learning for Cross-Lingual Text Classification [52.69730591919885]
We present a semi-supervised adversarial training process that minimizes the maximal loss for label-preserving input perturbations.
We observe significant gains in effectiveness on document and intent classification for a diverse set of languages.
arXiv Detail & Related papers (2020-07-29T19:38:35Z)
- NeuCrowd: Neural Sampling Network for Representation Learning with Crowdsourced Labels [19.345894148534335]
We propose NeuCrowd, a unified framework for supervised representation learning (SRL) from crowdsourced labels.
The proposed framework is evaluated on both one synthetic and three real-world data sets.
arXiv Detail & Related papers (2020-03-21T13:38:18Z)
- Distant Supervision and Noisy Label Learning for Low Resource Named Entity Recognition: A Study on Hausa and Yorùbá [23.68953940000046]
Techniques such as distant and weak supervision can be used to create labeled data in a (semi-) automatic way.
We evaluate different embedding approaches and show that distant supervision can be successfully leveraged in a realistic low-resource scenario.
arXiv Detail & Related papers (2020-03-18T17:48:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.