BERTifying the Hidden Markov Model for Multi-Source Weakly Supervised
Named Entity Recognition
- URL: http://arxiv.org/abs/2105.12848v1
- Date: Wed, 26 May 2021 21:18:48 GMT
- Title: BERTifying the Hidden Markov Model for Multi-Source Weakly Supervised
Named Entity Recognition
- Authors: Yinghao Li, Pranav Shetty, Lucas Liu, Chao Zhang, Le Song
- Abstract summary: The conditional hidden Markov model (CHMM) predicts token-wise transition and emission probabilities from the BERT embeddings of the input tokens.
An alternate-training variant, CHMM-AlT, fine-tunes a BERT-based NER model with the labels inferred by CHMM.
- Score: 57.2201011783393
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the problem of learning a named entity recognition (NER) model using
noisy labels from multiple weak supervision sources. Though cheaper than human
annotators, weak sources usually yield incomplete, inaccurate, or contradictory
predictions. To address such challenges, we propose a conditional hidden Markov
model (CHMM). It inherits the hidden Markov model's ability to aggregate the
labels from weak sources through unsupervised learning. However, CHMM enhances
the hidden Markov model's flexibility and context representation capability by
predicting token-wise transition and emission probabilities from the BERT
embeddings of the input tokens. In addition, we refine CHMM's prediction with
an alternate-training approach (CHMM-AlT). It fine-tunes a BERT-based NER model
with the labels inferred by CHMM, and this BERT-NER's output is regarded as an
additional weak source to train the CHMM in return. Evaluation on four datasets
from various domains shows that our method is superior to the weakly
supervised baselines by a wide margin.
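As a concrete illustration of the mechanism the abstract describes, the sketch below shows how token-wise transition and emission matrices can be predicted from BERT embeddings and plugged into a standard HMM forward recursion. This is a minimal sketch in PyTorch under our own assumptions (the class name ChmmSketch, the two linear heads, the uniform initial-state distribution, and the toy dimensions are illustrative), not the authors' released implementation.

```python
import torch
import torch.nn as nn

class ChmmSketch(nn.Module):
    def __init__(self, emb_dim: int, n_hidden: int, n_obs: int, n_sources: int):
        super().__init__()
        self.n_hidden, self.n_obs, self.n_sources = n_hidden, n_obs, n_sources
        # Token-wise transition head: embedding -> row-stochastic (n_hidden x n_hidden) matrix
        self.trans_head = nn.Linear(emb_dim, n_hidden * n_hidden)
        # Token-wise emission head: embedding -> one (n_hidden x n_obs) matrix per weak source
        self.emit_head = nn.Linear(emb_dim, n_sources * n_hidden * n_obs)

    def forward(self, bert_emb: torch.Tensor, weak_obs: torch.Tensor) -> torch.Tensor:
        # bert_emb: (T, emb_dim) BERT embeddings of one sentence
        # weak_obs: (T, n_sources) integer labels emitted by the weak sources
        T = bert_emb.size(0)
        trans = self.trans_head(bert_emb).view(T, self.n_hidden, self.n_hidden).softmax(-1)
        emit = self.emit_head(bert_emb).view(
            T, self.n_sources, self.n_hidden, self.n_obs).softmax(-1)
        # Per-token observation probability: product over sources of P(source label | hidden state)
        obs_prob = torch.ones(T, self.n_hidden)
        for k in range(self.n_sources):
            idx = weak_obs[:, k].view(T, 1, 1).expand(T, self.n_hidden, 1)
            obs_prob = obs_prob * emit[:, k].gather(-1, idx).squeeze(-1)
        # HMM forward recursion with token-wise transitions (uniform initial distribution assumed)
        alpha = obs_prob[0] / self.n_hidden
        for t in range(1, T):
            alpha = (alpha @ trans[t]) * obs_prob[t]
        return torch.log(alpha.sum() + 1e-12)  # log-likelihood of the weak observations

# Toy usage: 4 hidden label states, 4 observable labels, 2 weak sources
model = ChmmSketch(emb_dim=768, n_hidden=4, n_obs=4, n_sources=2)
emb = torch.randn(6, 768)                          # stand-in for BERT embeddings
obs = torch.randint(0, 4, (6, 2), dtype=torch.long)
loss = -model(emb, obs)                            # maximize likelihood by minimizing this
```

In the paper, CHMM is trained unsupervised by maximizing this kind of marginal likelihood over the weak-source observations and then decoded to produce denoised labels; both steps are omitted from the sketch.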
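The alternate-training procedure (CHMM-AlT) then reduces to a simple loop. The helpers below (fit_chmm, infer_labels, fine_tune_bert_ner, predict_bert_ner) and the round count are hypothetical placeholders; only the control flow, feeding the BERT-NER output back to the CHMM as an additional weak source, follows the abstract.

```python
def chmm_alt(sentences, weak_labels, fit_chmm, infer_labels,
             fine_tune_bert_ner, predict_bert_ner, n_rounds=3):
    sources = list(weak_labels)                        # per-source label sequences
    bert_ner = None
    for _ in range(n_rounds):
        chmm = fit_chmm(sentences, sources)            # unsupervised label aggregation
        denoised = infer_labels(chmm, sentences, sources)
        bert_ner = fine_tune_bert_ner(sentences, denoised)   # supervised fine-tuning
        extra_source = predict_bert_ner(bert_ner, sentences)
        sources = list(weak_labels) + [extra_source]   # BERT-NER output as a new weak source
    return bert_ner
```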
Related papers
- Unsupervised Learning of Harmonic Analysis Based on Neural HSMM with
Code Quality Templates [0.3233195475347961]
This paper presents a method of unsupervised learning of harmonic analysis based on a hidden semi-Markov model.
We show how to recognize the tonic without prior knowledge, based on the transition probabilities of the Markov model.
arXiv Detail & Related papers (2024-03-07T01:29:48Z)
- From Self-Attention to Markov Models: Unveiling the Dynamics of Generative Transformers [41.82477691012942]
We study learning a 1-layer self-attention model from a set of prompts and associated output data.
We first establish a precise mapping between the self-attention mechanism and Markov models.
We characterize an intriguing winner-takes-all phenomenon where the generative process implemented by self-attention collapses into sampling a limited subset of tokens.
arXiv Detail & Related papers (2024-02-21T03:51:34Z)
- SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised Learning [101.86916775218403]
This paper revisits the popular pseudo-labeling methods via a unified sample weighting formulation.
We propose SoftMatch to overcome the trade-off by maintaining both high quantity and high quality of pseudo-labels during training.
In experiments, SoftMatch shows substantial improvements across a wide variety of benchmarks, including image, text, and imbalanced classification.
arXiv Detail & Related papers (2023-01-26T03:53:25Z)
- Sparse Conditional Hidden Markov Model for Weakly Supervised Named Entity Recognition [68.68300358332156]
We propose the sparse conditional hidden Markov model (Sparse-CHMM) to evaluate noisy labeling functions.
Sparse-CHMM is optimized through unsupervised learning with a three-stage training pipeline.
It achieves a 3.01 average F1 score improvement on five comprehensive datasets.
arXiv Detail & Related papers (2022-05-27T20:47:30Z)
- Generative Modeling Helps Weak Supervision (and Vice Versa) [87.62271390571837]
We propose a model fusing weak supervision and generative adversarial networks.
It captures discrete variables in the data alongside the weak supervision derived label estimate.
It is the first approach to enable data augmentation through weakly supervised synthetic images and pseudolabels.
arXiv Detail & Related papers (2022-03-22T20:24:21Z)
- Robust Classification using Hidden Markov Models and Mixtures of Normalizing Flows [25.543231171094384]
We use a generative model that combines the state transitions of a hidden Markov model (HMM) with neural-network-based probability distributions for the HMM's hidden states.
We verify the improved robustness of NMM-HMM classifiers in an application to speech recognition.
arXiv Detail & Related papers (2021-02-15T00:40:30Z)
- Cauchy-Schwarz Regularized Autoencoder [68.80569889599434]
Variational autoencoders (VAE) are a powerful and widely-used class of generative models.
We introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs.
Our objective improves upon variational auto-encoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
arXiv Detail & Related papers (2021-01-06T17:36:26Z)
- Reliable Time Prediction in the Markov Stochastic Block Model [0.0]
We show how MSBMs can be used to detect dependence structure in growing graphs.
We provide methods to solve the so-called link prediction and collaborative filtering problems.
arXiv Detail & Related papers (2020-04-09T07:58:02Z)
- Semi-supervised Learning Meets Factorization: Learning to Recommend with Chain Graph Model [16.007141894770054]
The latent factor model (LFM) has been drawing much attention in recommender systems due to its good performance and scalability.
Semi-supervised learning (SSL) provides an effective way to alleviate the label (i.e., rating) sparsity problem.
We propose a novel probabilistic chain graph model (CGM) to marry SSL with LFM.
arXiv Detail & Related papers (2020-03-05T06:34:53Z)
- AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses [97.50616524350123]
We build dialogue models that are dynamically aware of what utterances or tokens are dull without any feature-engineering.
The first model, MinAvgOut, directly maximizes the diversity score through the output distributions of each batch.
The second model, Label Fine-Tuning (LFT), prepends to the source sequence a label continuously scaled by the diversity score to control the diversity level.
The third model, RL, adopts Reinforcement Learning and treats the diversity score as a reward signal.
arXiv Detail & Related papers (2020-01-15T18:32:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.