BERTifying the Hidden Markov Model for Multi-Source Weakly Supervised
Named Entity Recognition
- URL: http://arxiv.org/abs/2105.12848v1
- Date: Wed, 26 May 2021 21:18:48 GMT
- Title: BERTifying the Hidden Markov Model for Multi-Source Weakly Supervised
Named Entity Recognition
- Authors: Yinghao Li, Pranav Shetty, Lucas Liu, Chao Zhang, Le Song
- Abstract summary: The conditional hidden Markov model (CHMM) predicts token-wise transition and emission probabilities from the BERT embeddings of the input tokens.
An alternate-training variant, CHMM-AlT, fine-tunes a BERT-based NER model with the labels inferred by CHMM.
- Score: 57.2201011783393
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the problem of learning a named entity recognition (NER) model using
noisy labels from multiple weak supervision sources. Though cheaper than human
annotators, weak sources usually yield incomplete, inaccurate, or contradictory
predictions. To address such challenges, we propose a conditional hidden Markov
model (CHMM). It inherits the hidden Markov model's ability to aggregate the
labels from weak sources through unsupervised learning. However, CHMM enhances
the hidden Markov model's flexibility and context representation capability by
predicting token-wise transition and emission probabilities from the BERT
embeddings of the input tokens. In addition, we refine CHMM's prediction with
an alternate-training approach (CHMM-AlT). It fine-tunes a BERT-based NER model
with the labels inferred by CHMM, and this BERT-NER's output is regarded as an
additional weak source to train the CHMM in return. Evaluation on four datasets
from various domains shows that our method is superior to the weakly
supervised baselines by a wide margin.
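As a concrete illustration of the mechanism the abstract describes, the sketch below shows how token-wise transition and emission matrices can be predicted from BERT embeddings and plugged into a standard HMM forward recursion. This is a minimal sketch in PyTorch under our own assumptions (the class name ChmmSketch, the two linear heads, the uniform initial-state distribution, and the toy dimensions are illustrative), not the authors' released implementation.

```python
import torch
import torch.nn as nn

class ChmmSketch(nn.Module):
    def __init__(self, emb_dim: int, n_hidden: int, n_obs: int, n_sources: int):
        super().__init__()
        self.n_hidden, self.n_obs, self.n_sources = n_hidden, n_obs, n_sources
        # Token-wise transition head: embedding -> row-stochastic (n_hidden x n_hidden) matrix
        self.trans_head = nn.Linear(emb_dim, n_hidden * n_hidden)
        # Token-wise emission head: embedding -> one (n_hidden x n_obs) matrix per weak source
        self.emit_head = nn.Linear(emb_dim, n_sources * n_hidden * n_obs)

    def forward(self, bert_emb: torch.Tensor, weak_obs: torch.Tensor) -> torch.Tensor:
        # bert_emb: (T, emb_dim) BERT embeddings of one sentence
        # weak_obs: (T, n_sources) integer labels emitted by the weak sources
        T = bert_emb.size(0)
        trans = self.trans_head(bert_emb).view(T, self.n_hidden, self.n_hidden).softmax(-1)
        emit = self.emit_head(bert_emb).view(
            T, self.n_sources, self.n_hidden, self.n_obs).softmax(-1)
        # Per-token observation probability: product over sources of P(source label | hidden state)
        obs_prob = torch.ones(T, self.n_hidden)
        for k in range(self.n_sources):
            idx = weak_obs[:, k].view(T, 1, 1).expand(T, self.n_hidden, 1)
            obs_prob = obs_prob * emit[:, k].gather(-1, idx).squeeze(-1)
        # HMM forward recursion with token-wise transitions (uniform initial distribution assumed)
        alpha = obs_prob[0] / self.n_hidden
        for t in range(1, T):
            alpha = (alpha @ trans[t]) * obs_prob[t]
        return torch.log(alpha.sum() + 1e-12)  # log-likelihood of the weak observations

# Toy usage: 4 hidden label states, 4 observable labels, 2 weak sources
model = ChmmSketch(emb_dim=768, n_hidden=4, n_obs=4, n_sources=2)
emb = torch.randn(6, 768)                          # stand-in for BERT embeddings
obs = torch.randint(0, 4, (6, 2), dtype=torch.long)
loss = -model(emb, obs)                            # maximize likelihood by minimizing this
```

In the paper, CHMM is trained unsupervised by maximizing this kind of marginal likelihood over the weak-source observations and then decoded to produce denoised labels; both steps are omitted from the sketch.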
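The alternate-training procedure (CHMM-AlT) then reduces to a simple loop. The helpers below (fit_chmm, infer_labels, fine_tune_bert_ner, predict_bert_ner) and the round count are hypothetical placeholders; only the control flow, feeding the BERT-NER output back to the CHMM as an additional weak source, follows the abstract.

```python
def chmm_alt(sentences, weak_labels, fit_chmm, infer_labels,
             fine_tune_bert_ner, predict_bert_ner, n_rounds=3):
    sources = list(weak_labels)                        # per-source label sequences
    bert_ner = None
    for _ in range(n_rounds):
        chmm = fit_chmm(sentences, sources)            # unsupervised label aggregation
        denoised = infer_labels(chmm, sentences, sources)
        bert_ner = fine_tune_bert_ner(sentences, denoised)   # supervised fine-tuning
        extra_source = predict_bert_ner(bert_ner, sentences)
        sources = list(weak_labels) + [extra_source]   # BERT-NER output as a new weak source
    return bert_ner
```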
Related papers
- Unsupervised Learning of Harmonic Analysis Based on Neural HSMM with
Code Quality Templates [0.3233195475347961]
This paper presents a method of unsupervised learning of harmonic analysis based on a hidden semi-Markov model.
We show how to recognize the tonic without prior knowledge, based on the transition probabilities of the Markov model.
arXiv Detail & Related papers (2024-03-07T01:29:48Z)
- From Self-Attention to Markov Models: Unveiling the Dynamics of Generative Transformers [41.82477691012942]
We study learning a 1-layer self-attention model from a set of prompts and associated output data.
We first establish a precise mapping between the self-attention mechanism and Markov models.
We characterize an intriguing winner-takes-all phenomenon where the generative process implemented by self-attention collapses into sampling a limited subset of tokens.
arXiv Detail & Related papers (2024-02-21T03:51:34Z)
- SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised Learning [101.86916775218403]
This paper revisits the popular pseudo-labeling methods via a unified sample weighting formulation.
We propose SoftMatch to overcome the trade-off by maintaining both high quantity and high quality of pseudo-labels during training.
In experiments, SoftMatch shows substantial improvements across a wide variety of benchmarks, including image, text, and imbalanced classification.
arXiv Detail & Related papers (2023-01-26T03:53:25Z)
- Sparse Conditional Hidden Markov Model for Weakly Supervised Named Entity Recognition [68.68300358332156]
We propose the sparse conditional hidden Markov model (Sparse-CHMM) to evaluate noisy labeling functions.
Sparse-CHMM is optimized through unsupervised learning with a three-stage training pipeline.
It achieves a 3.01 average F1 score improvement on five comprehensive datasets.
arXiv Detail & Related papers (2022-05-27T20:47:30Z)
- Generative Modeling Helps Weak Supervision (and Vice Versa) [87.62271390571837]
We propose a model fusing weak supervision and generative adversarial networks.
It captures discrete variables in the data alongside the weak supervision derived label estimate.
It is the first approach to enable data augmentation through weakly supervised synthetic images and pseudolabels.
arXiv Detail & Related papers (2022-03-22T20:24:21Z)
- Robust Classification using Hidden Markov Models and Mixtures of Normalizing Flows [25.543231171094384]
We use a generative model that combines the state transitions of a hidden Markov model (HMM) with neural-network-based probability distributions for the HMM's hidden states.
We verify the improved robustness of NMM-HMM classifiers in an application to speech recognition.
arXiv Detail & Related papers (2021-02-15T00:40:30Z)
- Cauchy-Schwarz Regularized Autoencoder [68.80569889599434]
Variational autoencoders (VAE) are a powerful and widely-used class of generative models.
We introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs.
Our objective improves upon variational auto-encoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
arXiv Detail & Related papers (2021-01-06T17:36:26Z)
- Reliable Time Prediction in the Markov Stochastic Block Model [0.0]
We show how MSBMs can be used to detect dependence structure in growing graphs.
We provide methods to solve the so-called link prediction and collaborative filtering problems.
arXiv Detail & Related papers (2020-04-09T07:58:02Z)
- Semi-supervised Learning Meets Factorization: Learning to Recommend with Chain Graph Model [16.007141894770054]
The latent factor model (LFM) has been drawing much attention in recommender systems due to its good performance and scalability.
Semi-supervised learning (SSL) provides an effective way to alleviate the label (i.e., rating) sparsity problem.
We propose a novel probabilistic chain graph model (CGM) to marry SSL with LFM.
arXiv Detail & Related papers (2020-03-05T06:34:53Z)
- AvgOut: A Simple Output-Probability Measure to Eliminate Dull Responses [97.50616524350123]
We build dialogue models that are dynamically aware of what utterances or tokens are dull without any feature-engineering.
The first model, MinAvgOut, directly maximizes the diversity score through the output distributions of each batch.
The second model, Label Fine-Tuning (LFT), prepends to the source sequence a label continuously scaled by the diversity score to control the diversity level.
The third model, RL, adopts Reinforcement Learning and treats the diversity score as a reward signal.
arXiv Detail & Related papers (2020-01-15T18:32:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.