Constraining Linear-chain CRFs to Regular Languages
- URL: http://arxiv.org/abs/2106.07306v6
- Date: Fri, 11 Aug 2023 10:46:29 GMT
- Title: Constraining Linear-chain CRFs to Regular Languages
- Authors: Sean Papay, Roman Klinger and Sebastian Padó
- Abstract summary: A major challenge in structured prediction is to represent the interdependencies within output structures.
We present a generalization of CRFs that can enforce a broad class of constraints, including nonlocal ones.
We prove that constrained training is never worse than constrained decoding, and show empirically that it can be substantially better in practice.
- Score: 10.759863489447204
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: A major challenge in structured prediction is to represent the
interdependencies within output structures. When outputs are structured as
sequences, linear-chain conditional random fields (CRFs) are a widely used
model class which can learn \textit{local} dependencies in the output. However,
the CRF's Markov assumption makes it impossible for CRFs to represent
distributions with \textit{nonlocal} dependencies, and standard CRFs are unable
to respect nonlocal constraints of the data (such as global arity constraints
on output labels). We present a generalization of CRFs that can enforce a broad
class of constraints, including nonlocal ones, by specifying the space of
possible output structures as a regular language $\mathcal{L}$. The resulting
regular-constrained CRF (RegCCRF) has the same formal properties as a standard
CRF, but assigns zero probability to all label sequences not in $\mathcal{L}$.
Notably, RegCCRFs can incorporate their constraints during training, while
related models only enforce constraints during decoding. We prove that
constrained training is never worse than constrained decoding, and show
empirically that it can be substantially better in practice. Additionally, we
demonstrate a practical benefit on downstream tasks by incorporating a RegCCRF
into a deep neural model for semantic role labeling, exceeding state-of-the-art
results on a standard dataset.
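To make the constraint mechanism concrete: one way to realize a regular-language restriction is to run the CRF's forward algorithm over the product of the label set and a finite automaton for $\mathcal{L}$, pruning transitions the automaton forbids. The sketch below is our own minimal numpy illustration of that idea under a toy "exactly one V label" arity constraint (the function names and the DFA are hypothetical); it is not the authors' implementation.

```python
import numpy as np

def constrained_log_partition(unary, trans, start, delta, accept):
    """Log-partition of a linear-chain CRF restricted to label sequences
    accepted by a DFA -- a sketch of the regular-language constraint idea.

    unary:  (T, L) per-position label scores
    trans:  (L, L) label-to-label transition scores
    delta:  dict (dfa_state, label) -> next dfa_state; missing key = illegal
    accept: set of accepting DFA states
    """
    T, L = unary.shape
    # alpha[(q, y)] = log-sum of scores of all prefixes that end in label y
    # and drive the DFA from `start` to state q.
    alpha = {}
    for y in range(L):
        q = delta.get((start, y))
        if q is not None:
            alpha[(q, y)] = unary[0, y]
    for t in range(1, T):
        nxt = {}
        for (q, y_prev), score in alpha.items():
            for y in range(L):
                q2 = delta.get((q, y))
                if q2 is None:
                    continue  # this transition would leave the language
                s = score + trans[y_prev, y] + unary[t, y]
                nxt[(q2, y)] = np.logaddexp(nxt.get((q2, y), -np.inf), s)
        alpha = nxt
    finals = [s for (q, _), s in alpha.items() if q in accept]
    return np.logaddexp.reduce(finals) if finals else -np.inf

# Toy arity constraint over labels {0: "O", 1: "V"}: accept only sequences
# with exactly one "V" (DFA state 0 = no V yet, state 1 = one V seen).
delta = {(0, 0): 0, (0, 1): 1, (1, 0): 1}  # (1, 1) absent: a second V is illegal
rng = np.random.default_rng(0)
unary, trans = rng.normal(size=(6, 2)), rng.normal(size=(2, 2))
log_Z_constrained = constrained_log_partition(unary, trans, 0, delta, {1})
```

Dividing the exponentiated score of any in-language sequence by this constrained partition function yields a distribution with zero mass outside $\mathcal{L}$; using it as the normalizer of the training loss corresponds to constrained training, while applying the automaton only at inference corresponds to constrained decoding.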
Related papers
- Optimal Kernel Quantile Learning with Random Features [0.9208007322096533]
This paper presents a generalization study of kernel quantile regression with random features (KQR-RF).
Our study establishes the capacity-dependent learning rates for KQR-RF under mild conditions on the number of RFs.
By slightly modifying our assumptions, the capacity-dependent error analysis can also be applied to cases with Lipschitz continuous losses.
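As background for the two ingredients named in the title, the following generic sketch shows random Fourier features for an RBF kernel and the pinball (quantile) loss; the bandwidth and feature count are placeholders, and this is the standard construction rather than the paper's estimator.

```python
import numpy as np

def random_fourier_features(X, n_features=256, gamma=1.0, seed=0):
    """Random Fourier features approximating the kernel exp(-gamma * ||x - x'||^2)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(X.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def pinball_loss(y_true, y_pred, tau=0.5):
    """Quantile (pinball) loss at quantile level tau."""
    r = y_true - y_pred
    return np.mean(np.maximum(tau * r, (tau - 1.0) * r))
```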
arXiv Detail & Related papers (2024-08-24T14:26:09Z)
- Confident Sinkhorn Allocation for Pseudo-Labeling [40.883130133661304]
Semi-supervised learning is a critical tool in reducing machine learning's dependence on labeled data.
This paper theoretically studies the role of uncertainty in pseudo-labeling and proposes Confident Sinkhorn Allocation (CSA).
CSA identifies the best pseudo-label allocation via optimal transport, assigning labels only to samples with high confidence scores.
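As a rough picture of "allocation via optimal transport", the sketch below runs generic entropic Sinkhorn iterations over model probabilities and then keeps only high-confidence rows; the cost definition, class ratios, and threshold are our own placeholders, not the CSA algorithm itself.

```python
import numpy as np

def sinkhorn_pseudo_labels(probs, class_ratios, conf_threshold=0.9, reg=0.1, n_iter=200):
    """Balance pseudo-label assignments across classes with Sinkhorn iterations,
    then keep only samples whose predicted probability clears a threshold."""
    n, k = probs.shape
    cost = 1.0 - probs                      # cheap to assign where the model is confident
    K = np.exp(-cost / reg)                 # Gibbs kernel
    r = np.full(n, 1.0 / n)                 # uniform mass over samples
    c = np.asarray(class_ratios, float)     # desired class proportions (sums to 1)
    u, v = np.ones(n), np.ones(k)
    for _ in range(n_iter):                 # standard Sinkhorn scaling
        u = r / (K @ v)
        v = c / (K.T @ u)
    plan = u[:, None] * K * v[None, :]      # transport plan over samples x classes
    labels = plan.argmax(axis=1)
    keep = probs.max(axis=1) >= conf_threshold
    return labels, keep
```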
arXiv Detail & Related papers (2022-06-13T02:16:26Z)
- Sequence Transduction with Graph-based Supervision [96.04967815520193]
We present a new transducer objective function that generalizes the RNN-T loss to accept a graph representation of the labels.
We demonstrate that transducer-based ASR with a CTC-like lattice achieves better results than standard RNN-T.
arXiv Detail & Related papers (2021-11-01T21:51:42Z)
- Feature Completion for Occluded Person Re-Identification [138.5671859358049]
The RFC block can recover the semantics of occluded regions in feature space.
The SRFC module exploits long-range spatial contexts from non-occluded regions to predict the features of occluded regions.
The TRFC module captures long-term temporal contexts to refine the predictions of SRFC.
arXiv Detail & Related papers (2021-06-24T02:40:40Z)
- Latent Template Induction with Gumbel-CRFs [107.17408593510372]
We explore the use of structured variational autoencoders to infer latent templates for sentence generation.
We show that, used as a structured inference network, it learns interpretable templates during training.
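For background only, the "Gumbel" in the title refers to a continuous relaxation of discrete sampling; the generic Gumbel-softmax trick is sketched below (the temperature is a placeholder, and this is not the paper's CRF-specific relaxation).

```python
import numpy as np

def gumbel_softmax_sample(logits, tau=0.5, seed=0):
    """Differentiable surrogate for sampling from a categorical distribution."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(1e-9, 1.0, size=logits.shape)
    g = -np.log(-np.log(u))        # Gumbel(0, 1) noise
    y = (logits + g) / tau
    y = y - y.max()                # numerically stable softmax
    return np.exp(y) / np.exp(y).sum()
```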
arXiv Detail & Related papers (2020-11-29T01:00:57Z)
- Neural Latent Dependency Model for Sequence Labeling [47.32215014130811]
A classic approach to sequence labeling is linear-chain conditional random fields (CRFs).
One limitation of linear chain CRFs is their inability to model long-range dependencies between labels.
High-order CRFs extend linear-chain CRFs by modeling dependencies between labels no farther apart than their order, but the computational complexity grows exponentially in the order.
We propose a Neural Latent Dependency Model (NLDM) that models dependencies of arbitrary length between labels with a latent tree structure.
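For reference, the first-order (linear-chain) factorization underlying both this entry and the main paper above can be written as

$$ p(y \mid x) = \frac{1}{Z(x)} \prod_{t=1}^{T} \exp\big(\psi(y_{t-1}, y_t, x, t)\big), \qquad Z(x) = \sum_{y'} \prod_{t=1}^{T} \exp\big(\psi(y'_{t-1}, y'_t, x, t)\big), $$

and a $k$-th-order CRF replaces $\psi(y_{t-1}, y_t, x, t)$ with $\psi(y_{t-k}, \dots, y_t, x, t)$, which is the source of the exponential-in-order cost mentioned above.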
arXiv Detail & Related papers (2020-11-10T10:05:21Z)
- Constrained Decoding for Computationally Efficient Named Entity Recognition Taggers [15.279850826041066]
Current work eschews prior knowledge of how the span encoding scheme works and relies on the conditional random field (CRF) to learn which transitions are illegal and which are not, in order to facilitate global coherence.
We find that by constraining the output to suppress illegal transitions, we can train a tagger with a cross-entropy loss twice as fast as a CRF, with differences in F1 that are statistically insignificant.
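The "suppress illegal transitions" idea is a standard trick; below is a generic sketch of a BIO transition mask and a Viterbi decode that forbids masked transitions (the label set and scores are invented, and this is not the paper's tagger).

```python
import numpy as np

def bio_transition_mask(labels):
    """Legal-transition mask for a BIO scheme: I-X may only follow B-X or I-X."""
    n = len(labels)
    mask = np.ones((n, n), dtype=bool)
    for j, to in enumerate(labels):
        if to.startswith("I-"):
            typ = to[2:]
            for i, frm in enumerate(labels):
                if frm not in ("B-" + typ, "I-" + typ):
                    mask[i, j] = False
    return mask

def constrained_viterbi(emissions, mask):
    """Viterbi decode where masked transitions score -inf.
    (Start-of-sequence constraints are omitted for brevity.)"""
    T, L = emissions.shape
    trans = np.where(mask, 0.0, -np.inf)
    score = emissions[0].copy()
    back = np.zeros((T, L), dtype=int)
    for t in range(1, T):
        total = score[:, None] + trans + emissions[t][None, :]
        back[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

labels = ["O", "B-PER", "I-PER"]
emissions = np.log(np.array([[0.2, 0.7, 0.1], [0.1, 0.1, 0.8], [0.8, 0.1, 0.1]]))
print([labels[i] for i in constrained_viterbi(emissions, bio_transition_mask(labels))])
```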
arXiv Detail & Related papers (2020-10-09T04:07:52Z)
- Random Forests for dependent data [1.5469452301122173]
We propose RF-GLS, a novel extension of RF for dependent error processes.
The key to this extension is the equivalent representation of the local decision-making in a regression tree as a global OLS optimization.
We empirically demonstrate the improvement achieved by RF-GLS over RF for both estimation and prediction under dependence.
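For context, the OLS-to-GLS substitution that this entry alludes to is the classical one, with $\Sigma$ the working covariance of the dependent errors:

$$ \hat{\beta}_{\mathrm{OLS}} = (X^{\top}X)^{-1}X^{\top}y \quad\longrightarrow\quad \hat{\beta}_{\mathrm{GLS}} = (X^{\top}\Sigma^{-1}X)^{-1}X^{\top}\Sigma^{-1}y. $$

The entry's key observation is that tree splits can be written as an OLS problem, into which the GLS form can then be substituted; the formulas here are only the familiar linear-model background, not the paper's estimator.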
arXiv Detail & Related papers (2020-07-30T12:36:09Z)
- An Integer Linear Programming Framework for Mining Constraints from Data [81.60135973848125]
We present a general framework for mining constraints from data.
In particular, we consider the inference in structured output prediction as an integer linear programming (ILP) problem.
We show that our approach can learn to solve 9x9 Sudoku puzzles and minimal spanning tree problems from examples without providing the underlying rules.
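To make "inference as an integer linear program" concrete, here is a toy constrained decoding problem written with PuLP; the scores, labels, and the single global constraint ("at most one V") are placeholders, and this is not the paper's constraint-mining framework.

```python
import pulp

def ilp_decode(scores, labels, capped_label="V", max_count=1):
    """Pick one label per position to maximize total score, subject to a
    global cap on how often `capped_label` may appear."""
    T = range(len(scores))
    prob = pulp.LpProblem("constrained_decoding", pulp.LpMaximize)
    x = pulp.LpVariable.dicts("x", (T, labels), cat="Binary")
    prob += pulp.lpSum(scores[t][l] * x[t][l] for t in T for l in labels)
    for t in T:
        prob += pulp.lpSum(x[t][l] for l in labels) == 1   # one label per position
    prob += pulp.lpSum(x[t][capped_label] for t in T) <= max_count
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return [next(l for l in labels if x[t][l].value() > 0.5) for t in T]

# Both positions prefer "V", but the cap forces the weaker one to "O".
print(ilp_decode([{"O": 0.1, "V": 0.9}, {"O": 0.4, "V": 0.8}], ["O", "V"]))
```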
arXiv Detail & Related papers (2020-06-18T20:09:53Z)
- Learning Likelihoods with Conditional Normalizing Flows [54.60456010771409]
Conditional normalizing flows (CNFs) are efficient in sampling and inference.
We present a study of CNFs in which the mapping from the base density to the output space is conditioned on an input x, in order to model conditional densities p(y|x).
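As a minimal picture of conditioning the base-to-output mapping on an input x, the sketch below computes the change-of-variables log-likelihood for a single conditional affine transform; the scale and shift conditioners are left as arbitrary callables, and this is only an illustration, not the paper's model.

```python
import numpy as np

def conditional_affine_log_likelihood(y, x, scale_fn, shift_fn):
    """log p(y|x) when y = s(x) * z + m(x) with z ~ N(0, I).

    Change of variables: log p(y|x) = log N(z; 0, I) - sum(log s(x)).
    """
    s, m = scale_fn(x), shift_fn(x)           # conditioner outputs, s > 0
    z = (y - m) / s
    log_base = -0.5 * np.sum(z ** 2 + np.log(2.0 * np.pi), axis=-1)
    log_det = -np.sum(np.log(s), axis=-1)
    return log_base + log_det
```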
arXiv Detail & Related papers (2019-11-29T19:17:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.